From Archives to Innovations: OpenText's Approach to Large Language Models

October 17th, 2023 by Wes Worsfold

When it comes to business, nothing matters more than trust. It’s the foundation of every working relationship, whether between employers and employees or between businesses working together to bring new solutions to their customers. Bringing artificial intelligence tools to market, and to customers, is no different. But as we’ve seen across the industry, the rapid proliferation of AI-powered tools is giving people reason to pause and reflect on how we can build AI tools with good at their core.

On October 5, our team attended the Communitech Breakfast Series event, which featured a fireside chat between Chris Albinson and Tom Jenkins, OpenText's Board Chair and former CEO. Jenkins’s conversation focused on AI. That was fitting, since OpenText's OpenText World conference carried the theme “Welcome to the future of AI.”

OpenText was founded in 1991 to commercialize a project that created the first online version of the Oxford English Dictionary. The company is credited with creating the first internet search engine, and today its products are used by 98 of the Fortune 100 companies. Through his time as CEO and then Chair of the Board at OpenText, Jenkins has seen the software company grow to over $8 billion in annual revenue and 24,000 employees worldwide.

While the company started with search, it has become one of the dominant players in content management software. Its products are used in engineering, legal, finance, and almost every other industry that relies on petabytes of data. Jenkins said the company's most recent acquisition, a cybersecurity company, is another example of its focus on customers' future needs.

“We're going to a world of computing at the edge, and when you’re computing at the edge with content, you better secure that content,” he said.

Building for the long game with large language models

Securing content is just part of what Jenkins sees OpenText delivering in the future. He said the company had been focused on building reliable large language models (LLMs) long before ChatGPT and generative AI became household terms. 

Every LLM is trained on a dataset. The size, validity, and source of those datasets are significant concerns for AI companies looking to build trust in their generative AI tools.

According to Jenkins, the company has acquired data archives from large companies across multiple industries to train its predictive engines and LLMs. These data archives are the key to building the next generation of OpenText LLMs and other products. 

“15 years ago, I sat in a room in Silicon Valley and the CEO—you’ll know who as I talk about it—asked how I had the archive of a company they just bought. He looked across at me, and he was the first person to really get this. He said, ‘Oh, I get it. You're buying all of them,’ and then he knew what we were talking about was all the archives of the world,” said Jenkins.

How OpenText is using archives to train LLMs

Jenkins said to think of it as “garbage in, garbage out,” but at a much larger (and more critical) scale than any tool created before. He added that the more quality data they had, the better the LLMs they knew they could build.

“Maybe it's only 1%. But if you do the math, a 1% smarter AI matters. We spent more than a decade buying all the archives of the world and are now in this wonderful position,” he said.

Jenkins pointed to a recent keynote by Microsoft CEO Satya Nadella at Microsoft Inspire 2023 on the future impacts of LLMs and natural language processing (NLP). In the keynote, Nadella said that LLMs will drive innovation on both the front end and the back end. On the front end, Jenkins said, LLMs will increase the functionality and quality of NLP technology across almost every product.

“The front end is what we would always call natural language query in search. At the beginning of the internet, we had natural language queries offered on the front page of Netscape and Yahoo. No one would use it because our natural language queries were primitive. We didn't have enormous CPUs and enormous cache memories. It was just a different era,” said Jenkins. “Now natural language processors (NLPs) are here. But be careful because NLPs will become a trap for the next few years. It will be the wild west. You can make a lot of money with a lipstick-on-a-pig kind of thing. You can put a nice interface on an existing thing. But you will lose all your money a few years from now because there will only be one go-to-market supplier for each sector. The best hospital LLM. The best financial services LLM.” 

The second area, the back end, is where Jenkins said Nadella hit on the true power of LLMs and NLP technology. Nadella noted that LLMs are powering a new generation of what he called reasoning engines. Think of how any project starts today: as a blank slate, whether it's a new Google Doc for a blog post or a new Xcode project for an iOS application. Reasoning engines powered by LLMs will allow anyone to start a new project from a fully thought-out draft created by a generative AI tool.

“Nadella talks about the reasoning engine, and he unpacks the impact that LLMs will have, and the essential thing in computing going forward is information. That, of course, is what OpenText does, so we’re very happy,” he said.

Hearing diverse perspectives on the future of generative AI is critical to helping our clients as they work to improve customer experiences and internal processes with AI. Have AI questions? We're here to help.