
Intro to large language models: the journey to billions of parameters

What are large language models (LLMs)? Learn the basics of what’s behind innovative AI applications like ChatGPT.

Ever wondered what’s behind innovative AI applications like ChatGPT? Increasingly, large language models (LLMs) are entering the mainstream and becoming responsible for impressive new technology. But what are LLMs, and how did they get so sophisticated? Let’s break it down.

The objective of any language model is to use existing text (input) to generate new text (output). Exactly how the model interprets input text has changed over time as data access and computing power have increased.

In just a few years, language models grew exponentially in size, graduating from hundreds of millions to hundreds of billions of parameters. At such sizes, the moniker large language models, or LLMs, captures an important distinction from these models’ predecessors.

The history of language models

Research into language models can be traced back to the late 1980s. At that time, modeling was driven by the statistical properties of phrases: output text was computed based on how frequently a word or phrase appeared in the training text, so the next output phrase was simply the most frequent one from the training data. By its nature, this process was biased against rare phrases, and since frequent words are not necessarily the ones that convey the meaning of a phrase, the approach usually generated incoherent text.
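To make this concrete, here is a minimal sketch of a frequency-based (bigram) model; the toy corpus and the predict_next helper are made up for illustration:

```python
from collections import Counter, defaultdict

# Tiny made-up corpus for illustration.
corpus = "the cat sat on the mat and the cat slept on the mat".split()

# Count how often each word follows each preceding word.
next_word_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_word_counts[prev][nxt] += 1

def predict_next(word):
    """Return the most frequent follower of `word` in the training text."""
    followers = next_word_counts.get(word)
    return followers.most_common(1)[0][0] if followers else None

print(predict_next("the"))  # always the most common follower, never a rare one
```

Because the model always favors the most frequent continuation, rare but meaningful phrases never surface, which is exactly the bias described above.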

In the early 2000s, with more computing resources available, phrase-based models were replaced by neural networks, an AI method loosely modeled on how the human brain processes data. These networks took a fixed window of text as input, meaning there was a limited amount of language the model could learn from to generate new text. While coherence improved, the fixed-window requirement limited further progress.
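The fixed-window constraint is easy to see in code. Below is a minimal sketch (the window size and sentence are made up) of how such a model’s training pairs are built; anything outside the window is invisible to the model:

```python
# Build (context, target) training pairs from a fixed window of words.
window = 3
words = "language models once read only a small fixed slice of text".split()

training_pairs = [
    (words[i:i + window], words[i + window])  # (input window, next word)
    for i in range(len(words) - window)
]

for context, target in training_pairs[:3]:
    print(context, "->", target)
```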

In 2007, special neural networks that could take input text of unlimited size were created, lifting the fixed-window limitation. These networks used sequential analysis: they took one word as input at each time step but carried over information from the preceding words. This training framework generated text consistent with the input text, leading to improved coherence.
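A minimal sketch of this sequential, recurrent-style processing follows; the dimensions, random weights, and stand-in word vectors are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, hidden_dim = 8, 16
W_in = rng.normal(size=(hidden_dim, embed_dim)) * 0.1  # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim)) * 0.1  # hidden-to-hidden weights

def step(hidden, word_vec):
    """One time step: fold a new word into the running hidden state."""
    return np.tanh(W_in @ word_vec + W_h @ hidden)

sentence = [rng.normal(size=embed_dim) for _ in range(5)]  # stand-in word vectors
hidden = np.zeros(hidden_dim)
for word_vec in sentence:  # words must be processed strictly in order
    hidden = step(hidden, word_vec)

print(hidden.shape)  # the final state summarizes the whole sequence
```

Because each step depends on the previous one, the words cannot be processed in parallel, which is exactly the bottleneck discussed next.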

Yet there were still shortcomings. For example, a network trained on input text targeting 5th graders generated 5th-grade-level text but failed when asked to generate text for executives. And because the sequential approach prohibited more sophisticated textual analysis and strained limited computing resources, training on larger, more diverse datasets took a long time.

In 2017, the Transformer architecture introduced parallelized analysis, meaning all the words in a text are analyzed simultaneously rather than sequentially. This innovation had two major outcomes: it expanded the amount of data/input text the model could train on, and it expedited training times. Models could train on more data, faster.
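At the heart of the Transformer is attention, which scores every word against every other word in a single matrix operation. Here is a minimal sketch of scaled dot-product attention (the sizes and random inputs are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 16                 # made-up sizes for illustration
X = rng.normal(size=(seq_len, d_model))  # one row per word, all fed in at once

W_q = rng.normal(size=(d_model, d_model)) * 0.1
W_k = rng.normal(size=(d_model, d_model)) * 0.1
W_v = rng.normal(size=(d_model, d_model)) * 0.1

Q, K, V = X @ W_q, X @ W_k, X @ W_v      # queries, keys, values
scores = Q @ K.T / np.sqrt(d_model)      # every word scored against every word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V                     # updated representation for each word

print(output.shape)  # (6, 16): the whole sequence processed in one pass
```

No loop over time steps is needed, so the computation parallelizes across the entire input, which is what made training on far larger datasets practical.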

From there, things really took off. In 2018, a 110-million-parameter language model was released, and in 2019, a 1.5-billion-parameter model became available. These language models were powerful, but the quality of text they generated deteriorated after a couple of paragraphs.

In 2020, GPT-3 was released. At 175 billion parameters, GPT-3 certainly qualifies as an LLM. Its size, together with the larger dataset on which it was trained, enables it to generate longer, higher-quality text. Moreover, a single model can now generate both 5th-grade-level text and executive-level text.


LLMs today

LLMs are powerful. In fact, they can answer almost any question posed to them. The result? Dozens of text-based applications and browser extensions have been developed within the past two months. How performant are they? That’s an open question, but such development speed is unprecedented. Our experience here at Saifr shows that while LLMs can boost development speed, data is crucial to developing a stable solution, especially for complicated tasks.


Sara Walker

Content Marketing Associate
Sara has a background in numerous word-related fields, including nonprofit communications, literary blogging, community media, English tutoring, and now content marketing. She holds a BA in English from Arizona State University.
