LLMs Demystified

Large Language Models, or LLMs for short, have taken the world by storm. Drafting business reports, translating languages, writing code…all in one model. It’s easy to get caught up in the storm and assume LLMs are a magical black box that can do anything. This guide uncovers how LLMs work from the inside.

It’s a fancy auto-complete machine…

No matter which way you put it, an LLM is simply an auto-complete machine. That’s it. For example, complete the sentence:

The sky is...

Most people would say “blue.” That is our childhood programming: something we have heard, learned, and used again and again. LLMs work the same way. To get slightly technical, the LLM asks itself: given all the previous words (“The sky is”), which word is most likely to come next? It produces a probability distribution over the possible next words, and the word with the highest probability is chosen.
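To make that concrete, here is a minimal sketch in Python. It assumes a tiny, made-up list of words that followed “The sky is” in some text, and it counts them to build the distribution. A real LLM computes these probabilities with a neural network rather than by counting, so the names and data here are purely illustrative.

```python
from collections import Counter

# Hypothetical toy data: words that followed "The sky is" in text we have seen.
observed_next_words = [
    "blue", "blue", "blue", "clear", "blue", "falling", "clear", "blue",
]


def next_word_distribution(words):
    """Turn raw counts into probabilities that sum to 1."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def most_likely(distribution):
    """Pick the word with the highest probability."""
    return max(distribution, key=distribution.get)


distribution = next_word_distribution(observed_next_words)
prediction = most_likely(distribution)

print(distribution)
print("The sky is", prediction)
```

With this toy data, “blue” wins simply because it shows up most often.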

The model outputs “blue” because, in the data it has seen, “blue” is the most frequent continuation. Much like the example above, LLMs are trained on vast datasets covering almost everything: websites, books, research articles, and any other resource a company can get its hands on. So, if I ask the question:

Who was the 7th person to visit the moon?

You would probably struggle, because you weren’t “trained” on this knowledge. The LLM, however, responds confidently with “David Randolph Scott” (generated one word at a time). Under the hood, the LLM asks itself: given the previous words, what is MOST LIKELY to come next? “David” has the highest probability. Then the auto-complete continues. Given “Who was the 7th person to visit the moon? David”, what is the next most likely word? It picks “Randolph”, and so on.
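The word-by-word loop can be sketched in a few lines of Python. This is only an illustration: TOY_MODEL is a hypothetical lookup table standing in for a trained model, the probabilities in it are made up, and the END marker is an assumed way of telling the loop when to stop.

```python
QUESTION = "Who was the 7th person to visit the moon?"
END = "<end>"  # assumed marker meaning "stop generating"

# Hypothetical stand-in for a trained model: text so far -> next-word probabilities.
# All probabilities below are invented for illustration.
TOY_MODEL = {
    QUESTION: {"David": 0.80, "Neil": 0.15, "Buzz": 0.05},
    QUESTION + " David": {"Randolph": 0.70, "Scott": 0.30},
    QUESTION + " David Randolph": {"Scott": 0.95, END: 0.05},
    QUESTION + " David Randolph Scott": {END: 1.0},
}


def generate(prompt, model, max_words=10):
    """Repeatedly append the most likely next word to the text so far."""
    text = prompt
    generated = []
    for _ in range(max_words):
        distribution = model.get(text)
        if distribution is None:  # the toy table has no entry for this text
            break
        next_word = max(distribution, key=distribution.get)
        if next_word == END:
            break
        generated.append(next_word)
        text = text + " " + next_word  # the new word becomes part of the input
    return generated


answer = generate(QUESTION, TOY_MODEL)
print(" ".join(answer))
```

The key detail is the last line of the loop: every generated word is fed back in as part of the input for the next prediction.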

Every single piece of generated text comes down to a distribution like the one described above. Notice the word “distribution.” The model is guaranteed to produce the word with the highest probability, but that word is not guaranteed to be the right answer. The implication is that production use cases for LLMs should be carefully considered (more details in Hallucinations).
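Here is a small sketch of how “most likely” and “correct” can come apart. The probabilities are invented: they imagine a model whose training text mentions Neil Armstrong far more often than David Scott, so the wrong name ends up on top.

```python
# Hypothetical next-word probabilities for the moon question (made-up numbers).
distribution = {"Neil": 0.55, "David": 0.35, "Buzz": 0.10}
correct_first_word = "David"


def pick_most_likely(distribution):
    """Return the highest-probability word, whether or not it is right."""
    return max(distribution, key=distribution.get)


chosen = pick_most_likely(distribution)
is_correct = chosen == correct_first_word

print(chosen, is_correct)
```

In this sketch the model confidently picks “Neil”: it follows the numbers, not the truth.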

The power of LLMs comes from the data they are trained on

An auto-complete machine seems rather simple and limited. In fact, language models have been around for a while, gaining prominence in the past 5-6 years, after Google released the paper Attention Is All You Need in 2017. Yet looking at the performance of GPT-4, the applications are revolutionary.

The power lies in how much data the model has been trained on. OpenAI’s GPT-4 model was reportedly trained on more than 100 terabytes of data (OpenAI has not published the exact figure)! When you have seen billions of examples of math problems, research articles, translations, and everything in between, auto-completing something similar isn’t that complex. It’s a brute-force approach that works.

Conclusion: LLMs boil down to statistics

That’s about as technical as we get in this series. It’s important to have a high-level foundation before we continue into the nitty-gritty. Remember, LLMs are extensively trained auto-complete models, not magic.