
Retrieval Augmented Generation

Though LLMs are trained on vast public datasets, companies want to use them with their own proprietary data, which the models have never seen. RAG was created to solve this problem.


What is RAG?

In traditional AI approaches, a model is constantly re-trained and fine-tuned to stay up to date. LLMs are different: they are trained on an enormous, general-purpose corpus of text (hence “Large” Language Models), and that scale makes constantly re-training the model cumbersome and expensive.

To mitigate this, Retrieval Augmented Generation (RAG) was created. RAG is the process of supplying your own data to the LLM at query time. Rather than changing model parameters, your data is fed into the prompt as context. Explained in diagram form:


Notice how the LLM is not “trained” on your documents; it is simply given them as “context” in the prompt. The model uses the document as a reference to answer the question. The process is very similar to how a human would do it: they have a question, look at the source, find the relevant section, and read off the answer. Simple enough?
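The idea is easy to see in code. Below is a minimal sketch of the “context in the prompt” step; call_llm is a hypothetical stand-in for whichever LLM API you actually use, and the prompt wording is just one reasonable way to phrase it.

def call_llm(prompt: str) -> str:
    # Replace with a call to your LLM provider's API.
    raise NotImplementedError

def answer_with_context(question: str, document: str) -> str:
    # The retrieved document goes straight into the prompt; the model's
    # weights are never touched.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{document}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return call_llm(prompt)

For example, answer_with_context("How many vacation days do I get?", employee_handbook_text) lets the model “look up” the answer in the handbook, exactly like the human described above.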


Vector Databases to pick documents

In the previous example, how does the system know which document to hand the LLM? Vector Databases are the mechanism for finding the right document based on the user’s question. Unlike the rigid statements of a traditional database (like SQL queries), Vector Databases use semantic similarity. “Semantic” means meaning: Vector Databases choose documents based on how similar they are in meaning to the question. A basic example is:


Here, “Lion”, “Dog”, and “Cat” are documents in the database. Notice how Lion sits farther away from Dog and Cat. This is because Lion is rarely used in the same context as Dog and Cat, so the distance is greater. When a new question, “Pet”, is plotted, it lands close to “Dog” and “Cat” because all three are similar in meaning. In general, the closer two points are in vector space, the more similar they are in meaning.
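To make “closeness in meaning” concrete, here is a toy sketch with made-up two-dimensional vectors. Real embedding models produce vectors with hundreds or thousands of dimensions, but the comparison works the same way.

import numpy as np

# Made-up 2-D "embeddings", for illustration only.
vectors = {
    "lion": np.array([0.9, 0.1]),
    "dog":  np.array([0.2, 0.8]),
    "cat":  np.array([0.3, 0.9]),
}
query = np.array([0.25, 0.85])  # "pet"

def cosine_similarity(a, b):
    # Close to 1.0 means similar meaning, close to 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

for word, vec in vectors.items():
    print(word, round(cosine_similarity(query, vec), 3))

Running this prints a much higher score for “dog” and “cat” than for “lion”, which is exactly the “closer in meaning” idea on the graph.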

Returning to the original example:

The original question’s closest document is the Employee Handbook, so at inference time this document is fed in along with the original question. In practice, the closest 3-4 documents are used, not a single one, because the similarity search usually gets close but is not perfect.
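In code, picking the closest few documents might look roughly like this. Here embed is a hypothetical function that turns text into a vector with your embedding model of choice, and the document store is just an in-memory dictionary for illustration.

import numpy as np

def top_k_documents(question, store, embed, k=3):
    # store maps document names to their embedding vectors.
    q = embed(question)
    scored = []
    for name, vec in store.items():
        score = float(q @ vec / (np.linalg.norm(q) * np.linalg.norm(vec)))
        scored.append((score, name))
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]

# The 3-4 best matches (e.g. the Employee Handbook plus a couple of
# near-misses) are then pasted into the prompt alongside the question.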

You may be wondering why you have never heard of these databases. Vector Databases came into the spotlight because LLMs needed to search documents by meaning rather than exact text. The truth is these databases have been around since the early 2000s and are already used in almost all recommendation algorithms (like Spotify’s or Google’s).


RAG keeps your database fluid and scalable

The benefit of RAG is that you can add, change, or delete documents whenever you wish. If a clause changes in your Employee Handbook, you simply update the document and re-insert it into the database, instead of having to re-train the model every time your data changes.
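In practice, “update and re-insert” is just an upsert: re-embed the new text and overwrite the old entry. The sketch below uses the same hypothetical embed function and an in-memory dictionary as the store.

def upsert_document(store, embed, doc_id, text):
    # Overwrites the old vector for this document; no model re-training involved.
    store[doc_id] = {"text": text, "vector": embed(text)}

# When a clause in the handbook changes:
# upsert_document(store, embed, "employee_handbook", updated_handbook_text)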

RAG is also scalable: it works just as well with a million documents as with a handful. Search stays fast as more documents are added, because vector databases use indexes built for efficient nearest-neighbour lookups. It is also cheap to maintain, since the search itself is lightweight and the bulk of the cost is storage.
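For larger collections, a purpose-built vector index keeps lookups fast. Here is a rough sketch using the FAISS library, assuming the faiss-cpu package is installed; the dimension and random vectors are placeholders for real embeddings.

import numpy as np
import faiss

d = 384  # embedding dimension; depends on your embedding model
doc_vectors = np.random.rand(100_000, d).astype("float32")  # stand-in embeddings

index = faiss.IndexFlatL2(d)  # exact search; larger deployments often switch to approximate indexes (IVF, HNSW)
index.add(doc_vectors)

query = np.random.rand(1, d).astype("float32")
distances, ids = index.search(query, 4)  # indices of the 4 closest documents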


Conclusion

In most business use-cases, you should use RAG in your LLM pipelines rather than training an LLM from scratch. Training from scratch requires data preparation, access to expensive GPUs, data scientists with PhDs…it’s going to be expensive and time-consuming. Unless you have a good reason to, keep it simple and use RAG.
