Q&A: What the heck is RAG?

Published on
May 7th, 2024
This article is a part of Ask Kat, a Q&A series in our AI newsletter, Deeper Learning. You ask questions about AI, then we get the answers and explain them in simple, non-jargony terms.
In today’s edition, we’re touching on RAG, a buzzword you either have seen in AI circles or will notice after you read this.
Last week, we explained why AI models hallucinate, and this week we’re covering a related question submitted by our own CEO: “What’s the deal with RAG?”
You might have seen the term floating around the AI industry lately. It stands for Retrieval-Augmented Generation, and it's a process AI engineers can use to improve the output of large language models. As you now know, hallucinations ultimately result from shortfalls in how AI models are trained: despite all the data models are trained on, it's hard to pack everything in. So what if you didn't need to pack everything in? What if, instead, a model could retrieve information to make its responses more accurate?
With RAG, AI engineers can introduce external data into the process in various formats (think records in databases, document files in repositories, or APIs). We'll skip the nitty-gritty of how the external data is delivered for now, but the main point is that it's converted into a library the models can "understand" (in short, numerical representations that help them determine what's relevant). So when a user submits a query, the model is no longer stuck with only whatever input the user provided. The system can also reference that library of information and integrate it with the person's initial query so that the LLM can deliver contextually appropriate responses.
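To make that flow concrete, here's a toy sketch in Python. Real RAG systems convert documents into dense vector embeddings and search them with a vector database; the crude word-overlap scorer below (and the sample "library" of documents) is purely illustrative, but the three steps — score, retrieve, augment — mirror the process described above.

```python
# Toy RAG sketch: score documents by word overlap with the query,
# pick the most relevant one, and prepend it to the prompt that
# would actually be sent to the LLM. Real systems use vector
# embeddings instead of this keyword-overlap scorer.

def score(query, doc):
    """Crude relevance score: number of words the query and doc share."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query, library, top_k=1):
    """Return the top_k most relevant documents from the library."""
    return sorted(library, key=lambda doc: score(query, doc), reverse=True)[:top_k]

def augment(query, library):
    """Build the augmented prompt: retrieved context + original question."""
    context = "\n".join(retrieve(query, library))
    return f"Context:\n{context}\n\nQuestion: {query}"

# A tiny hypothetical document library.
library = [
    "PTO policy: all employees receive 20 days of paid time off per year.",
    "Lunch policy: the kitchen is restocked every Monday.",
]

prompt = augment("How much PTO do I have left?", library)
print(prompt)  # the PTO policy doc is attached as context, not the lunch doc
```

Note that the LLM never sees the whole library — only the handful of documents the retriever judged relevant, stitched onto the user's question.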
For example, let's say you ask an AI bot, "How much PTO do I have left?" With RAG, the system would first retrieve relevant documents it has access to, such as your company's PTO policy and any requests you've made for time off this year. Then it would augment your original query with that information and deliver it to the LLM for an answer.
As a former chef, I like to think of RAG as a cook who is making a recipe but goes to check his pantry and doesn't have all the right ingredients. Instead of just swapping in oil for butter, he checks his phone and learns that vegetable oil plus ½ teaspoon of salt makes a better substitute.
Does RAG solve all the problems? Not yet, no. Many makers report great results with RAG, and the technique may well be behind tools you're using now, but others have run into limitations. In many cases, those limitations still come down to context, or the lack of it. Remember: Humans have and create a TON of context. Imagine a complex legal document that references bits of information across pages and pages, each part relating to others but not necessarily in order. That would be hard enough for a human to parse and understand, let alone a model. For deeper reading, check out the article "LLMs and the Harry Potter Problem" by Pyq AI CEO Aman Raghuvanshi.
The rise of RAG. Unsurprisingly, tech startups and companies have been weaving RAG tools into their products or creating out-of-the-box RAG solutions as more AI engineers want to use the technique. For example, check out launches from SciPhi and Linq, and Verta's Super RAG.
Alternatives to RAG. Yeah, you just learned about RAG, and now people are already talking about a "RAG killer"? Sigh.
The TL;DR here is that newer LLMs, like Meta's Llama 3, have "long context windows," which are meant to help them "recall" more. A context window is the number of tokens (words or bits of words) an LLM can consider when generating a response, so longer context windows should equate to better answers. Want to learn more about context windows in a future newsletter? Or the landscape of RAG products? Let us know in the comments!
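As a rough illustration of what a context window limits: a model can only "see" a fixed number of tokens of its input, and anything earlier falls off. The sketch below splits on spaces, which is a big simplification — real tokenizers break text into subword pieces — but it shows why a long document can outrun a short window.

```python
# Naive illustration of a context window: keep only the most
# recent `window` tokens of the input. Splitting on spaces is a
# simplification; real LLM tokenizers use subword pieces.

def fit_to_window(text, window=8):
    tokens = text.split()
    return " ".join(tokens[-window:])  # older tokens are dropped

doc = "Clause 1 defines terms that Clause 47 relies on much later in the contract"
print(fit_to_window(doc, window=6))  # Clause 1 has fallen out of view
```

This is exactly the legal-document problem above: if the parts that reference each other are farther apart than the window is wide, the model can't hold them in view at once, no matter how good it is.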
This article first appeared in our AI newsletter, Deeper Learning. Subscribe here, and let us know what questions you have about AI in the comments!
Comments
Anirudh Madhavan
This was a great read! I'm curious about the key takeaways from using RAG in real-world applications. What are the most common challenges engineers face with RAG, and how do they overcome them? Also, how do new LLMs with long context windows compare in practical performance?
John from Yotta Buzz
@anirudh_madhavan One challenge relates to the writing culture of a company and the variations between source documents and prompt-response pairs. For example, let's say you are the CEO of a company and you publish the CEO's priorities as a document. If someone asks "What are Anirudh's priorities?" RAG may miss if it doesn't also have access to a document that lists you as the CEO. In contrast, the question "What are the CEO's priorities?" should perform well via RAG.
barlow jenkins
The article mentions RAG but doesn't go into detail about how it retrieves information or how it verifies its accuracy.