An introduction to RAG (and AI)
The field of AI obviously stretches across many tasks, but the one we will focus on here is Natural Language Processing (NLP). In particular, since we are looking into AI and RAG, we will focus on the use of Large Language Models (LLMs). LLMs come in different shapes and forms and serve different purposes. While this is an over-simplification, for the remainder of this documentation we will assume that an LLM is one of the following:
- A foundation model - This is a model that is trained for a long time on many GPUs to learn a probabilistic model of language. The result is a model that can generate a new token1 given a history of already produced tokens: it looks at that history and decides what the most probable next token is. This next token is then fed back into the LLM, becoming part of the history for the next step. This behavior is called auto-regression (see the first sketch after this list). Foundation models are capable of producing text that we as humans can read, but they generally lack the interactive nature that is required for RAG. Examples of foundation models are GPT-3 and the base (non-instruct) variants of Llama and Mistral.
- An instruct model - While foundation models are trained to just generate the next token given the previous ones, for RAG and chatting in general we need a model that doesn't just complete what we are saying; we need it to respond or follow our instructions. To this end, models are trained to not just predict the next token of any given text corpus, but to predict the next token of an answer or reply when given a question. This is a training step on top of a foundation model that makes it suitable for chatting (see the second sketch after this list). For RAG, these are the models we are mostly interested in.
- Alignment-tuned models (RLHF, DPO, CPO, etc.) - These are instruct models that are further tuned towards a specific domain, task, or preferred behavior. Originally (with RLHF), a set of candidate answers is sampled for each user question and annotators decide which of those answers is preferred over the others (see the third sketch after this list). This way, the instruct model not only learns to answer questions but also learns which answers suit certain questions better. Since alignment-tuned models are just instruct models that have been aligned further, they can be used for RAG equally well.
- Reasoning models - These move beyond simple instruction-following models. While their inner workings are far more complex, a simple way of looking at them is as models that run a number of internal instruct steps: they first ask themselves to come up with a game plan for answering the user query and then execute that game plan, evaluating as they go along until they are "confident" enough to answer the original user question (a rough sketch of this view follows this list). While reasoning models can be considered the "strongest" type of model to date, they are generally not well-suited for RAG because the lengthy reasoning process degrades the user experience.
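To make the auto-regression loop in the foundation-model bullet concrete, here is a minimal sketch of greedy next-token decoding. It assumes the Hugging Face `transformers` library and uses GPT-2 purely as an example of a small foundation model; real decoding adds sampling strategies on top of this loop.

```python
# Minimal sketch of auto-regressive (greedy) decoding with a foundation model.
# GPT-2 is used here only as an example of a small foundation model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# The history of already produced tokens starts out as just the prompt.
input_ids = tokenizer("The capital of France is", return_tensors="pt").input_ids

for _ in range(10):
    with torch.no_grad():
        logits = model(input_ids).logits
    # Pick the most probable next token given the history so far.
    next_token = logits[0, -1].argmax()
    # Feed it back: the prediction becomes part of the history for the next step.
    input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=-1)

print(tokenizer.decode(input_ids[0]))
```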
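The difference between completing and responding largely comes down to how prompts are formatted during instruction tuning. A minimal sketch, again assuming the `transformers` library and using Mistral-7B-Instruct-v0.2 purely as an example instruct model, shows how a user question is wrapped in the chat template the model was trained on:

```python
# Minimal sketch: an instruct model expects the question wrapped in its chat
# template rather than as raw text to complete. Model name is an example only.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [
    {"role": "user", "content": "What is Retrieval Augmented Generation?"},
]
# apply_chat_template inserts the special tokens the model was trained on,
# so generation produces an answer instead of a continuation of the question.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```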
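Alignment tuning ultimately boils down to preference data. The sketch below shows the shape of a single preference pair as consumed by libraries such as `trl`'s `DPOTrainer`; the prompt and answers are made up purely for illustration.

```python
# Hypothetical example of one preference pair used for alignment tuning (DPO):
# for each prompt, one sampled answer is marked as preferred over another.
preference_pair = {
    "prompt": "How do I reset my password?",
    # The answer annotators (or a reward signal) preferred.
    "chosen": "Go to Settings > Account > Reset password and follow the e-mailed link.",
    # A sampled answer that was judged worse for this question.
    "rejected": "Passwords are an authentication mechanism dating back to the 1960s...",
}
# Datasets of such (prompt, chosen, rejected) triples are what trainers like
# trl's DPOTrainer consume to further align an instruct model.
```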
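Finally, the plan-then-execute view of reasoning models can be caricatured as a few chained instruct calls. This is a rough sketch of the mental model only, not how actual reasoning models are implemented internally; `ask` is a hypothetical helper standing in for any instruct-model call.

```python
def ask(prompt: str) -> str:
    """Hypothetical helper: send a prompt to any instruct model, return its reply."""
    raise NotImplementedError

def answer_with_reasoning(question: str) -> str:
    # Step 1: have the model draft a game plan for answering the question.
    plan = ask(f"Come up with a step-by-step plan to answer: {question}")
    # Step 2: execute that plan to produce a draft answer.
    draft = ask(f"Follow this plan to answer the question.\nPlan: {plan}\nQuestion: {question}")
    # Step 3: self-evaluate (a single round here for brevity; real models iterate).
    verdict = ask(
        f"Does this answer fully address the question? Reply yes or no.\n"
        f"Question: {question}\nAnswer: {draft}"
    )
    if verdict.strip().lower().startswith("yes"):
        return draft
    # Not "confident" enough yet: revise the draft once more.
    return ask(f"Improve this answer.\nQuestion: {question}\nAnswer: {draft}")
```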
For the remainder of this documentation on RAG Me Up, we will be using instruct models, either from commercial providers or from Huggingface models2.