Configuring the .env File

RAG Me Up implements all the components described in the High-Level Design and lets you configure them, and even turn some components off completely. This is all done through the .env file (in the /server folder); a template version named .env.template can be found in the repository. Rename this file to .env and configure your RAG Me Up instance by following the documentation on this page.

The following components can be configured through the .env file.

Logging

  • logging_level sets the logging level. Use DEBUG for local development and testing and something like WARN or ERROR for production settings.
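
For example, in a local development setup this could simply be (the value is illustrative):

  logging_level=DEBUG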

Document embedding

Embeddings are computed locally in RAG Me Up, meaning we don't use OpenAI, Anthropic or other API providers for embeddings but instead use Huggingface models directly. They can either be run on GPU (fast, if you have one) or on CPU (slow, but always possible).

  • embedding_model sets the Huggingface model to use. Use leaderboards like MTEB to decide what model you want to use. Try to find a trade-off between size/embedding dimension (hence, speed) and accuracy.
  • embedding_cpu can be used to force CPU usage; by default, RAG Me Up will use a GPU.
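
A hypothetical .env snippet for this section, assuming you picked a small sentence-transformers model from the MTEB leaderboard (the model name is only an example, not a recommendation):

  embedding_model=sentence-transformers/all-MiniLM-L6-v2
  embedding_cpu=True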

Data loading

  • data_directory should be the full or relative path to where your data is stored. RAG Me Up uses this the first time it is run to read in all supported files once and load them into the database. Once data is present in the database, it will not load the files from this folder again.
  • file_types is a comma-separated list of file types (extensions) that should be loaded. RAG Me Up currently supports pdf,json,docx,pptx,xlsx,csv,txt. If you leave one of these out, files with that extension will not be loaded, even if they are present in your data directory.
  • json_schema if JSON files are to be processed, you can specify a custom schema to load specific parts using jq.
  • csv_seperator if CSV files are to be processed, this should be the separator.
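
As an illustration, a data-loading block could look like this (the path, jq expression and separator are placeholder values, not defaults):

  data_directory=./data
  file_types=pdf,docx,csv,txt
  json_schema=.items[]
  csv_seperator=;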

Chunking

  • splitter should be the name of the splitter to use. Allowed are: RecursiveCharacterTextSplitter, SemanticChunker or ParagraphChunker (a RecursiveCharacterTextSplitter that always respects paragraph boundaries).
  • recursive_splitter_chunk_size sets the chunk size for the RecursiveCharacterTextSplitter.
  • recursive_splitter_chunk_overlap sets the overlap size for the RecursiveCharacterTextSplitter.
  • semantic_chunker_breakpoint_threshold_type sets the threshold type for the SemanticChunker. Allowed are: percentile, standard_deviation, interquartile, gradient.
  • semantic_chunker_breakpoint_threshold_amount sets the threshold amount for the SemanticChunker.
  • semantic_chunker_number_of_chunks sets the number of chunks for the SemanticChunker.
  • paragraph_chunker_max_chunk_size sets the maximum chunk size for the ParagraphChunker.
  • paragraph_chunker_paragraph_separator sets the paragraph separator for the ParagraphChunker. It is used as a regex to split on, which determines what counts as a paragraph.
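
For instance, to chunk with the RecursiveCharacterTextSplitter (the sizes below are illustrative, not tuned recommendations):

  splitter=RecursiveCharacterTextSplitter
  recursive_splitter_chunk_size=1024
  recursive_splitter_chunk_overlap=128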

Database setup

  • postgres_uri should specify the URI of the Postgres instance to use. Check the postgres subfolder for a Docker setup that runs an image which can function as a hybrid retrieval store.
  • vector_store_k is the number of document chunks to fetch (during normal retrieval) from the Postgres database.
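
An example database block (the URI is a placeholder for your own Postgres instance):

  postgres_uri=postgresql://user:password@localhost:5432/ragmeup
  vector_store_k=25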

Reranking

  • rerank is a boolean that you can set to turn reranking on or off.
  • rerank_k the number of documents to keep after reranking. Usually you set vector_store_k relatively high and rerank_k to the final number of chunks you want to keep.
  • rerank_model the (flashrank) rerank model to use, find alternatives on Huggingface but be sure they are flashrank-compatible.
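
An illustrative combination that fetches broadly and keeps a small reranked set (the model name is one example of a flashrank model; check that it works for your setup):

  rerank=True
  vector_store_k=25
  rerank_k=5
  rerank_model=ms-marco-MiniLM-L-12-v2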

HyDE

  • use_hyde boolean to turn HyDE on or off.
  • hyde_query the prompt given to the LLM to let it generate a hypothetical document. Must have the following placeholders:
    • question the user question.
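
A sketch of a HyDE configuration, assuming curly-brace placeholders and quoting because the value contains spaces (the prompt text itself is only an example):

  use_hyde=True
  hyde_query="Write a short passage that could plausibly answer the following question: {question}"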

Conversation summarization

  • use_summarization is a boolean turning automatic summarization (of the conversation history) on or off.
  • summarization_threshold is the number of tokens (not characters) the history should exceed to start summarization. Be sure to keep a buffer between this threshold and your model's context window because there will be some overhead from the summarization prompt.
  • summarization_query this is the query that will be sent to the LLM to run the actual summarization. Must have the following placeholders:
    • history containing the actual conversation history, best put at the end of the prompt.
  • summarization_encoder the tiktoken model to use to count the tokens for the summarization_threshold check.
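
A hedged example; the prompt is shortened for readability and whether summarization_encoder expects a tiktoken encoding name or a model name is an assumption here:

  use_summarization=True
  summarization_threshold=3000
  summarization_query="Summarize the conversation below, keeping all key facts and decisions: {history}"
  summarization_encoder=cl100k_base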

RAG configuration

  • temperature should be the model temperature. 0 for no variation and higher for more variation.
  • rag_instruction the system prompt/instruction to use for normal RAG query answering. It is wise to include some background on the RAG system's purpose and always try to force the system to mention sources used. Must have the following placeholders:
    • context the documents retrieved from the Postgres database.
  • rag_question_initial decoration around the user's question that will be asked to the LLM. RAG Me Up allows you to differentiate between initial and follow-up questions through different prompts. Must have the following placeholders:
    • question the original user question.
  • rag_question_followup same as above but for a follow-up question. Must have the following placeholders:
    • question the user's follow-up question.
  • rag_fetch_new_question the prompt sent to the LLM to check whether or not new documents should be fetched, given that we already have a follow-up question. This prompt must force the LLM to answer with only yes (should fetch) or no (no fetch required). Must have the following placeholders:
    • question the user's follow-up question.
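
A compressed sketch of these prompts (texts shortened for readability; the {context}/{question} placeholder syntax is assumed):

  temperature=0.2
  rag_instruction="You are a helpful assistant. Answer using only the following context and always mention the sources used: {context}"
  rag_question_initial="Question: {question}"
  rag_question_followup="Follow-up question: {question}"
  rag_fetch_new_question="Do we need to fetch new documents to answer this follow-up question? Answer only yes or no. {question}"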

Rewrite loop

  • use_rewrite_loop a boolean indicating whether or not to use the rewrite loop (executed at most once) to check if the retrieved documents can be used to answer the user's question.
  • rewrite_query_instruction the system prompt sent to the LLM to decide if the documents can answer the question. Must contain the retrieved documents. This prompt must force the LLM to answer with yes (documents can answer the question) or no (we should rewrite the question), followed by a motivation. This motivation will be used in the actual rewriting. Must have the following placeholders:
    • context the documents retrieved from the Postgres database.
  • rewrite_query_question the message sent to the LLM containing the user's question that should be answered with the documents. Must have the following placeholders:
    • question the user question.
  • rewrite_query_prompt this prompt is used to instruct the LLM to perform the actual rewrite. It is advised to let the LLM answer only with the rephrasing and not add decorations or explanations. Must have the following placeholders:
    • question the user question to rewrite.
    • motivation the motivation output from the earlier query asking the LLM whether the documents can be used to answer the question or not.
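
An illustrative set of rewrite-loop prompts (shortened; placeholder syntax assumed):

  use_rewrite_loop=True
  rewrite_query_instruction="Can the following documents answer the user's question? Answer yes or no, followed by a short motivation. {context}"
  rewrite_query_question="{question}"
  rewrite_query_prompt="Rewrite the question below so it can be answered with the available documents. Reply with the rephrased question only. Question: {question} Motivation: {motivation}"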

Re2

  • use_re2 boolean indicating whether or not to use the re-reading (Re2) instruction.
  • re2_prompt the prompt that will be injected in between the re-iterating of the question. This will result in the following format: [Original question]\n{re2_prompt}\n[Original question]
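
For example (the prompt below follows the phrasing suggested in the Re2 paper, but any re-reading instruction works):

  use_re2=True
  re2_prompt="Read the question again:"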

Provenance

  • provenance_method the provenance attribution metric to use. Allowed values are: rerank, similarity, llm or None (to turn provenance attribution off).
  • provenance_similarity_llm is the model to use when applying similarity provenance attribution to compute the similarities of the documents to the answer (and question).
  • provenance_include_query by default provenance is attributed to the answer only. Set this flag to True to also attribute to the question.
  • provenance_llm_prompt is the prompt used to ask the LLM for provenance when provenance_method is set to llm. You are free to define any ranking score or mechanism, but make this really clear in the prompt. Must have the following placeholders:
    • query the user question.
    • answer the answer to the question as generated by the LLM.
    • context the document chunk that we are attributing provenance for.
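
An illustrative setup using the similarity method (the model name is only an example; provenance_llm_prompt is only needed when provenance_method is llm):

  provenance_method=similarity
  provenance_similarity_llm=sentence-transformers/all-MiniLM-L6-v2
  provenance_include_query=False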

Model selection

You can choose between different LLM providers, including running your own model locally through Ollama. Make sure you set all environment variables required for your specific provider (e.g. OPENAI_API_KEY for OpenAI).

  • use_openai set to True to use OpenAI.
  • openai_model_name the model to use when selecting OpenAI.
  • use_azure set to True to use Azure OpenAI.
  • use_gemini set to True to use Gemini.
  • gemini_model_name the Gemini model to use when selecting Gemini.
  • use_anthropic set to True to use Anthropic.
  • anthropic_model_name the Anthropic model to use when selecting Anthropic.
  • use_ollama set to True to use Ollama (local model).
  • ollama_model_name the Ollama model to use when selecting Ollama. Available models are listed in the Ollama model library.
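
For example, to run a local model through Ollama while keeping the other providers disabled (the model name is just one possibility):

  use_openai=False
  use_azure=False
  use_gemini=False
  use_anthropic=False
  use_ollama=True
  ollama_model_name=llama3.1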