Configuring the .env File
RAG Me Up implements all the components described in the High-Level Design and lets you configure them, possibly even turning some components completely off.
This is all done through the .env file (in the /server folder), for which you can find a template version in the repository named .env.template. Just rename this file to .env and configure your RAG Me Up by following this page's documentation.
The following components can be configured through the .env file.
Logging
logging_level
sets the logging level of the application. Use DEBUG for local development and testing and something like WARN or ERROR for production settings.
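For example, a production-oriented setting could look like this in .env (the value is illustrative):
logging_level=WARN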
Document embedding
Embeddings are done locally in RAG Me Up, meaning we don't use OpenAI, Anthropic or other API providers for embeddings but instead directly use Huggingface models to do the embedding. They can either be run on GPU (fast, if you have one) or on CPU (slow, but always possible).
embedding_model
sets the Huggingface model to use. Use leaderboards like MTEB to decide which model you want; try to find a trade-off between size/embedding dimension (hence speed) and accuracy.
embedding_cpu
can be used to force CPU usage; by default, RAG Me Up will use a GPU.
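As an illustration, an embedding block might look like the following; the model name is only an example picked from Huggingface, choose your own based on the MTEB leaderboard:
embedding_model=sentence-transformers/all-MiniLM-L6-v2
embedding_cpu=True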
Data loading
data_directory
should be the full or relative path to where your data is stored. RAG Me Up uses this the first time it is run to read in all supported files once and load them into the database. Once data is present in the database, it will not load the files from this folder again.
file_types
is a comma-separated list of file types (extensions) that should be loaded. RAG Me Up currently supports pdf, json, docx, pptx, xslx, csv, txt. If you leave one of these out, files with that extension will not be loaded, even if they are present in your data directory.
json_schema
if JSON files are to be processed, you can specify a custom schema (a jq expression) to load specific parts of the documents.
csv_seperator
if CSV files are to be processed, this should be the column separator.
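A possible data-loading block, with illustrative values (variable spellings follow this page; "." is the jq identity expression):
data_directory=./data
file_types=pdf,docx,txt
json_schema=.
csv_seperator=,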
Chunking
splitter
should be the name of the splitter to use. Allowed values are RecursiveCharacterTextSplitter, SemanticChunker or ParagraphChunker (a RecursiveCharacterTextSplitter that always respects paragraph boundaries).
recursive_splitter_chunk_size
sets the chunk size for the RecursiveCharacterTextSplitter.
recursive_splitter_chunk_overlap
sets the overlap size for the RecursiveCharacterTextSplitter.
semantic_chunker_breakpoint_threshold_type
sets the threshold type for the SemanticChunker. Allowed values are percentile, standard_deviation, interquartile and gradient.
semantic_chunker_breakpoint_threshold_amount
sets the threshold amount for the SemanticChunker.
semantic_chunker_number_of_chunks
sets the number of chunks for the SemanticChunker.
paragraph_chunker_max_chunk_size
sets the maximum chunk size for the ParagraphChunker.
paragraph_chunker_paragraph_separator
sets the separator for the ParagraphChunker. It is used to determine what a paragraph is by splitting on it as a regex.
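For example, a setup that uses the RecursiveCharacterTextSplitter might look like this (the sizes are illustrative, not recommended defaults):
splitter=RecursiveCharacterTextSplitter
recursive_splitter_chunk_size=1000
recursive_splitter_chunk_overlap=200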
Database setup
postgres_uri
should specify the URI of the Postgres instance to use. Check the postgres subfolder for a Docker setup that runs an image which can function as a hybrid retrieval store.
vector_store_k
is the number of document chunks to fetch from the Postgres database during normal retrieval.
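For instance, assuming a local Postgres instance started from the postgres subfolder (the URI, credentials and database name are placeholders):
postgres_uri=postgresql://postgres:postgres@localhost:5432/ragmeup
vector_store_k=20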
Reranking
rerank
is a boolean that turns reranking on or off.
rerank_k
the number of documents to keep after reranking. Usually you set vector_store_k relatively high and rerank_k to the final number of chunks you want to keep.
rerank_model
the (flashrank) rerank model to use. You can find alternatives on Huggingface, but be sure they are flashrank-compatible.
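Combined with a relatively high vector_store_k, a reranking block could look like this; the model name is just one example of a flashrank-compatible model:
rerank=True
rerank_k=5
rerank_model=ms-marco-MiniLM-L-12-v2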
HyDE
use_hyde
boolean to turn HyDE on or off.
hyde_query
the prompt given to the LLM to let it generate a hypothetical document. Must have the following placeholder: question, the user question.
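A minimal HyDE sketch; the prompt wording and the {question} placeholder syntax are illustrative assumptions, check .env.template for the exact format:
use_hyde=True
hyde_query=Write a short hypothetical document that would answer this question: {question}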
Conversation summarization
use_summarization
is a boolean turning automatic summarization (of the conversation history) on or off.
summarization_threshold
is the number of tokens (not characters) the history must exceed before summarization starts. Be sure to keep a buffer between this threshold and your model's context window, because the summarization prompt adds some overhead.
summarization_query
the query that will be sent to the LLM to run the actual summarization. Must have the following placeholder: history, containing the actual conversation history, best placed at the end of the prompt.
summarization_encoder
the tiktoken model to use to count tokens for the summarization_threshold check.
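An example summarization block; the threshold, prompt and {history} placeholder syntax are illustrative, and cl100k_base is just one common tiktoken encoding:
use_summarization=True
summarization_threshold=3000
summarization_query=Summarize the conversation below, keeping every fact needed to answer follow-up questions: {history}
summarization_encoder=cl100k_base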
RAG configuration
temperature
should be the model temperature: 0 for no variation, higher for more variation.
rag_instruction
the system prompt/instruction to use for normal RAG query answering. It is wise to include some background on the RAG system's purpose and to force the system to mention the sources it used. Must have the following placeholder: context, the documents retrieved from the Postgres database.
rag_question_initial
decoration around the user's question as it will be asked to the LLM. RAG Me Up allows you to differentiate initial and follow-up questions through different prompts. Must have the following placeholder: question, the original user question.
rag_question_followup
same as above but for a follow-up question. Must have the following placeholder: question, the user's follow-up question.
rag_fetch_new_question
the prompt sent to the LLM to check whether or not new documents should be fetched, given that we already have a follow-up question. This prompt must force the LLM to answer with yes (should fetch) or no (no fetch required) only. Must have the following placeholder: question, the user's follow-up question.
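As a sketch of how these fit together; the prompt texts and the {context}/{question} placeholder syntax are illustrative, not the shipped defaults:
temperature=0
rag_instruction=You are a RAG assistant. Answer using only the documents below and always mention the sources you used. {context}
rag_question_initial=Answer the following question based on the documents above: {question}
rag_question_followup=Answer this follow-up question based on the documents and the conversation so far: {question}
rag_fetch_new_question=Do we need to fetch new documents to answer this follow-up question? Answer only yes or no. {question}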
Rewrite loop
use_rewrite_loop
a boolean indicating whether or not to use the rewrite loop (run only once) to check if the retrieved documents can be used to answer the user's question.
rewrite_query_instruction
the system prompt sent to the LLM to decide if the documents can answer the question. It must contain the retrieved documents and must force the LLM to answer with yes (the documents can answer the question) or no (we should rewrite the question), followed by a motivation. This motivation is used in the actual rewriting. Must have the following placeholder: context, the documents retrieved from the Postgres database.
rewrite_query_question
the message sent to the LLM containing the user's question that should be answered with the documents. Must have the following placeholder: question, the user question.
rewrite_query_prompt
the prompt used to instruct the LLM to perform the actual rewrite. It is advised to let the LLM answer only with the rephrasing and not add decorations or explanations. Must have the following placeholders: question, the user question to rewrite, and motivation, the motivation output from the earlier query asking the LLM whether the documents can answer the question.
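An illustrative rewrite-loop block; again, the prompt wording and placeholder syntax are assumptions:
use_rewrite_loop=True
rewrite_query_instruction=Decide if the documents below can answer the user's question. Answer yes or no, followed by a short motivation. {context}
rewrite_query_question={question}
rewrite_query_prompt=Rewrite the question below so it can be answered better, using the motivation. Answer with the rephrased question only. Question: {question} Motivation: {motivation}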
Re2
use_re2
boolean indicating whether or not to use the re-reading (Re2) instruction.
re2_prompt
the prompt that is injected in between the two repetitions of the question, resulting in the following format: [Original question]\n{re2_prompt}\n[Original question]
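For example (the prompt text is a common Re2 phrasing, not necessarily the template default):
use_re2=True
re2_prompt=Read the question again: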
Provenance
provenance_method
the provenance attribution metric to use. Allowed values are rerank, similarity, llm or None (to turn provenance attribution off).
provenance_similarity_llm
the model to use when applying similarity provenance attribution to compute the similarity of the documents to the answer (and question).
provenance_include_query
by default, provenance is attributed to the answer only. Set this flag to True to also attribute to the question.
provenance_llm_prompt
the prompt used to ask the LLM for provenance when provenance_method is set to llm. You are free to define any ranking score or mechanism, but make this very clear in the prompt. Must have the following placeholders: query, the user question; answer, the answer to the question as generated by the LLM; and context, the document chunk that we are attributing provenance for.
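An example block using LLM-based attribution; the prompt and placeholder syntax are illustrative:
provenance_method=llm
provenance_include_query=False
provenance_llm_prompt=Rate from 0 to 10 how much the document chunk contributed to the answer and reply with the number only. Question: {query} Answer: {answer} Chunk: {context}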
Model selection
You can choose between different LLM providers, including running your own model locally through Ollama. Make sure you set all environment variables that are required for your specific provider (e.g. OPENAI_API_KEY for OpenAI).
use_openai
set to True to use OpenAI.
openai_model_name
the model to use when selecting OpenAI.
use_azure
set to True to use Azure OpenAI.
use_gemini
set to True to use Gemini.
gemini_model_name
the Gemini model to use when selecting Gemini.
use_anthropic
set to True to use Anthropic.
anthropic_model_name
the Anthropic model to use when selecting Anthropic.
use_ollama
set to True to use Ollama (local model).
ollama_model_name
the Ollama model to use when selecting Ollama. Look for available models in the Ollama model library.
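For instance, to run a local model through Ollama (the model name is an example from the Ollama library):
use_openai=False
use_azure=False
use_gemini=False
use_anthropic=False
use_ollama=True
ollama_model_name=llama3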