The Retrieval Augmented Generation (RAG) technique enhances the capabilities of large language models (LLMs) by grounding their responses in external knowledge sources.
This approach improves the accuracy and relevance of LLM outputs, especially when dealing with domain-specific topics.
Traditional RAG systems retrieve a single document to inform responses. While effective for straightforward queries, this approach may fall short for complex or multifaceted queries.
RAG Fusion was introduced to address this limitation. It extends RAG's capabilities by retrieving and fusing information from multiple documents, leading to more comprehensive and informative responses.
In this guide, we’ll explore RAG Fusion, its benefits, and different fusion approaches, and show how to implement it with the Reciprocal Rank Fusion technique, which blends search results to improve the overall output.
Understanding RAG Fusion
RAG Fusion starts by generating diverse queries from the initial user input. This step enables a more thorough search, gathering information from multiple knowledge bases or perspectives.
The system then fuses the retrieved information, synthesizing a response using relevant details. This multi-faceted retrieval approach not only enriches the output but also enhances the reliability of the response.
There are several approaches to fusing information from multiple documents:
- Reciprocal Rank Fusion (RRF): This method combines the ranked lists of documents from different queries by calculating a score for each document based on its position in each list. This highlights documents that consistently rank high across multiple queries.
- Weighted Averaging: In this approach, each document is assigned a weight based on its relevance score. The documents are then combined by taking a weighted average of their embeddings or representations (a minimal sketch appears after this list).
- Ensemble Methods: This method uses multiple retrieval methods or models and combines their results. For example, you could use different types of vector databases or different query expansion techniques.
- Simple Concatenation: In this approach, retrieved documents are concatenated and fed to the LLM. While simple to implement, this method can lead to long input sequences, potentially exceeding the LLM's context window.
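To make the weighted-averaging idea concrete, here is a minimal sketch. The embeddings and weights below are made-up toy values for illustration; in practice the embeddings would come from your embedding model and the weights from retrieval relevance scores:
```
import numpy as np

# Toy embeddings (in practice, produced by an embedding model)
doc_embeddings = {
    "doc1": np.array([0.2, 0.8, 0.1]),
    "doc2": np.array([0.6, 0.3, 0.5]),
}

# Retrieval relevance scores serve as weights
weights = {"doc1": 0.9, "doc2": 0.7}

# Combine documents by a weighted average of their embeddings
total_weight = sum(weights.values())
fused_embedding = sum(weights[d] * doc_embeddings[d] for d in doc_embeddings) / total_weight
print(fused_embedding)
```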
Why use RAG Fusion?
RAG Fusion has several advantages over traditional single-document RAG:
- More Comprehensive Answers: By combining information from multiple sources, RAG Fusion can provide more complete and nuanced answers, especially for complex questions.
- Reduced Bias: Relying on a single document can introduce bias. RAG Fusion mitigates this by considering multiple perspectives.
- Improved Accuracy: Access to a wider range of information can lead to more accurate and factually correct responses.
Implementing RAG Fusion with Reciprocal Rank Fusion
Now, let's walk through the technical steps to implement RAG Fusion, from environment setup to generating the final output using reciprocal rank fusion.
The code used in this guide has been re-implemented from this GitHub repo.
Step 1: Set up the environment
The first step is to set up your environment and install the necessary libraries. We’ll also need an OpenAI API key to interact with the language model.
Install and import the necessary libraries:
```
! pip install openai
```
```
import os
import openai
import random
from getpass import getpass
from openai import OpenAI
```
Set Up the OpenAI API Key:
- Obtain an API key from the OpenAI website.
- Set the OPENAI_API_KEY environment variable:
```
import os

# Prompt for the key and store it in the environment
os.environ["OPENAI_API_KEY"] = getpass('Enter your OpenAI API Key: ')

# Alternative: read the key from an existing environment variable
openai.api_key = os.getenv("OPENAI_API_KEY")
if openai.api_key is None:
    raise Exception("No OpenAI API key found. Please set it as an environment variable.")
```
Step 2: Generate Diverse Queries
Generating diverse queries is important for obtaining comprehensive results across different sources. Here, we will use the LLM to produce variations of the initial query.
Function to Generate Multiple Queries Using the LLM:
```
from openai import OpenAI

client = OpenAI()

# Function to generate queries using OpenAI's ChatGPT
def generate_queries_chatgpt(original_query):
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant that generates multiple search queries based on a single input query."},
            {"role": "user", "content": f"Generate multiple search queries related to: {original_query}"},
            {"role": "user", "content": "OUTPUT (4 queries):"}
        ]
    )
    # Access the content using the 'message' attribute of the first choice
    generated_queries = response.choices[0].message.content.strip().split("\n")
    return generated_queries
```
The generate_queries_chatgpt function takes the input query and asks the LLM (gpt-3.5-turbo in this case) to create multiple related queries.
Let's test it by passing an input query to generate_queries_chatgpt.
```
original_query = "impact of climate change"
generated_queries = generate_queries_chatgpt(original_query)
print(generated_queries)
```
Output
```
['1. Specific impacts of climate change on agriculture',
'2. Economic consequences of climate change',
'3. Health effects of climate change',
'4. Strategies to mitigate the impact of climate change']
```
Step 3: Perform Vector Search
Once multiple queries have been generated, the next step is to perform a vector search for each one. This retrieves relevant documents for every query variation.
Function to Perform a Vector Search for Queries:
```
# Predefined set of documents (usually these would be from your search database like Faiss, Pinecone, or Milvus)
all_documents = {
    "doc1": "Climate change and economic impact.",
    "doc2": "Public health concerns due to climate change.",
    "doc3": "Climate change: A social perspective.",
    "doc4": "Technological solutions to climate change.",
    "doc5": "Policy changes needed to combat climate change.",
    "doc6": "Climate change and its impact on biodiversity.",
    "doc7": "Climate change: The science and models.",
    "doc8": "Global warming: A subset of climate change.",
    "doc9": "How climate change affects daily weather.",
    "doc10": "The history of climate change activism."
}

# Mock function to simulate vector search, returning random scores
def vector_search(query, all_documents):
    available_docs = list(all_documents.keys())
    random.shuffle(available_docs)
    selected_docs = available_docs[:random.randint(2, 5)]
    scores = {doc: round(random.uniform(0.7, 0.9), 2) for doc in selected_docs}
    return {doc: score for doc, score in sorted(scores.items(), key=lambda x: x[1], reverse=True)}
```
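In a real application, the mock above would be replaced by embedding-based retrieval. The following is a minimal sketch of one way to do that with OpenAI embeddings and cosine similarity, reusing the client defined in Step 2; the model name text-embedding-3-small, the real_vector_search name, and the top_k parameter are our own choices, not part of the original code:
```
import numpy as np

def embed(texts, model="text-embedding-3-small"):
    # Embed a list of texts with the OpenAI embeddings endpoint
    response = client.embeddings.create(model=model, input=texts)
    return [np.array(item.embedding) for item in response.data]

def real_vector_search(query, all_documents, top_k=5):
    doc_ids = list(all_documents.keys())
    doc_vecs = embed([all_documents[d] for d in doc_ids])
    query_vec = embed([query])[0]
    # Cosine similarity between the query and each document
    sims = {
        d: float(np.dot(v, query_vec) / (np.linalg.norm(v) * np.linalg.norm(query_vec)))
        for d, v in zip(doc_ids, doc_vecs)
    }
    return dict(sorted(sims.items(), key=lambda x: x[1], reverse=True)[:top_k])
```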
The function vector_search simulates a vector search by randomly selecting a few documents and assigning them scores, producing rankings for each query.
Let's test the function and perform a vector search for each generated query.
```
all_results = {}
for query in generated_queries:
    search_results = vector_search(query, all_documents)
    all_results[query] = search_results
print(all_results)
```
Output
```
{'1. Specific impacts of climate change on agriculture': {'doc6': 0.9, 'doc3': 0.87, 'doc5': 0.8, 'doc9': 0.76, 'doc10': 0.76},
'2. Economic consequences of climate change': {'doc4': 0.9, 'doc2': 0.83, 'doc6': 0.83, 'doc5': 0.76},
'3. Health effects of climate change': {'doc7': 0.87, 'doc1': 0.76, 'doc2': 0.75},
'4. Strategies to mitigate the impact of climate change': {'doc10': 0.87, 'doc2': 0.85, 'doc9': 0.75, 'doc3': 0.73}}
```
Step 4: Apply Reciprocal Rank Fusion
Let's create a function reciprocal_rank_fusion to aggregate results from different queries, prioritizing documents that consistently rank high across multiple lists.
```
# Reciprocal Rank Fusion algorithm
def reciprocal_rank_fusion(search_results_dict, k=60):
    fused_scores = {}
    print("Initial individual search result ranks:")
    for query, doc_scores in search_results_dict.items():
        print(f"For query '{query}': {doc_scores}")

    for query, doc_scores in search_results_dict.items():
        # Rank this query's documents by score, highest first
        for rank, (doc, score) in enumerate(sorted(doc_scores.items(), key=lambda x: x[1], reverse=True)):
            if doc not in fused_scores:
                fused_scores[doc] = 0
            previous_score = fused_scores[doc]
            # Each appearance contributes 1 / (rank + k) to the fused score
            fused_scores[doc] += 1 / (rank + k)
            print(f"Updating score for {doc} from {previous_score} to {fused_scores[doc]} based on rank {rank} in query '{query}'")

    reranked_results = {doc: score for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)}
    print("Final reranked results:", reranked_results)
    return reranked_results
```
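To see why documents that rank consistently high win out, consider the constant k = 60 used above. A document ranked first (rank 0) in two query lists earns 1/60 + 1/60 ≈ 0.033, while a document ranked first in only one list earns 1/60 ≈ 0.017, regardless of the raw similarity scores:
```
# Worked example of the fused score 1 / (rank + k) with k = 60
print(1 / (0 + 60) + 1 / (0 + 60))  # rank 0 in two lists -> ~0.0333
print(1 / (0 + 60))                 # rank 0 in one list  -> ~0.0167
```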
Step 5: Generate the Output
Let's create a function that generates the final output using the reranked documents and the original queries.
```
# Dummy function to simulate generative output
def generate_output(reranked_results, queries):
    return f"Final output based on {queries} and reranked documents: {list(reranked_results.keys())}"

reranked_results = reciprocal_rank_fusion(all_results)
final_output = generate_output(reranked_results, generated_queries)
print(final_output)
```
Because the mock vector_search assigns random scores, the exact output varies between runs; it prints the fusion steps followed by a final string listing the generated queries and the reranked document IDs.
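In a production pipeline, the dummy generate_output above would instead pass the top-ranked documents to the LLM as context. Here is a minimal sketch reusing the client from Step 2; the generate_output_llm name, the prompt wording, and the top_n parameter are illustrative assumptions, not part of the original code:
```
# Hypothetical replacement for generate_output: answer using the top fused documents
def generate_output_llm(reranked_results, original_query, top_n=3):
    # Join the text of the top-N fused documents into a single context block
    context = "\n".join(all_documents[doc] for doc in list(reranked_results)[:top_n])
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer the question using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {original_query}"},
        ],
    )
    return response.choices[0].message.content
```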
Full Code Execution
Here’s the complete code tying all steps together (it assumes the function definitions and all_documents from the previous steps are in scope):
```
if __name__ == "__main__":
    # Step 1: Initial query setup
    original_query = "impact of climate change"

    # Step 2: Generate diverse queries
    generated_queries = generate_queries_chatgpt(original_query)

    # Step 3: Perform vector search for each query
    all_results = {}
    for query in generated_queries:
        search_results = vector_search(query, all_documents)
        all_results[query] = search_results

    # Step 4: Apply Reciprocal Rank Fusion
    reranked_results = reciprocal_rank_fusion(all_results)

    # Step 5: Generate the final output
    final_output = generate_output(reranked_results, generated_queries)
    print(final_output)
```
Challenges in RAG Fusion
While RAG Fusion offers many advantages, there are potential challenges to consider:
- Inconsistency: Different sources may present conflicting information. RAG Fusion must handle such discrepancies to avoid generating inaccurate or contradictory outputs.
- Computational Cost: Processing multiple documents and fusing their information can be computationally expensive, especially with large datasets or complex fusion techniques.
- Context Window Constraints: The fused documents can produce long prompts, and exceeding the model’s context window may result in truncated or incomplete outputs (a simple mitigation is sketched after this list).
- Redundancy: There's a risk of redundancy when fusing information from multiple sources. This can lead to verbose or repetitive outputs.
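One pragmatic way to address the context window constraint is to cap how much fused text reaches the model. The sketch below is a minimal illustration; the truncate_context helper and the character budget (a rough stand-in for a proper token budget) are our own assumptions, not part of the original code:
```
# Hypothetical helper: add top-ranked documents until a rough character
# budget (a stand-in for a token budget) is exhausted.
def truncate_context(reranked_results, all_documents, char_budget=2000):
    context_parts = []
    used = 0
    for doc_id in reranked_results:  # already sorted best-first by RRF
        text = all_documents[doc_id]
        if used + len(text) > char_budget:
            break
        context_parts.append(text)
        used += len(text)
    return "\n".join(context_parts)
```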
Conclusion
RAG Fusion represents a significant advancement in Retrieval Augmented Generation, helping LLMs draw on the richness and diversity of multiple knowledge sources.
By combining information from various perspectives, RAG Fusion enhances the comprehensiveness, accuracy, and robustness of LLM outputs.
This guide provided a step-by-step walkthrough for implementing RAG Fusion using the Reciprocal Rank Fusion technique, explored its benefits, and discussed potential challenges to consider.
Using the steps in this guide, you can successfully set up RAG Fusion to make the most of retrieval-augmented LLMs in your applications.
Additional Resources
For more information about how to get started with Athina AI and how to integrate RAG into your application, see the following guides and articles:
- Evaluation Best Practices
- Running evals as real-time guardrails
- Integrating Multiple Data Sources for Better LLM Retrieval
- How to Integrate Retrieval-Augmented Generation (RAG) in Your LLM Applications
- How to Use Prompt Engineering to Control LLM Outputs
- How to Test and Validate LLMs with Real-World Scenarios
- Building an Ideal Tech Stack for LLM Applications