Advanced RAG Implementation using Hybrid Search
Retrieval-augmented generation (RAG) has transformed the way large language models (LLMs) generate responses by grounding them in external data. Hybrid RAG is an advanced variant that merges vector similarity search with traditional retrieval methods such as keyword search and BM25, making retrieval more robust, flexible, and context-aware. Many vector stores and search engines (e.g., Elasticsearch, Neo4j, Azure AI Search) support hybrid search natively by combining vector similarity with these techniques.
Key Components of Hybrid RAG:
- Vector Similarity Search: Uses embeddings to capture the meaning of queries and documents, matching on context rather than exact keywords.
- Traditional Search Methods: Rely on algorithms such as BM25 to match keywords, finding documents with exact or partial term overlap with the query.
By combining these methods, Hybrid RAG can retrieve highly relevant information even from large or complex datasets. This dual approach delivers both semantic understanding and precise keyword matching, making it suitable for a wide range of applications.
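To make the difference concrete, here is a minimal sketch of the keyword side using the rank_bm25 package (installed in step 1 below); the corpus and query are toy examples, not drawn from the dataset used later in this guide.

from rank_bm25 import BM25Okapi

# Toy corpus of three short "documents"
corpus = [
    "macconkey agar selects for gram-negative bacteria",
    "blood agar supports the growth of many organisms",
    "gram staining distinguishes bacterial cell walls",
]

# BM25 scores each document by weighted keyword overlap with the query
bm25 = BM25Okapi([doc.split() for doc in corpus])
print(bm25.get_scores("gram-negative bacteria".split()))  # document 0 scores highest

# Vector similarity search instead compares dense embeddings, so a query like
# "which plates grow enteric microbes" can still match document 0
# despite sharing almost no keywords with it.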
Benefits of Hybrid RAG
Hybrid RAG provides significant advantages over single-method RAG systems. Leveraging both semantic understanding and precise keyword matching makes retrieval more comprehensive and the results more accurate, while integrating semantic relationships with structured data yields more nuanced, context-aware responses.
Hybrid RAG is also flexible enough to handle complex data across a variety of applications. Because multiple retrieval methods back every query, it copes consistently with diverse query styles, reduces the risk of missing relevant information, and scales to large datasets without compromising quality.
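LangChain's EnsembleRetriever, used in the implementation below, reranks the results from each retriever with weighted Reciprocal Rank Fusion. A minimal sketch of that fusion idea, using the commonly cited smoothing constant k=60 (an assumed value, not taken from this guide):

def weighted_rrf(ranked_lists, weights, k=60):
    """Fuse ranked lists of document IDs into a single ranking.

    Each document scores sum(weight / (k + rank)) across the lists
    it appears in; higher totals rank first.
    """
    scores = {}
    for ranking, weight in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: vector search and BM25 each return their top-3 document IDs
print(weighted_rrf([["d1", "d2", "d3"], ["d2", "d4", "d1"]], [0.5, 0.5]))
# -> ['d2', 'd1', 'd4', 'd3']: d2 ranks first because both retrievers rank it highly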
Implementing Hybrid RAG: A Step-by-Step Guide
To implement Hybrid RAG, we will use tools like LangChain, ChromaDB, and Athina AI. For the complete code and step-by-step guidance, check out this Google Colab Notebook.
Also, if you are interested in learning advanced RAG techniques, check out the GitHub repository we created to support developers and researchers.
1. Initial Setup
Start by installing the necessary libraries and setting up your environment variables to handle API keys securely.
# langchain, langchain-community, langchain-openai, and datasets are also used in later steps
!pip install --quiet athina chromadb rank_bm25 langchain langchain-community langchain-openai datasets

import os
from google.colab import userdata

# Read the API keys from Colab secrets and expose them as environment variables
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')
os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')
2. Data Indexing
Load data, split it into smaller chunks, and create a vector index for efficient retrieval.
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Load the source data from a CSV file
loader = CSVLoader("./context.csv")
documents = loader.load()

# Split the documents into 500-character chunks with no overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)

# Embed each chunk and index it in a Chroma vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(documents, embeddings)
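As a quick sanity check, you can query the index directly before building any retrievers (the query string here is just an example):

# Return the 4 chunks whose embeddings are closest to the query
hits = vectorstore.similarity_search("what bacteria grow on macconkey agar", k=4)
print(hits[0].page_content)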
3. Setting Up Retrievers
Set up different retrievers to handle both semantic and keyword-based queries.
Keyword Retriever:
from langchain.retrievers import BM25Retriever

# BM25 ranks chunks by keyword overlap with the query
keyword_retriever = BM25Retriever.from_documents(documents)
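By default, BM25Retriever returns four documents per query; its k attribute controls that cutoff if you want to change it:

keyword_retriever.k = 4  # number of documents the keyword retriever returns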
Standard Retriever:
# The vector store's retriever handles the semantic side
retriever = vectorstore.as_retriever()
Ensemble Retriever:
Combine vector and keyword retrievers for better results.
from langchain.retrievers import EnsembleRetriever

# Fuse both retrievers' rankings, giving each equal weight
ensemble_retriever = EnsembleRetriever(retrievers=[retriever, keyword_retriever], weights=[0.5, 0.5])
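You can try the combined retriever on its own before wiring it into the full pipeline:

# Inspect the fused ranking for a sample query
docs = ensemble_retriever.get_relevant_documents("what bacteria grow on macconkey agar")
for doc in docs[:3]:
    print(doc.page_content[:100])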
4. Building the RAG Pipeline
Build a RAG pipeline that combines the ensemble retriever with an LLM to generate answers.
from langchain_openai import ChatOpenAI

# Uses OpenAI's default chat model; pass model="..." to pick a specific one
llm = ChatOpenAI()
from langchain.prompts import ChatPromptTemplate
from langchain.schema.runnable import RunnablePassthrough
from langchain.schema.output_parser import StrOutputParser
template = """
You are a helpful assistant that answers questions based on the following context.
If you don't find the answer in the context, just say that you don't know.
Context: {context}
Question: {input}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
rag_chain = (
    {"context": ensemble_retriever, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
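As written, the chain passes the retrieved Document objects straight into the prompt, where they are rendered together with their metadata. If you prefer to inject only the chunk text, one common variant (a sketch, not part of the original pipeline; the names format_docs and rag_chain_text_only are hypothetical) maps the retriever through a small formatting function:

def format_docs(docs):
    # Keep only the text of each retrieved chunk, separated by blank lines
    return "\n\n".join(doc.page_content for doc in docs)

rag_chain_text_only = (
    {"context": ensemble_retriever | format_docs, "input": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)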
# Example Response
response = rag_chain.invoke('what bacteria grow on macconkey agar')
print(response)
Output:
Gram-negative and enteric bacteria grow on MacConkey agar.
5. Evaluation with Athina
Once your RAG pipeline is set up, you can evaluate its performance using Athina AI. This step is optional but helpful for testing and validating your pipeline. Athina AI provides automated tools to measure accuracy and ensure the pipeline meets your requirements.
First, prepare your data by generating queries, capturing the pipeline's responses, and organising the retrieved context for each query:
import pandas as pd
from datasets import Dataset

question = ["what bacteria grow on macconkey agar", "who wrote a rose is a rose is a rose"]
response = []
contexts = []

# Run each query through the pipeline and record the retrieved context
for query in question:
    response.append(rag_chain.invoke(query))
    contexts.append([doc.page_content for doc in ensemble_retriever.get_relevant_documents(query)])
data = {
    "query": question,
    "response": response,
    "context": contexts,
}
dataset = Dataset.from_dict(data)
df = pd.DataFrame(dataset)
df_dict = df.to_dict(orient='records')
# Normalise each record's context to a list before loading it into Athina
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]
Now, set up API keys for both OpenAI and Athina:
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
Then, use the Loader class from Athina to load your dataset in dictionary format.
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)
Here we will use the Does Response Answer Query evaluation, which checks whether the response answers the user's query. To learn more about this evaluation, please refer to our documentation.
from athina.evals import DoesResponseAnswerQuery
evaluation_results = DoesResponseAnswerQuery(model="gpt-4o").run_batch(data=dataset).to_df()
print(evaluation_results)
The results are returned as a DataFrame, and you can click the generated link to open the Athina IDE, where you can explore detailed evaluation results and refine your pipeline further.
Conclusion
Hybrid RAG improves on single-method retrieval by blending the semantic strengths of vector similarity search with the precision of traditional keyword methods. This combination yields more accurate and comprehensive retrieval, making it a valuable tool for applications that demand nuanced, reliable information access.
By following this guide, you can implement Hybrid RAG in your own systems and improve the performance and capabilities of your language model applications.
For more in-depth tutorials and resources, explore our AI Development Blogs and Research Papers.