Implementing RAG using Hypothetical Document Embeddings (HyDE)
Retrieval-augmented generation (RAG) is a powerful technique in information retrieval where an external knowledge base is leveraged to improve the quality and accuracy of responses. One of the challenges in traditional RAG systems is the dependence on explicit query-document similarity, which often requires labeled relevance data.
Hypothetical Document Embeddings (HyDE) tackles this issue by generating idealized documents that capture the essence of a query, embedding them, and using those embeddings to retrieve similar real documents. This approach enables high-quality retrieval in zero-shot settings, where no labeled training data is available.
Let’s start by understanding what HyDE is and how it works.
What is HyDE?
HyDE improves search results by generating hypothetical documents from a query and using them to retrieve relevant real documents. It operates in two stages:
- Hypothetical Document Generation: Given a user query, an instruction-following large language model (LLM) generates a synthetic document that hypothetically answers the query.
- Embedding and Retrieval: The generated document is converted into a hypothetical document embedding, which is then used to find the real documents in a vector database that are most similar to it.
By using hypothetical document embeddings, HyDE RAG improves retrieval accuracy and adapts well to different domains, even without labeled training data.
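To make the two stages concrete, here is a minimal sketch in LangChain-style Python. This is illustrative only: vectorstore stands in for any vector database built over your corpus, and the prompt wording is an assumption, not a fixed part of HyDE.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
llm = ChatOpenAI()
embeddings = OpenAIEmbeddings()
query = "how does interlibrary loan work"
# Stage 1: generate a hypothetical document that answers the query
hypothetical_doc = llm.invoke(f"Write a short passage that answers: {query}").content
# Stage 2: embed the hypothetical document and retrieve similar real documents
hyde_vector = embeddings.embed_query(hypothetical_doc)
similar_docs = vectorstore.similarity_search_by_vector(hyde_vector, k=4)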
Why is HyDE Better Than Naive RAG?
Compared to conventional RAG systems, which retrieve documents purely based on query similarity, HyDE offers several advantages:
- Better Generalization: Since HyDE generates a hypothetical document, it does not strictly depend on keyword overlap and can generalize better across domains.
- Relevance without Supervision: Traditional dense retrieval models require extensive training on relevance-labeled datasets. HyDE eliminates this requirement by leveraging a generative LLM for query expansion.
- Improved Retrieval Accuracy: Instead of retrieving based solely on query similarity, HyDE anchors retrieval around a generated document that closely aligns with the query's intent, improving retrieval performance.
- Zero-shot Capabilities: Unlike fine-tuned RAG systems that require labeled relevance data, HyDE works effectively in zero-shot scenarios, making it adaptable to new domains.
- Mitigating Hallucination: Since retrieval in HyDE is guided by a hypothetical document, it helps reduce irrelevant retrievals, thereby lowering the chances of generating hallucinated responses.
Now that we understand what HyDE is and why it is better than naive RAG, let's see how to implement it.
Implementation of HyDE
For the complete code check out this Colab Notebook. Also, if you are interested in learning advanced RAG techniques, check out the GitHub repository we created.
Step 1: Setup and Installation
Start by installing the required libraries and setting API keys.
! pip install -q -U athina langchain-weaviate
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ['ATHINA_API_KEY'] = userdata.get('ATHINA_API_KEY')
os.environ['WEAVIATE_API_KEY'] = userdata.get('WEAVIATE_API_KEY')
Step 2: Processing Documents
Then load a CSV file, split it into smaller chunks, and generate embeddings using OpenAI.
from langchain_openai import OpenAIEmbeddings
from langchain.document_loaders import CSVLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Load documents
loader = CSVLoader("./context.csv")
documents = loader.load()
# Split into chunks for better retrieval
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
documents = text_splitter.split_documents(documents)
# Generate embeddings
embeddings = OpenAIEmbeddings()
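As an optional sanity check, you can inspect how many chunks were produced and preview one of them:
# Optional: inspect the chunked documents
print(len(documents))
print(documents[0].page_content[:200])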
Step 3: Vector Store and Retriever
After that, create a vector database using Weaviate so that we can search for relevant documents efficiently.
import weaviate
from weaviate.classes.init import Auth
# Connect to Weaviate Cloud
wcd_url = 'your_url'
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=wcd_url,
    auth_credentials=Auth.api_key(os.environ['WEAVIATE_API_KEY']),
    headers={'X-OpenAI-Api-key': os.environ["OPENAI_API_KEY"]}
)
# Store documents in Weaviate
from langchain_weaviate.vectorstores import WeaviateVectorStore
vectorstore = WeaviateVectorStore.from_documents(
    documents, embedding=embeddings, client=client,
    index_name="your_collection_name", text_key="text"
)
# create retriever
retriever = vectorstore.as_retriever()
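Optionally, you can sanity-check the vector store with a direct similarity search before building the HyDE chain. The query string here is simply the example used later in this tutorial:
# Optional: verify the vector store returns sensible results
sample_hits = vectorstore.similarity_search("interlibrary loan", k=2)
print(sample_hits[0].page_content[:200])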
Step 4: HyDE Retrieval
First, build the LLM chain without the retriever (no context) to generate a hypothetical answer to the query.
# create llm
from langchain_openai import ChatOpenAI
llm = ChatOpenAI()
# chain without the retriever
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
template = """
You are a helpful assistant that answers questions.
Question: {input}
Answer:
"""
prompt = ChatPromptTemplate.from_template(template)
qa_no_context = prompt | llm | StrOutputParser()
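Before wiring in the retriever, it is worth inspecting the hypothetical document this chain produces. We use the example query that the rest of this tutorial works with:
# example query used throughout this tutorial
question = "how does interlibrary loan work"
# generate the hypothetical answer (the "hypothetical document")
hypothetical_answer = qa_no_context.invoke({"input": question})
print(hypothetical_answer)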
Step 5: Combining with the Retriever
Then combine the HyDE chain with the retriever to fetch the real documents that are most similar to the hypothetical answer.
# retrieve documents using the hypothetical answer as the search query
retrieval_chain = qa_no_context | retriever
retrieved_docs = retrieval_chain.invoke({"input": question})
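You can preview the retrieved documents to confirm that the hypothetical answer pulled back relevant chunks:
# Optional: preview the retrieved documents
for doc in retrieved_docs:
    print(doc.page_content[:100])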
Step 6: Final Answer Generation
Finally, create a new LLM chain that generates the answer from the retrieved documents (context) together with the original query.
template = """
You are a helpful assistant that answers questions based on the provided context.
Use the provided context to answer the question.
Question: {input}
Context: {context}
"""
prompt = ChatPromptTemplate.from_template(template)
final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)
We can now test the system with a query.
# final response
final_rag_chain.invoke({"context":retrieved_docs,"input":question})
Output:
'Interlibrary loan works by allowing patrons of one library to borrow physical materials or receive electronic documents that are held by another library. The borrowing library identifies potential lending libraries with the desired item, and the lending library delivers the item either physically or electronically. The borrowing library then receives the item, delivers it to their patron, and arranges for its return if necessary. In some cases, fees may accompany interlibrary loan services. Libraries negotiate for interlibrary loan eligibility, especially for digital materials like ebooks, through legal, technical, and licensing aspects.'
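Optionally, the individual pieces can also be composed into a single LCEL chain that generates the hypothetical answer, retrieves documents with it, and produces the final response in one call. This is a sketch that reuses the variables defined above (qa_no_context, retriever, prompt, llm); it is not required for the evaluation step that follows.
from operator import itemgetter
# end-to-end HyDE chain: hypothetical answer -> retrieval -> grounded answer
hyde_rag_chain = (
    {"context": qa_no_context | retriever, "input": itemgetter("input")}
    | prompt
    | llm
    | StrOutputParser()
)
hyde_rag_chain.invoke({"input": question})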
Step 7: Evaluating with Athina AI
First, prepare your data by generating queries, capturing pipeline responses, and organizing the retrieved context for each query.
# create dataset
question = ["how does interlibrary loan work"]
response = []
contexts = []
# Inference
for query in question:
    response.append(final_rag_chain.invoke({"context": retrieved_docs, "input": query}))
    contexts.append([docs.page_content for docs in retriever.get_relevant_documents(query)])
# To dict
data = {
    "query": question,
    "response": response,
    "context": contexts,
}
# Convert to a DataFrame, then to a list of records
import pandas as pd
df = pd.DataFrame(data)
df_dict = df.to_dict(orient='records')
# Convert context to list
for record in df_dict:
    if not isinstance(record.get('context'), list):
        if record.get('context') is None:
            record['context'] = []
        else:
            record['context'] = [record['context']]
Then evaluate how relevant the retrieved context is. We use Context Relevancy Evaluation to check our results.
# set api keys for Athina evals
from athina.keys import AthinaApiKey, OpenAiApiKey
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
# load dataset
from athina.loaders import Loader
dataset = Loader().load_dict(df_dict)
# evaluate
from athina.evals import RagasContextRelevancy
RagasContextRelevancy(model="gpt-4o").run_batch(data=dataset).to_df()
The results will be converted into a data frame, and you can click on the generated link to open the Athina IDE, where you can explore detailed evaluation results and refine your pipeline further.
Conclusion
HyDE significantly improves traditional RAG by introducing hypothetical document embeddings to enhance retrieval quality. Instead of relying solely on query-document similarity, HyDE generates synthetic responses that act as a bridge between queries and real-world knowledge. This approach enables better generalization, zero-shot retrieval, and improved accuracy, making it especially valuable in scenarios where labeled data is scarce.
This guide outlines how to implement HyDE for RAG, leveraging hypothetical document embeddings to improve retrieval without requiring labeled data. For more advanced RAG techniques, check out the GitHub Repository.