Original Paper: https://arxiv.org/abs/2408.02545
By: Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak
Abstract:
Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions. Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach. We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases. RAG Foundry integrates data creation, training, inference, and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings. This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources. We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. Code is released as open source on GitHub.
Summary Notes
In the vast and rapidly evolving landscape of AI, Large Language Models (LLMs) like GPT-3 and Llama-3 have emerged as transformative forces. They demonstrate impressive capabilities across a wide range of tasks traditionally requiring human intelligence.
However, these models aren't without limitations—they can produce convincing but incorrect answers, struggle with factual accuracy, and lack up-to-date information post-training. To address these challenges, Retrieval-Augmented Generation (RAG) systems offer a promising solution by integrating external information retrieval mechanisms to enhance the performance of LLMs.
In this blog post, we'll explore RAG Foundry, a robust open-source framework developed by Intel Labs that aims to streamline the implementation of RAG systems. We'll delve into its methodologies, key findings, and the potential implications of this innovative framework.
Introducing RAG Foundry
RAG Foundry is designed to simplify the complex process of integrating retrieval mechanisms with large language models.
It provides a comprehensive, end-to-end workflow that includes data creation, training, inference, and evaluation. This modular approach facilitates rapid prototyping and experimentation with various RAG techniques, enabling users to generate datasets and train RAG models using both internal and specialized knowledge sources.
The framework's capabilities are demonstrated through experiments involving the augmentation and fine-tuning of Llama-3 and Phi-3 models, showcasing consistent improvements across three knowledge-intensive datasets.
Key Methodologies
RAG Foundry integrates several critical components into its workflow:
- Data Creation and Processing: The processing module creates the context-enhanced datasets needed for RAG-oriented training and inference. It supports steps such as dataset loading, data aggregation, information retrieval, prompt creation, and pre-processing, and its modular design allows pipelines to be tailored flexibly to RAG training and inference. The paper's example pipeline configuration is shown below, with a plain-Python sketch of the same flow after it.
```yaml
steps:
  - _target_: dataset_loaders.loaders.HFLoader
    inputs: main
    dataset_config:
      path: "Tevatron/wikipedia-trivia"
      split: train
  - _target_: global_steps.sampling.ShuffleSelect
    inputs: main
    shuffle: 42
    limit: 10000
  - _target_: local_steps.retrievers.HaystackRetriever
    inputs: main
    pipeline_path: configs/qdrant.yaml
    query_key: query
    docs_key: positive_passages
  - _target_: local_steps.prompter.TextPrompter
    inputs: main
    prompt_file: prompts/basic.txt
    output_key: my_prompt
    mapping:
      question: query
      context: positive_passages
      fewshot: fewshot_examples
      answer: answers
  - _target_: global_steps.output.OutputData
    inputs: main
    file_name: TQA_train_processed.jsonl
```
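To make the flow concrete, here is a minimal plain-Python sketch of what such a pipeline does, using the Hugging Face `datasets` library directly rather than RAG Foundry's step classes. The `retrieve_passages` helper and the prompt template are hypothetical stand-ins for the retriever and prompter steps:

```python
from datasets import load_dataset

# Load and sample the dataset, mirroring the HFLoader and ShuffleSelect steps.
ds = load_dataset("Tevatron/wikipedia-trivia", split="train")
ds = ds.shuffle(seed=42).select(range(10000))

def retrieve_passages(query: str) -> list[str]:
    """Hypothetical stand-in for the HaystackRetriever step:
    a real pipeline would query a vector store such as Qdrant."""
    return ["<retrieved passage>"]

def build_prompt(example: dict) -> dict:
    # Mirrors the TextPrompter step: fill a template with the
    # question and the retrieved context.
    context = "\n".join(retrieve_passages(example["query"]))
    example["my_prompt"] = (
        f"Context:\n{context}\n\n"
        f"Question: {example['query']}\n"
        f"Answer:"
    )
    return example

ds = ds.map(build_prompt)

# Mirrors the OutputData step: persist the processed dataset as JSONL.
ds.to_json("TQA_train_processed.jsonl")
```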
- Training: The training module fine-tunes models on the datasets produced by the processing module. It builds on the established TRL training framework and supports advanced techniques such as Low-Rank Adaptation (LoRA). An example training configuration is shown below, followed by a sketch of the underlying TRL/PEFT calls.
```yaml
train:
  gradient_accumulation_steps: 4
  learning_rate: 2e-05
  lr_scheduler_type: "cosine"
  num_train_epochs: 1
  optim: "paged_adamw_8bit"
instruction: prompts/prompt_instructions/qa.txt
data_file: TQA_train_processed.jsonl
```
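A configuration like this maps onto the TRL and PEFT stack. The following is a minimal sketch of that mapping, not RAG Foundry's actual internals; the LoRA hyperparameters are illustrative, and exact `SFTTrainer` arguments vary across TRL versions:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import TrainingArguments
from trl import SFTTrainer

# The processed JSONL produced by the data-creation step.
train_dataset = load_dataset("json", data_files="TQA_train_processed.jsonl", split="train")

# Hyperparameters lifted from the training configuration above.
args = TrainingArguments(
    output_dir="rag-finetune",
    gradient_accumulation_steps=4,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=1,
    optim="paged_adamw_8bit",
)

# An illustrative LoRA setup; RAG Foundry exposes equivalent knobs in its config.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-128k-instruct",
    args=args,
    train_dataset=train_dataset,
    peft_config=peft_config,
    dataset_text_field="my_prompt",  # the prompt column built during processing
)
trainer.train()
```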
- Inference: This module generates predictions over the processed datasets. Inference is kept separate from evaluation because of its computational cost, so multiple evaluations can be run on a single inference output. An example inference configuration is shown below, with the equivalent Hugging Face calls sketched after it.
```yaml
model:
  _target_: ragfoundry.models.hf.HFInference
  model_name_or_path: "microsoft/Phi-3-mini-128k-instruct"
  load_in_8bit: true
  instruction: prompts/prompt_instructions/qa.txt
  generation:
    do_sample: false
    max_new_tokens: 50
    return_full_text: false
data_file: my-processed-data.jsonl
generated_file: model-predictions.jsonl
```
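`HFInference` presumably wraps standard Hugging Face generation; here is a minimal sketch under that assumption, using the `transformers` pipeline API with the same generation settings. The `my_prompt` field name is carried over from the processing example, and the output record layout is illustrative:

```python
import json

from transformers import pipeline

# Load the model in 8-bit, mirroring load_in_8bit in the config
# (requires the bitsandbytes package).
generator = pipeline(
    "text-generation",
    model="microsoft/Phi-3-mini-128k-instruct",
    model_kwargs={"load_in_8bit": True},
)

# Run greedy decoding over each processed example and save the predictions.
with open("my-processed-data.jsonl") as fin, open("model-predictions.jsonl", "w") as fout:
    for line in fin:
        example = json.loads(line)
        output = generator(
            example["my_prompt"],
            do_sample=False,
            max_new_tokens=50,
            return_full_text=False,
        )[0]["generated_text"]
        fout.write(json.dumps({"text": output}) + "\n")  # field name is illustrative
```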
- Evaluation: The evaluation module runs collections of metrics over the inference outputs to assess RAG techniques and tuning processes. It supports metrics including Exact Match (EM), F1, BERTScore, and semantic similarity, along with custom metrics. An example evaluation configuration is shown below, followed by a sketch of the standard EM and F1 computations.
```yaml
metrics:
  - _target_: ragfoundry.evaluation.HFEvaluate
  - _target_: ragfoundry.evaluation.EM
  - _target_: ragfoundry.evaluation.F1
  - _target_: ragfoundry.evaluation.BERTScore
results_file: my-evaluation.yaml
generated_file: model-predictions.jsonl
data_file: my-processed-data.jsonl
```
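For reference, EM and token-level F1 follow the standard SQuAD-style definitions. A self-contained sketch, independent of RAG Foundry's own metric classes:

```python
from collections import Counter

def exact_match(prediction: str, answer: str) -> float:
    """1.0 if the normalized prediction equals the normalized answer, else 0.0."""
    return float(prediction.strip().lower() == answer.strip().lower())

def f1_score(prediction: str, answer: str) -> float:
    """Token-level F1: harmonic mean of precision and recall over shared tokens."""
    pred_tokens = prediction.lower().split()
    gold_tokens = answer.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

# Typically the maximum score over all gold answers is reported per example.
print(f1_score("the eiffel tower", "Eiffel Tower"))  # 0.8
```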
Main Findings and Results
The experiments conducted using RAG Foundry demonstrate significant improvements in model performance across three datasets: TriviaQA, PubMedQA, and ASQA. Key findings include:
- Enhanced Accuracy: Introducing top-relevant documents in a consistent prompt template format significantly improves the accuracy of generated responses.
- Fine-Tuning Benefits: Fine-tuning the RAG settings further enhances performance, especially when incorporating Chain-of-Thought (CoT) reasoning.
- Model-Specific Best Methods: The best RAG augmentation technique varies depending on the model and dataset, highlighting the importance of tailored configurations.
Implications and Applications
The implications of these findings are profound for the development and deployment of AI systems:
- Reduced Hallucinations: By leveraging external knowledge bases, RAG systems can mitigate the issue of hallucinations in LLMs, leading to more reliable and accurate outputs.
- Cost Efficiency: Retrieval-augmented LLMs can be more cost-efficient, since a smaller model given the right retrieved context can match the performance of a larger one, reducing compute requirements.
- Interpretability and Relevance: RAG systems improve the relevance of generated content and provide a level of interpretability by explicitly showing the retrieved context used for generating responses.
Conclusion
RAG Foundry stands out as a highly customizable and powerful framework for enhancing large language models through retrieval-augmented generation. Its modular design and comprehensive workflow make it an invaluable tool for researchers and practitioners aiming to push the boundaries of AI capabilities. By facilitating rapid prototyping and experimentation, RAG Foundry helps bridge the gap between cutting-edge research and practical applications, driving forward the evolution of intelligent systems.
For those interested in exploring RAG Foundry, the code is available as open-source on GitHub. Whether you're looking to improve the factual accuracy of your models or experiment with new RAG techniques, this framework provides a robust foundation to achieve your goals.