RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation



Original Paper: https://arxiv.org/abs/2408.02545

By: Daniel Fleischer, Moshe Berchansky, Moshe Wasserblat, Peter Izsak

Abstract:

Implementing Retrieval-Augmented Generation (RAG) systems is inherently complex, requiring deep understanding of data, use cases, and intricate design decisions.

Additionally, evaluating these systems presents significant challenges, necessitating assessment of both retrieval accuracy and generative quality through a multi-faceted approach.

We introduce RAG Foundry, an open-source framework for augmenting large language models for RAG use cases.

RAG Foundry integrates data creation, training, inference and evaluation into a single workflow, facilitating the creation of data-augmented datasets for training and evaluating large language models in RAG settings.

This integration enables rapid prototyping and experimentation with various RAG techniques, allowing users to easily generate datasets and train RAG models using internal or specialized knowledge sources.

We demonstrate the framework's effectiveness by augmenting and fine-tuning Llama-3 and Phi-3 models with diverse RAG configurations, showcasing consistent improvements across three knowledge-intensive datasets. The code is released as open source on GitHub.

Summary Notes

In the vast and rapidly evolving landscape of AI, Large Language Models (LLMs) like GPT-3 and Llama-3 have emerged as transformative forces.

They demonstrate impressive capabilities across a wide range of tasks traditionally requiring human intelligence.

However, these models aren't without limitations—they can produce convincing but incorrect answers, struggle with factual accuracy, and lack up-to-date information post-training.

To address these challenges, Retrieval-Augmented Generation (RAG) systems offer a promising solution by integrating external information retrieval mechanisms to enhance the performance of LLMs.

In this blog post, we'll explore RAG Foundry, a robust open-source framework developed by Intel Labs that aims to streamline the implementation of RAG systems. We'll delve into its methodologies, key findings, and the potential implications of this innovative framework.

Introducing RAG Foundry

RAG Foundry is designed to simplify the complex process of integrating retrieval mechanisms with large language models.

It provides a comprehensive, end-to-end workflow that includes data creation, training, inference, and evaluation. This modular approach facilitates rapid prototyping and experimentation with various RAG techniques, enabling users to generate datasets and train RAG models using both internal and specialized knowledge sources.
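
To make the pattern concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop that RAG Foundry orchestrates at scale. The toy corpus, lexical-overlap scorer, and prompt builder below are illustrative stand-ins, not part of the framework's API:

# Toy retrieve-then-generate loop; illustrative only, not RAG Foundry's API.
from collections import Counter

DOCS = [
    "Trivium was the medieval curriculum of grammar, logic, and rhetoric.",
    "TriviaQA is a reading-comprehension dataset of trivia questions.",
    "Phi-3-mini is a small language model released by Microsoft.",
]

def score(query: str, doc: str) -> float:
    """Crude lexical-overlap score standing in for a real retriever."""
    q, d = Counter(query.lower().split()), Counter(doc.lower().split())
    return sum((q & d).values())

def build_prompt(query: str, k: int = 2) -> str:
    """Retrieve the top-k documents and splice them into a prompt."""
    top = sorted(DOCS, key=lambda doc: score(query, doc), reverse=True)[:k]
    context = "\n".join(f"- {doc}" for doc in top)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

# The resulting prompt would then be sent to an LLM for generation.
print(build_prompt("What kind of dataset is TriviaQA?"))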

The framework's capabilities are demonstrated through experiments involving the augmentation and fine-tuning of Llama-3 and Phi-3 models, showcasing consistent improvements across three knowledge-intensive datasets.

Key Methodologies

RAG Foundry integrates several critical components into its workflow:

  1. Data Creation and Processing: The processing module creates the context-enhanced datasets used for RAG-oriented training and inference. It supports steps such as dataset loading, data aggregation, information retrieval, prompt creation, and pre-processing, and its modular design lets these steps be composed into pipelines tailored to a given task. An example processing configuration, which loads a TriviaQA-style dataset, retrieves supporting passages, builds prompts, and writes the result to disk:

steps:
  - _target_: dataset_loaders.loaders.HFLoader
    inputs: main
    dataset_config:
      path: "Tevatron/wikipedia-trivia"
      split: train
  - _target_: global_steps.sampling.ShuffleSelect
    inputs: main
    shuffle: 42
    limit: 10000
  - _target_: local_steps.retrievers.HaystackRetriever
    inputs: main
    pipeline_path: configs/qdrant.yaml
    query_key: query
    docs_key: positive_passages
  - _target_: local_steps.prompter.TextPrompter
    inputs: main
    prompt_file: prompts/basic.txt
    output_key: my_prompt
    mapping:
      question: query
      context: positive_passages
      fewshot: fewshot_examples
      answer: answers
  - _target_: global_steps.output.OutputData
    inputs: main
    file_name: TQA_train_processed.jsonl

  2. Training: The training module fine-tunes models on the datasets produced by the processing module. It builds on the established TRL training framework and supports advanced techniques such as Low-Rank Adaptation (LoRA). An example training configuration, consuming the file written above:

train:
  gradient_accumulation_steps: 4
  learning_rate: 2e-05
  lr_scheduler_type: "cosine"
  num_train_epochs: 1
  optim: "paged_adamw_8bit"
instruction: prompts/prompt_instructions/qa.txt
data_file: TQA_train_processed.jsonl
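
As a rough illustration of what this training step does under the hood, here is a minimal fine-tuning sketch using TRL and PEFT directly. The LoRA values are assumed, the exact SFTTrainer/SFTConfig arguments vary across trl versions, and this is a sketch rather than RAG Foundry's actual implementation:

# Hypothetical TRL + LoRA fine-tuning sketch mirroring the config above;
# argument names may differ across trl/peft versions.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# The JSONL produced by the processing pipeline above.
dataset = load_dataset("json", data_files="TQA_train_processed.jsonl", split="train")

# Assumed LoRA hyperparameters; RAG Foundry reads these from its YAML configs.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="microsoft/Phi-3-mini-128k-instruct",
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="outputs",
        dataset_text_field="my_prompt",  # prompt column created by TextPrompter
        gradient_accumulation_steps=4,
        learning_rate=2e-5,
        lr_scheduler_type="cosine",
        num_train_epochs=1,
        optim="paged_adamw_8bit",
    ),
)
trainer.train()
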
  3. Inference: The inference module generates predictions over the processed datasets. Inference is separated from evaluation because of its computational cost, so a single set of predictions can be evaluated multiple times. An example inference configuration:

model:
  _target_: ragfoundry.models.hf.HFInference
  model_name_or_path: "microsoft/Phi-3-mini-128k-instruct"
  load_in_8bit: true
  instruction: prompts/prompt_instructions/qa.txt
generation:
  do_sample: false
  max_new_tokens: 50
  return_full_text: false
data_file: my-processed-data.jsonl
generated_file: model-predictions.jsonl

  4. Evaluation: The evaluation module runs collections of metrics to assess RAG techniques and tuning processes. It supports a variety of metrics, including Exact Match (EM), F1, BERTScore, semantic similarity, and custom metrics. An example evaluation configuration, pointing at the predictions file produced above:

metrics:
  - _target_: ragfoundry.evaluation.HFEvaluate
  - _target_: ragfoundry.evaluation.EM
  - _target_: ragfoundry.evaluation.F1
  - _target_: ragfoundry.evaluation.BERTScore
results_file: my-evaluation.yaml
generated_file: model-predictions.jsonl
data_file: my-processed-data.jsonl
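
For intuition, Exact Match and token-level F1 are the standard SQuAD-style answer metrics. A minimal sketch of how they are typically computed (the ragfoundry.evaluation classes may normalize text differently):

# Sketch of standard Exact Match and token-level F1; normalization
# details may differ from ragfoundry's implementation.
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction: str, answer: str) -> float:
    return float(normalize(prediction) == normalize(answer))

def token_f1(prediction: str, answer: str) -> float:
    pred, gold = normalize(prediction).split(), normalize(answer).split()
    overlap = sum((Counter(pred) & Counter(gold)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred), overlap / len(gold)
    return 2 * precision * recall / (precision + recall)

print(exact_match("The Eiffel Tower", "eiffel tower"))  # 1.0
print(token_f1("Paris, France", "Paris"))               # 0.666...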

Main Findings and Results

The experiments conducted using RAG Foundry demonstrate significant improvements in model performance across three knowledge-intensive datasets: TriviaQA, PubMedQA, and ASQA. Key findings include:

  • Enhanced Accuracy: Introducing the top-ranked retrieved documents in a consistent prompt template format significantly improves the accuracy of generated responses (a sketch of such a template follows this list).
  • Fine-Tuning Benefits: Fine-tuning models in the RAG setting further enhances performance, especially when Chain-of-Thought (CoT) reasoning is incorporated.
  • Model-Specific Best Methods: The best RAG augmentation technique varies depending on the model and dataset, highlighting the importance of tailored configurations.
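
The post does not reproduce the template itself, but a consistent RAG prompt template of the kind this finding refers to, matching the question/context mapping in the processing config earlier, might look like the following (hypothetical; the real prompts/basic.txt may differ):

# Hypothetical prompt template in the spirit of the TextPrompter mapping
# above (question -> query, context -> retrieved passages).
TEMPLATE = """Answer the question using only the context below.

Context:
{context}

Question: {question}
Answer:"""

prompt = TEMPLATE.format(
    context="- TriviaQA is a reading-comprehension dataset of trivia questions.",
    question="What kind of dataset is TriviaQA?",
)
print(prompt)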

Implications and Applications

The implications of these findings are profound for the development and deployment of AI systems:

  • Reduced Hallucinations: By leveraging external knowledge bases, RAG systems can mitigate the issue of hallucinations in LLMs, leading to more reliable and accurate outputs.
  • Cost Efficiency: Retrieval augmentation can make smaller models competitive on knowledge-intensive tasks, so a given level of performance can be reached with less computational power.
  • Interpretability and Relevance: RAG systems improve the relevance of generated content and provide a level of interpretability by explicitly showing the retrieved context used for generating responses.

Conclusion

RAG Foundry stands out as a highly customizable and powerful framework for enhancing large language models through retrieval-augmented generation.

Its modular design and comprehensive workflow make it an invaluable tool for researchers and practitioners aiming to push the boundaries of AI capabilities.

By facilitating rapid prototyping and experimentation, RAG Foundry helps bridge the gap between cutting-edge research and practical applications, driving forward the evolution of intelligent systems.


For those interested in exploring RAG Foundry, the code is available as open-source on GitHub. Whether you're looking to improve the factual accuracy of your models or experiment with new RAG techniques, this framework provides a robust foundation to achieve your goals.
