RAFT: Adapting Language Model to Domain Specific RAG
Original Paper: https://arxiv.org/abs/2403.10131
By: Tianjun Zhang, Shishir G. Patil, Naman Jain, Sheng Shen, Matei Zaharia, Ion Stoica, Joseph E. Gonzalez
Abstract:
Pretraining Large Language Models (LLMs) on large corpora of textual data is now a standard paradigm.
When using these LLMs for many downstream applications, it is common to additionally bake new knowledge (e.g., time-critical news or private domain knowledge) into the pretrained model, either through RAG-based prompting or fine-tuning.
However, the optimal methodology for the model to gain such new knowledge remains an open question. In this paper, we present Retrieval Augmented FineTuning (RAFT), a training recipe that improves the model's ability to answer questions in "open-book" in-domain settings.
In RAFT, given a question and a set of retrieved documents, we train the model to ignore those documents that don't help in answering the question, which we call distractor documents.
RAFT accomplishes this by citing verbatim the right sequence from the relevant document that would help answer the question.
This, coupled with RAFT's chain-of-thought-style responses, helps improve the model's ability to reason.
In domain-specific RAG, RAFT consistently improves the model's performance across the PubMed, HotpotQA, and Gorilla datasets, presenting a post-training recipe to improve pre-trained LLMs for in-domain RAG.
RAFT's code and demo are open-sourced at https://github.com/ShishirPatil/gorilla
Summary Notes
Figure: How best to prepare for an Exam?
(a) Fine-tuning based approaches implement "studying" by either directly "memorizing" the input documents or answering practice QA without referencing the documents.
(b) Alternatively, in-context retrieval methods fail to leverage the learning opportunity afforded by the fixed domain and are equivalent to taking an open-book exam without studying.
(c) In contrast, our approach, RAFT, leverages fine-tuning with question-answer pairs while referencing the documents in a simulated imperfect retrieval setting, thereby effectively preparing for the open-book exam setting.
Introduction
In the realm of artificial intelligence and natural language processing, Large Language Models (LLMs) have become the de facto standard, demonstrating remarkable capability across a wide range of general-knowledge tasks.
However, as applications extend into more specialized domains, the challenge of tailoring these models to perform optimally in specific contexts has become increasingly pronounced.
Enter RAFT (Retrieval Augmented FineTuning), a novel approach designed to refine LLMs for domain-specific tasks by training them to use retrieval-augmented generation (RAG) effectively.
This methodology not only enhances the model's ability to access and utilize domain-specific knowledge but also improves its robustness against irrelevant data—a crucial capability in "open-book" test settings.
Methodologies: How RAFT Works
RAFT distinguishes itself from traditional fine-tuning and RAG methodologies by integrating the strengths of both. Conventional RAG allows models to incorporate additional information from documents during testing, akin to having an open-book exam without prior studying.
On the other hand, standard fine-tuning trains models to memorize information or answer practice questions without referencing context, akin to studying for a closed-book exam. RAFT introduces a hybrid approach:
- Incorporation of Distractor Documents: RAFT trains models with both relevant ("golden") documents and irrelevant ("distractor") documents, simulating real-world scenarios where not all retrieved information is pertinent.
- Chain-of-Thought Reasoning: Answers are generated in a chain-of-thought style, ensuring that models not only provide answers but also articulate the reasoning process, making them more robust against distractors.
- Variable Document Contexts: RAFT trains on a mix of examples, some whose context includes the golden document and some with distractors only. This variability enhances the model's ability to handle both known and unknown contexts; a minimal data-construction sketch follows this list.
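To make the recipe concrete, here is a minimal sketch of how a single RAFT-style training example might be assembled. The helper name `build_raft_example` and its default values are illustrative assumptions, not the paper's code; the fraction of examples that keep the golden document is a hyperparameter the paper studies, and the target answers quote the golden document verbatim.

```python
import random

def build_raft_example(question, golden_doc, distractor_pool, cot_answer,
                       p_golden=0.8, num_distractors=4):
    """Assemble one RAFT-style training example (hypothetical helper).

    With probability p_golden, the golden document appears in the context
    alongside sampled distractors; otherwise the context holds distractors
    only, pushing the model to answer from learned domain knowledge.
    """
    context = random.sample(distractor_pool, num_distractors)
    if random.random() < p_golden:
        context.append(golden_doc)
        random.shuffle(context)  # golden-doc position should not be a cue
    prompt = "\n\n".join(f"Document: {d}" for d in context)
    prompt += f"\n\nQuestion: {question}"
    # cot_answer is a chain-of-thought target that cites the golden
    # document verbatim, so the model learns to quote and then reason.
    return {"prompt": prompt, "completion": cot_answer}
```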
Main Findings and Results
RAFT has been evaluated across multiple datasets, including PubMed, HotpotQA, and APIBench, yielding impressive results:
- Performance Boost: Across these datasets, RAFT consistently outperforms traditional domain-specific fine-tuned models and even larger models such as GPT-3.5 in their respective domains. For instance, on the HotpotQA dataset, RAFT achieves a significant improvement, highlighting its effectiveness in handling domain-specific questions.
- Robustness to Distractors: RAFT-trained models demonstrate enhanced resilience to irrelevant information, maintaining accuracy even when presented with multiple distractor documents during testing.
- Improved Generalization: The approach shows that by training with a mix of golden and distractor documents, models generalize better to scenarios where the context varies significantly, a common occurrence in real-world applications (see the evaluation sketch below).
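As a rough illustration of how such a robustness test can be run, the sketch below measures answer accuracy as the number of retrieved documents (and hence distractors) grows at test time. The `model`, `retriever`, and substring-match scoring are hypothetical stand-ins, not the paper's evaluation harness.

```python
def accuracy_vs_num_docs(model, retriever, eval_set, ks=(1, 3, 5, 10)):
    """Hypothetical harness: score answer accuracy as more (mostly
    distractor) documents are packed into the test-time context."""
    results = {}
    for k in ks:
        correct = 0
        for ex in eval_set:  # each ex: {"question": str, "answer": str}
            docs = retriever(ex["question"], top_k=k)
            prompt = "\n\n".join(f"Document: {d}" for d in docs)
            prompt += f"\n\nQuestion: {ex['question']}"
            output = model.generate(prompt)
            # crude containment check as a stand-in for real grading
            correct += int(ex["answer"].lower() in output.lower())
        results[k] = correct / len(eval_set)
    return results
```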
Implications and Applications
The implications of RAFT are profound, particularly for industries and domains where specificity and accuracy are paramount. From legal document analysis to medical research, RAFT enables LLMs to be more adept at navigating complex and domain-specific information landscapes.
This adaptability opens up new possibilities for AI applications, such as:
- Healthcare: Enhancing diagnostic support systems by accurately interpreting vast amounts of medical literature.
- Legal Tech: Assisting in legal research by effectively navigating and retrieving relevant case laws and statutes.
- Enterprise Solutions: Streamlining information retrieval processes within corporate document repositories, improving decision-making efficiency.
Conclusion
RAFT represents a significant stride forward in the quest to tailor large language models to specific domains.
By marrying the best of fine-tuning and retrieval-augmented generation, RAFT not only enhances the accuracy and robustness of language models but also broadens their applicability across specialized fields.
As AI continues to permeate more aspects of professional and personal life, approaches like RAFT will be crucial in ensuring these systems are not only powerful but also context-aware and reliable.
As we look to the future, further exploration of the nuances of retrieval-augmented training promises to unlock even greater potential for intelligent information processing.