Original Paper: https://arxiv.org/abs/2407.06023v1
By: Ping Yu, Jing Xu, Jason Weston, Ilia Kulikov
Abstract:
Large language models (LLMs) can spend extra compute during inference to generate intermediate thoughts, which helps to produce better final responses. Since Chain-of-Thought (Wei et al., 2022), many such System 2 techniques have been proposed such as Rephrase and Respond (Deng et al., 2023a), System 2 Attention (Weston and Sukhbaatar, 2023) and Branch-Solve-Merge (Saha et al., 2023). In this work we investigate self-supervised methods to "compile" (distill) higher quality outputs from System 2 techniques back into LLM generations without intermediate reasoning token sequences, as this reasoning has been distilled into System 1. We show that several such techniques can be successfully distilled, resulting in improved results compared to the original System 1 performance, and with less inference cost than System 2. We posit that such System 2 distillation will be an important feature of future continually learning AI systems, enabling them to focus System 2 capabilities on the reasoning tasks that they cannot yet do well.
Summary Notes
Figure 1: Overview of System 2 Distillation. Filtered training examples are collected by running System 2 approaches such as Branch-Solve-Merge (BSM) on unlabeled data, which uses extra compute to produce higher quality outputs. These targets are then distilled into the standard (System 1) LLM.
Large Language Models (LLMs) often produce better answers when they apply explicit reasoning techniques at inference time, commonly referred to as System 2 methods. These methods are effective, but they cost more compute per query because the model generates intermediate text before its final answer. Research from Meta FAIR proposes distilling these reasoning techniques back into the more efficient, direct-response System 1 model. Let's look at the methodology, the results, and the implications of this work.
Introduction: Bridging the Gap Between Efficiency and Effectiveness
In the world of LLMs, there's a distinction between System 1 and System 2 reasoning. System 1 refers to the model's ability to generate responses directly from the input without intermediate steps, akin to automatic thinking in humans.
On the other hand, System 2 involves generating intermediate thoughts or steps, similar to deliberate reasoning and planning. While System 2 methods like Chain-of-Thought (CoT), Rephrase and Respond (RaR), System 2 Attention (S2A), and Branch-Solve-Merge (BSM) improve accuracy and quality, they also increase inference costs.
The research explores how to distill these complex System 2 techniques back into System 1 models to achieve the best of both worlds: high performance and low computational expense.
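To make the cost difference concrete, here is a minimal sketch contrasting a direct System 1 call with a two-step Rephrase and Respond call. The `generate` callable and the prompt wording are assumptions made for illustration; they are not the prompts used in the paper.

```python
from typing import Callable

# Hypothetical LLM interface: takes a prompt string, returns a completion string.
Generate = Callable[[str], str]


def system1_answer(generate: Generate, question: str) -> str:
    """System 1: a single call that answers directly, with no intermediate text."""
    return generate(f"{question}\nAnswer:")


def rephrase_and_respond(generate: Generate, question: str) -> str:
    """System 2 (two-step RaR sketch): rephrase the question, then answer it."""
    # Step 1: intermediate generation that the user never sees.
    rephrased = generate(
        f"Rephrase and expand this question so it is easier to answer:\n{question}"
    )
    # Step 2: final answer conditioned on both the original and rephrased question.
    return generate(
        f"Original question: {question}\n"
        f"Rephrased question: {rephrased}\n"
        "Answer:"
    )
```

The System 2 path makes two model calls and generates extra tokens for every query. Distillation aims to get the quality of the second function at the cost of the first.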
Methodologies: From Advanced Reasoning to Efficient Responses
The distillation process involves several key steps:
- Generating Responses Using System 2 Models: For each input, responses are generated using the advanced System 2 techniques. These responses are then curated using consistency checks to ensure high quality.
- Filtering and Self-Consistency: Techniques like self-consistency of outputs and consistency under input perturbation are employed to filter out low-quality responses.
- Supervised Fine-Tuning: The filtered high-quality responses are used to fine-tune the System 1 model, aiming to replicate the performance of System 2 without generating intermediate steps.
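The steps above can be sketched as a small data-collection routine. This is an illustration under stated assumptions, not the authors' implementation: `system2_answer` is a hypothetical callable that runs a System 2 pipeline and returns only the final answer, and the sample count and agreement threshold are placeholder defaults.

```python
from collections import Counter
from typing import Callable, Optional


def majority_answer(answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common answer if its share exceeds min_agreement, else None."""
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) > min_agreement else None


def build_distillation_set(
    prompts: list[str],
    system2_answer: Callable[[str], str],  # hypothetical System 2 pipeline
    num_samples: int = 8,                  # assumed value, for illustration only
) -> list[dict[str, str]]:
    """Collect (prompt, final answer) pairs, keeping only self-consistent ones."""
    dataset = []
    for prompt in prompts:
        # Sample the System 2 pipeline several times on the same unlabeled input.
        samples = [system2_answer(prompt) for _ in range(num_samples)]
        target = majority_answer(samples)
        # Discard inputs where the samples do not agree on an answer.
        if target is not None:
            dataset.append({"prompt": prompt, "response": target})
    return dataset
```

Each record pairs the original prompt with the final answer only, so a model fine-tuned on this data learns to produce the answer directly. No labels are needed: agreement among samples stands in for correctness.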
Main Findings and Results
The research applied the distillation methodology across various System 2 techniques and tasks:
1. Rephrase and Respond (RaR)
- Tasks: Last Letter Concatenation and Coin Flip Reasoning.
- Findings:
- Last Letter Concatenation: The distilled System 1 model achieved an impressive 98% accuracy, significantly outperforming the original System 1 model's 30%.
- Coin Flip Reasoning: The distilled model reached 75.69% accuracy, close to the 2-step RaR's 77.2%, while generating far fewer tokens.
2. System 2 Attention (S2A)
- Task: SycophancyEval (handling biased inputs).
- Findings: The distilled System 1 model not only matched but exceeded the performance of System 2 models on biased inputs, achieving 81.3% accuracy compared to 76.0% of the original S2A model.
3. Branch-Solve-Merge (BSM)
- Tasks: Evaluation of LLM responses using OASST2 and MT-bench.
- Findings:
- The distilled System 1 model outperformed both CoT and the original BSM method in terms of human agreement and consistency, with a fraction of the computational cost.
- On MT-bench, the distilled model showed superior performance in multiple categories, surpassing even GPT-4-0125-preview in areas such as writing, math, and STEM.
4. Chain-of-Thought (CoT)
- Task: GSM8k (math problems).
- Findings: Distillation of CoT into System 1 was not as effective, highlighting the challenges in distilling complex reasoning tasks. The distilled model's accuracy was significantly lower than the original CoT performance.
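For the pairwise evaluation setting in item 3, one way a consistency-under-input-perturbation check might look is to ask the judge twice with the two responses swapped and keep the example only if the verdict follows the response rather than its position. The sketch below assumes a hypothetical `judge` callable that returns "A" or "B"; the summary above does not spell out the exact perturbation, so treat the order swap as an assumption.

```python
from typing import Callable, Optional

# Hypothetical judge: (question, response_a, response_b) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def order_consistent_verdict(
    judge: Judge, question: str, first: str, second: str
) -> Optional[str]:
    """Return "A" if `first` wins in both orders, "B" if `second` does, else None."""
    forward = judge(question, first, second)   # `first` is shown in position A
    backward = judge(question, second, first)  # `first` is shown in position B
    # Map the swapped verdict back to the original labelling.
    unswapped = {"A": "B", "B": "A"}.get(backward)
    if forward in ("A", "B") and forward == unswapped:
        return forward
    return None  # position-dependent or malformed verdict: drop this example
```

A judge that always picks position A yields `None` here, so its verdicts would be filtered out of the training data.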
Implications and Potential Applications
The ability to distill System 2 reasoning into System 1 models carries significant implications for the deployment of LLMs:
- Improved Efficiency: Distilled models can achieve high performance with lower computational costs, making them more feasible for real-time applications.
- Broader Accessibility: Efficient models can be deployed on devices with limited resources, democratizing access to advanced AI capabilities.
- Continuous Learning: Future AI systems can continuously learn and distill new tasks, focusing on areas where System 2 reasoning is most needed, akin to human learning processes.
Conclusion: Paving the Way for Efficient AI
The research from Meta FAIR demonstrates that, for several techniques, the performance of System 2 reasoning can be retained within the efficient framework of a System 1 model. Challenges remain, particularly for tasks that depend on Chain-of-Thought reasoning such as GSM8k math problems, where the distilled model fell well short of the original CoT performance. Even so, the overall findings are promising: the approach lowers inference cost and opens up applications in resource-constrained environments.