Original Paper: https://arxiv.org/abs/2305.05176
By: Lingjiao Chen, Matei Zaharia, James Zou
Abstract:
There is a rapidly growing number of large language models (LLMs) that users can query for a fee. We review the cost associated with querying popular LLM APIs, e.g. GPT-4, ChatGPT, J1-Jumbo, and find that these models have heterogeneous pricing structures, with fees that can differ by two orders of magnitude. In particular, using LLMs on large collections of queries and text can be expensive. Motivated by this, we outline and discuss three types of strategies that users can exploit to reduce the inference cost associated with using LLMs: 1) prompt adaptation 2) LLM approximation 3) LLM cascade. As an example, we propose FrugalGPT, a simple yet flexible instantiation of LLM cascade which learns which combinations of LLMs to use for different queries in order to reduce cost and improve accuracy. Our experiments show that FrugalGPT can match the performance of the best individual LLM (e.g. GPT-4) with up to 98% cost reduction or improve the accuracy over GPT-4 by 4% with the same cost. The ideas and findings presented here lay a foundation for using LLMs sustainably and efficiently.
Summary Notes
Figure: Our vision for reducing LLM cost while improving accuracy. (a) The standard usage sends queries to a single LLM (e.g. GPT-4), which can be expensive. (b) Our proposal is to use prompt adaptation, LLM approximation, and LLM cascade to reduce the inference cost. By optimizing the selection of different LLM APIs (e.g., GPT-J, ChatGPT, and GPT-4) as well as prompting strategies (such as zero-shot [BMR+20], few-shot [LSZ+21], and chain-of-thought (CoT) [WWS+22]), we can achieve substantial efficiency gains. (c) On HEADLINES (a financial news dataset), FrugalGPT can reduce the inference cost by 98% while exceeding the performance of the best individual LLM (GPT-4).
Introduction
The advent of Large Language Models (LLMs) like GPT-4 and ChatGPT has revolutionized various sectors, including commerce, science, and finance. However, the operational costs and environmental impact of these models pose significant challenges. For instance, running ChatGPT has been estimated to cost over $700,000 per day, and using GPT-4 for customer service can reportedly cost a small business upwards of $21,000 a month. To address these issues, researchers from Stanford University have introduced FrugalGPT, a framework designed to reduce the cost of using LLMs while maintaining or even improving their performance.
Key Methodologies
FrugalGPT employs three main strategies to optimize the use of LLMs:
- Prompt Adaptation: This strategy involves crafting concise prompts that reduce the number of tokens required, thereby lowering the cost.
- LLM Approximation: This technique uses more affordable models or infrastructures to approximate the performance of costly LLMs.
- LLM Cascade: This approach adaptively selects which LLM APIs to use for different queries, optimizing both cost and performance.
Strategy Breakdown
1. Prompt Adaptation
The cost of querying an LLM API grows with the number of tokens in the prompt, so prompt adaptation aims to keep prompts short. For example, instead of using a prompt filled with numerous examples, a smaller subset of relevant examples can be used. Additionally, multiple queries can be concatenated into a single prompt so that the shared context is processed (and billed) only once.
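The two ideas above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the word-overlap heuristic for picking examples and the `Q:`/`A:` prompt format are assumptions made here for concreteness.

```python
# Illustrative sketch of prompt adaptation. The overlap-based example
# selection and the batching format are assumptions, not the paper's method.

def select_examples(query, examples, k=2):
    """Keep only the k few-shot examples sharing the most words with the query."""
    query_words = set(query.lower().split())
    def overlap(ex):
        return len(query_words & set(ex["input"].lower().split()))
    return sorted(examples, key=overlap, reverse=True)[:k]

def build_batched_prompt(queries, examples):
    """Concatenate several queries into one prompt so the shared
    few-shot examples are sent (and billed) only once."""
    header = "\n".join(f"Q: {ex['input']}\nA: {ex['output']}" for ex in examples)
    body = "\n".join(f"Q: {q}\nA:" for q in queries)
    return f"{header}\n{body}"

examples = [
    {"input": "Gold rises on weak dollar", "output": "up"},
    {"input": "Gold falls as yields climb", "output": "down"},
    {"input": "Weather fine in Paris", "output": "neutral"},
]
queries = ["Gold slips after Fed comments", "Gold jumps to record high"]
prompt = build_batched_prompt(queries, select_examples(queries[0], examples))
print(prompt)
```

Both techniques shrink the billed token count: the first drops irrelevant demonstrations, and the second amortizes the remaining ones across many queries.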
2. LLM Approximation
This strategy involves using cheaper models to approximate the performance of more expensive ones. Two key techniques are employed:
- Completion Cache: Stores the response of an LLM to a query and reuses it when a similar query is encountered.
- Model Fine-Tuning: Uses the responses from a powerful LLM to fine-tune a smaller, more affordable model, which can then be used for new queries.
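A completion cache is the simpler of the two techniques and can be sketched directly. The exact-match keying on a normalized query below is an assumption made here; reusing responses for merely *similar* queries, as the paper describes, would need a fuzzier lookup (e.g. embedding similarity) in place of the hash.

```python
# Minimal completion-cache sketch (assumption: exact match on a
# normalized query; the paper's "similar query" reuse would need
# an embedding-based lookup instead).
import hashlib

class CompletionCache:
    def __init__(self):
        self._store = {}

    def _key(self, query):
        # Normalize whitespace/case so trivially different queries collide.
        return hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get(self, query):
        return self._store.get(self._key(query))

    def put(self, query, response):
        self._store[self._key(query)] = response

def answer(query, cache, call_llm):
    cached = cache.get(query)
    if cached is not None:
        return cached           # free: no API call
    response = call_llm(query)  # paid API call
    cache.put(query, response)
    return response

calls = []
def fake_llm(q):
    calls.append(q)
    return f"answer to: {q}"

cache = CompletionCache()
answer("What is an LLM cascade?", cache, fake_llm)
answer("what is an LLM cascade?  ", cache, fake_llm)  # cache hit, no new call
```

The second lookup never reaches the API, which is exactly where the cost savings come from on workloads with repeated or near-duplicate queries.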
3. LLM Cascade
LLM Cascade leverages the heterogeneous performance and costs of different LLM APIs. It sequentially queries multiple LLMs and uses a scoring function to decide whether to accept a response or query the next LLM in the sequence. This adaptive selection helps to balance cost and accuracy.
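The control flow of a cascade can be sketched as follows. Everything concrete here is an illustrative assumption: the model list, the prices, and the keyword-based scorers stand in for real APIs, and in FrugalGPT the scoring function is learned and the model ordering and thresholds are optimized for a cost budget rather than fixed by hand.

```python
# Sketch of an LLM cascade: query models cheapest-first and stop at the
# first answer whose score clears a threshold. Prices, scorers, and the
# threshold are illustrative assumptions, not FrugalGPT's learned values.

def cascade(query, models, threshold=0.9):
    """Return (answer, total_cost). The last model's answer is always accepted."""
    total_cost = 0.0
    for i, m in enumerate(models):
        answer = m["call"](query)
        total_cost += m["price"]
        if i == len(models) - 1 or m["score"](query, answer) >= threshold:
            return answer, total_cost

models = [
    {"price": 0.1,  # cheap model: only confident on queries it recognizes
     "call": lambda q: "down",
     "score": lambda q, a: 0.95 if "falls" in q else 0.3},
    {"price": 2.0,  # expensive fallback: always accepted as the last stage
     "call": lambda q: "up",
     "score": lambda q, a: 0.99},
]

easy = cascade("Gold falls on strong dollar", models)  # cheap model suffices
hard = cascade("Gold outlook mixed", models)           # escalates, pays both
```

The cost savings follow from the asymmetry: most queries are answered by the cheap stage, and only the hard residue pays the expensive model's price.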
Findings and Results
FrugalGPT was evaluated using various datasets, including HEADLINES (financial news), OVERRULING (legal documents), and COQA (reading comprehension). The results were impressive:
- Cost Savings: FrugalGPT reduced the inference cost by up to 98% while maintaining or improving performance.
- Performance Gains: In some cases, FrugalGPT improved accuracy by up to 4% compared to the best individual LLM.
Case Study: HEADLINES Dataset
In a detailed case study on the HEADLINES dataset, FrugalGPT demonstrated its effectiveness. Given a budget of $6.50 (one-fifth of GPT-4's cost), FrugalGPT learned to sequentially call GPT-J, J1-L, and GPT-4, accepting each answer based on a scoring function. This approach not only reduced costs by 80% but also improved accuracy by 1.5%.
Implications and Applications
The implications of FrugalGPT are far-reaching:
- Cost Efficiency: Small businesses and startups can leverage powerful LLMs without incurring prohibitive costs.
- Environmental Impact: By reducing the computational resources required, FrugalGPT also helps in lowering the environmental footprint of using LLMs.
- Versatility: The framework can be applied across various industries, from finance and law to customer service and healthcare.
Conclusion
FrugalGPT introduces a groundbreaking approach to using LLMs more sustainably and efficiently. By combining strategies like prompt adaptation, LLM approximation, and LLM cascade, it offers a flexible and cost-effective solution without compromising performance. As LLMs continue to evolve, frameworks like FrugalGPT will be crucial in making these technologies accessible and sustainable for a broader range of applications.
Future Prospects
While FrugalGPT sets a solid foundation, there are still areas for improvement and exploration. Future research could focus on further optimizing the LLM cascade, integrating additional cost-saving techniques, and addressing other critical factors like latency, fairness, and privacy. The continuous development of LLMs will undoubtedly present new challenges and opportunities, driving further innovation in this dynamic field.