Introduction
Dropbox has introduced features that use large language models to generate summaries and answer questions about files when they are previewed on the web.
This technology aims to reduce the need for users to manually sift through large volumes of information, providing instant insights and answers.
The Challenge: Information Overload
Today’s knowledge workers often face the daunting task of navigating through vast amounts of information contained in various file types—documents, videos, audio files, and more.
Traditionally, finding specific information within these files required manual effort, and summarizing the content could be particularly time-consuming.
Dropbox recognized the need for a more efficient solution that could help users quickly extract the essence of their files and find specific information without having to go through the entire content.
The Solution: Leveraging Large Language Models
Dropbox’s solution involves employing large language models (LLMs) to transform how file previews are handled.
These models rely on embeddings: numerical representations of file content that can be compared against input queries and an internal corpus of knowledge.
This semantic comparison allows the system to understand and process information in a way that mimics human comprehension.
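The semantic comparison described above is typically done by measuring how closely two embedding vectors point in the same direction. The sketch below illustrates the idea with cosine similarity; the vectors are made up for illustration, and in a real system they would come from an embedding model:

```python
import numpy as np

# Hypothetical embeddings; in practice these come from an embedding model.
doc_chunk = np.array([0.2, 0.7, 0.1])
query = np.array([0.25, 0.65, 0.05])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Semantic closeness of two vectors (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

score = cosine_similarity(doc_chunk, query)  # close to 1.0: semantically similar
```

A score near 1.0 means the chunk is semantically close to the query; unrelated text scores much lower.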
Extracting Text and Embeddings with Riviera
At the heart of this solution is Dropbox’s Riviera framework, a conversion system that transforms a wide range of file types into text so that both web browsers and machine learning models can work with the content.
The framework supports conversions between roughly 300 file types, processing around 2.5 billion requests per day.
A crucial aspect of this framework is its ability to cache intermediate states of file conversions, which significantly enhances efficiency.
When a file is uploaded, Riviera converts it into text and subsequently generates embeddings, which are mathematical representations of the text’s semantic meaning.
These embeddings are then used to create summaries and provide answers to user queries.
The Summarization Plugin
A key feature offered by Dropbox is the ability to summarize large files—be it documents, videos, or other media. The summarization process involves several steps:
- Text Extraction: Text chunks are extracted from the file.
- Embedding Generation: Embeddings are created for each chunk.
- Clustering: K-means clustering groups similar chunks based on their embeddings.
- Context Creation: Representative chunks from each cluster are combined to form a context.
- Summary Generation: An LLM generates a summary based on this context.
This method ensures that the summary covers a diverse range of topics present in the original file, reducing redundancy and the likelihood of hallucinations.
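The clustering step above can be sketched as follows. This is a minimal illustration using scikit-learn's `KMeans`, assuming embeddings are already available as a NumPy array; `build_summary_context` is a hypothetical name, not Dropbox's actual code:

```python
import numpy as np
from sklearn.cluster import KMeans

def build_summary_context(chunks, embeddings, n_clusters=3):
    """Pick one representative chunk per cluster to form a diverse context."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(embeddings)
    context = []
    for c in range(n_clusters):
        # The chunk closest to the cluster centroid stands in for that topic.
        members = np.where(km.labels_ == c)[0]
        dists = np.linalg.norm(embeddings[members] - km.cluster_centers_[c], axis=1)
        context.append(chunks[members[np.argmin(dists)]])
    return context  # joined into the prompt that the LLM summarizes
```

Because each cluster contributes one representative, the assembled context spans the file's distinct topics instead of repeating its most common one.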
The Q&A Plugin
The Q&A feature allows users to ask questions about the contents of their files.
The process is similar to summarization but focuses on finding chunks that are most relevant to the user’s query:
- Query Embedding: An embedding is generated for the user’s question.
- Relevance Calculation: The system calculates the distance between the query embedding and embeddings from the file chunks.
- Contextual Answer Generation: The most relevant chunks are sent to the LLM, which generates an answer.
Additionally, the system can suggest follow-up questions to help users dig deeper into the file content.
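The relevance-calculation step can be sketched as a nearest-neighbor lookup over chunk embeddings. The helper below (`top_k_chunks` is a hypothetical name) ranks chunks by Euclidean distance to the query embedding and keeps the closest `k`:

```python
import numpy as np

def top_k_chunks(query_emb, chunk_embs, chunks, k=2):
    """Return the k chunks whose embeddings are closest to the query."""
    dists = np.linalg.norm(chunk_embs - query_emb, axis=1)
    order = np.argsort(dists)[:k]
    return [chunks[i] for i in order]
```

The selected chunks then become the context for the LLM's answer; k trades answer completeness against prompt size.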
Expanding to Multiple Files
Initially, these AI-powered features were limited to single files. However, Dropbox aimed to extend this functionality to collections of files.
This required significant updates to the Riviera framework and the development of algorithms capable of determining which chunks from multiple files are most relevant to a given query.
Dropbox used power-law dynamics to tune how many chunks are sent to the LLM, depending on whether the question was broad or direct.
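The post does not detail how power-law dynamics were applied; one illustrative reading is to keep only the chunks that together account for most of the relevance mass. A direct question produces a steep, head-heavy score distribution, so few chunks suffice; a broad question produces a flatter one and pulls in more. The function below (`select_chunks_by_mass` is a hypothetical name) sketches that assumption:

```python
import numpy as np

def select_chunks_by_mass(scores, mass=0.8):
    """Keep the top-scoring chunks covering `mass` of total relevance.

    Steep (power-law-like) score distributions yield few chunks;
    flat distributions yield many. Illustrative assumption only.
    """
    order = np.argsort(scores)[::-1]
    sorted_scores = np.array(scores)[order]
    cum = np.cumsum(sorted_scores) / sorted_scores.sum()
    cutoff = int(np.searchsorted(cum, mass)) + 1
    return order[:cutoff].tolist()
```

Under this scheme, the chunk budget adapts to the question automatically rather than being a fixed constant.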
Lessons Learned and Key Takeaways
Building these machine learning capabilities involved overcoming several technical challenges and making strategic decisions:
- Real-time Processing: To prioritize user privacy and security, Dropbox opted for real-time computation of embeddings and AI responses.
- Segmentation and Clustering: Focusing on the most relevant parts of a file improved both the quality and efficiency of the summaries and answers.
- Chunk Priority Calculation: Prioritizing chunks based on relevance ensured that the most important information was included in the AI responses.
- Embracing Embeddings: This enabled more accurate and efficient handling of multi-file queries.
- Cached Embeddings: Caching reduced redundant API calls, improving performance and reducing costs.
The results speak for themselves: the cost-per-summary dropped by 93%, and the cost-per-query decreased by 64%.
Latency for summaries and queries saw dramatic reductions, making these features not only more affordable for Dropbox but also more responsive for users.
Conclusion
Dropbox's AI-powered file previews meaningfully change how users interact with and pull insights from their files.
Using advanced machine learning algorithms, Dropbox makes it easy for users to get quick summaries and answers, cutting through the noise and boosting productivity.
This case study highlights how AI can simplify complex tasks and deliver real-time, actionable insights.