Original Paper: https://arxiv.org/pdf/2407.20183
By: Zehui Chen, Kuikun Liu, Qiuchen Wang, Jiangning Liu, Wenwei Zhang, Kai Chen, Feng Zhao
Abstract:
Information seeking and integration is a complex cognitive task that consumes enormous time and effort. Inspired by the remarkable progress of Large Language Models, recent works attempt to solve this task by combining LLMs and search engines. However, these methods still obtain unsatisfying performance due to three challenges: (1) complex requests often cannot be accurately and completely retrieved by the search engine in a single search, (2) corresponding information to be integrated is spread over multiple web pages along with massive noise, and (3) a large number of web pages with long contents may quickly exceed the maximum context length of LLMs. Inspired by the cognitive process when humans solve these problems, we introduce MindSearch to mimic the human minds in web information seeking and integration, which can be instantiated by a simple yet effective LLM-based multi-agent framework. The WebPlanner models the human mind of multi-step information seeking as a dynamic graph construction process: it decomposes the user query into atomic sub-questions as nodes in the graph and progressively extends the graph based on the search result from WebSearcher. Tasked with each sub-question, WebSearcher performs hierarchical information retrieval with search engines and collects valuable information for WebPlanner. The multi-agent design of MindSearch enables the whole framework to seek and integrate information in parallel from larger-scale (e.g., more than 300) web pages in 3 minutes, which is worth 3 hours of human effort. MindSearch demonstrates significant improvement in the response quality in terms of depth and breadth, on both closed-set and open-set QA problems. Besides, responses from MindSearch based on InternLM2.5-7B are preferred by humans over ChatGPT-Web and Perplexity.ai applications, which implies that MindSearch can already deliver a competitive solution to the proprietary AI search engine.
Summary Notes
Figure. The overall framework of MindSearch. It consists of two main ingredients: WebPlanner and WebSearcher. WebPlanner acts as a high-level planner, orchestrating the reasoning steps and multiple WebSearchers. WebSearcher conducts fine-grained web searches and summarizes valuable information back to the planner, formalizing a simple yet effective multi-agent framework.
Introducing MindSearch
MindSearch is a novel framework designed to mimic the human process of seeking and integrating web information. This system consists of two main components:
- WebPlanner: Acts as a high-level planner, breaking down complex queries into simpler sub-questions.
- WebSearcher: Conducts detailed web searches and aggregates valuable information, feeding it back to the WebPlanner.
Together, these components form a multi-agent framework that gathers and integrates information from more than 300 web pages in parallel in about 3 minutes, work the authors estimate would take a human roughly 3 hours.
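To illustrate why this multi-agent design parallelizes well, here is a minimal Python sketch, assuming each WebSearcher call is an independent, I/O-bound task. `web_searcher` and `dispatch` are hypothetical names, and the searcher is a stub that sleeps instead of querying a real search engine. Independent sub-questions are submitted to a thread pool together, so wall-clock time tracks the slowest search rather than the sum of all searches.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def web_searcher(sub_question: str) -> str:
    """Hypothetical WebSearcher stub: simulates a slow web search."""
    time.sleep(0.2)  # stand-in for network and model latency
    return f"summary for: {sub_question}"


def dispatch(sub_questions: list[str], max_workers: int = 8) -> dict[str, str]:
    """Run independent sub-questions concurrently and collect their summaries."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with sub_questions
        summaries = list(pool.map(web_searcher, sub_questions))
    return dict(zip(sub_questions, summaries))


if __name__ == "__main__":
    questions = [
        "What is the population of city A?",
        "What is the population of city B?",
        "What is the population of city C?",
    ]
    start = time.perf_counter()
    results = dispatch(questions)
    elapsed = time.perf_counter() - start
    for question, summary in results.items():
        print(question, "->", summary)
    # The three sleeps overlap, so this should be well under the ~0.6s a loop would take.
    print(f"elapsed: {elapsed:.2f}s")
```

Threads are a reasonable fit here because the work is dominated by waiting on network calls; a real system might use `asyncio` instead.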
Key Methodologies
WebPlanner: Decomposing Queries into Manageable Parts
WebPlanner takes the user’s complex query and decomposes it into smaller, more manageable sub-questions.
This process is modeled as a directed acyclic graph (DAG), where each node represents an independent web search, and edges define the relationships between these searches.
The graph starts with the initial user query and culminates in the final comprehensive answer.
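One way to see why a DAG suits this task: any node whose dependencies are already answered can be searched at the same time as its siblings. The sketch below assumes a simple "node depends on these nodes" mapping and uses Python's standard-library `graphlib` (Python 3.9+) to group sub-questions into waves between a START node (the user query) and an END node (the final answer). The node names are illustrative, not taken from the paper.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: each key depends on every node in its set.
dependencies = {
    "q1_pop_A": {"START"},
    "q2_pop_B": {"START"},
    "q3_compare": {"q1_pop_A", "q2_pop_B"},
    "END": {"q3_compare"},
}


def search_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Return nodes grouped into waves whose dependencies are all satisfied."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node in this wave could run in parallel
        waves.append(ready)
        sorter.done(*ready)
    return waves


for i, wave in enumerate(search_waves(dependencies)):
    print(i, wave)
# 0 ['START']
# 1 ['q1_pop_A', 'q2_pop_B']
# 2 ['q3_compare']
# 3 ['END']
```

In MindSearch the graph grows step by step rather than being known up front, so this static layering only illustrates the dependency structure, not the planner's actual control loop.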
To build this graph, WebPlanner leverages the code generation capabilities of LLMs. Using predefined atomic code functions, it constructs the graph dynamically, progressively decomposing the query and dispatching sub-questions to WebSearcher agents.
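The code-as-plan idea can be sketched in Python as follows. Here the planner LLM is assumed to emit a short Python snippet that calls a small set of graph-building functions; the snippet is hard-coded in place of a real model call, and the class and method names (`SearchGraph`, `add_root_node`, `add_node`, `add_edge`, `add_response_node`) are hypothetical illustrations rather than MindSearch's actual API. Executing the snippet against a namespace that exposes only the graph object turns the model's text into graph structure.

```python
class SearchGraph:
    """Hypothetical graph of sub-questions built by planner-generated code."""

    def __init__(self) -> None:
        self.nodes: dict[str, str] = {}
        self.edges: list[tuple[str, str]] = []

    def add_root_node(self, content: str, name: str = "root") -> None:
        self.nodes[name] = content  # the original user query

    def add_node(self, name: str, content: str) -> None:
        self.nodes[name] = content  # an atomic sub-question for a WebSearcher

    def add_edge(self, start: str, end: str) -> None:
        # Reject edges to nodes the planner never declared.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError(f"unknown node in edge {start!r} -> {end!r}")
        self.edges.append((start, end))

    def add_response_node(self, name: str = "response") -> None:
        self.nodes[name] = "<final answer>"  # placeholder for the synthesized answer


# Stand-in for text an LLM planner might generate in one planning step.
planner_output = """
graph.add_root_node("Which is older, the Eiffel Tower or the Statue of Liberty?")
graph.add_node("q1", "When was the Eiffel Tower completed?")
graph.add_node("q2", "When was the Statue of Liberty dedicated?")
graph.add_edge("root", "q1")
graph.add_edge("root", "q2")
"""

graph = SearchGraph()
# Expose only the graph object; emptying __builtins__ is NOT a real sandbox.
exec(planner_output, {"__builtins__": {}}, {"graph": graph})
print(sorted(graph.nodes))  # ['q1', 'q2', 'root']
print(graph.edges)          # [('root', 'q1'), ('root', 'q2')]
```

In a fuller loop, the planner would read the WebSearchers' answers to `q1` and `q2` before emitting its next snippet, for example one that adds a response node and connects the sub-questions to it.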
WebSearcher: Hierarchical Information Retrieval
WebSearcher operates as a Retrieval-Augmented Generation (RAG) agent. It begins by generating multiple queries related to the sub-question assigned by WebPlanner.
These queries are executed through various search APIs like Google, Bing, and DuckDuckGo. The results are then merged, and the most valuable pages are selected for detailed reading.
This coarse-to-fine retrieval reduces how much raw web content the LLM must process at once, allowing it to focus on extracting highly relevant information efficiently.
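The hierarchical flow described above might look like this minimal Python sketch. Everything model- or network-facing is stubbed: `rewrite_queries` stands in for the LLM generating query variants, `search_api` for the search-engine calls, and `select_pages` uses a naive keyword-overlap score as a stand-in for the model-driven page selection. All of these names and the scoring rule are assumptions for illustration. What the sketch shows is the shape of the pipeline: many queries, merge and deduplicate by URL, then read only a few top pages in full.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchResult:
    url: str
    title: str
    snippet: str


def rewrite_queries(sub_question: str) -> list[str]:
    """Stub for the LLM step that turns one sub-question into several queries."""
    return [sub_question, f"{sub_question} official source", f"{sub_question} latest"]


def search_api(query: str) -> list[SearchResult]:
    """Stub search engine: returns overlapping fake results for any query."""
    return [
        SearchResult("https://example.com/a", "Page A", "background on " + query),
        SearchResult("https://example.com/b", "Page B", "statistics and figures"),
    ]


def merge_results(batches: list[list[SearchResult]]) -> list[SearchResult]:
    """Merge result lists from every query, keeping the first hit per URL."""
    seen: dict[str, SearchResult] = {}
    for batch in batches:
        for result in batch:
            seen.setdefault(result.url, result)
    return list(seen.values())


def select_pages(sub_question: str, results: list[SearchResult], k: int = 1) -> list[SearchResult]:
    """Naive stand-in for LLM page selection: rank by word overlap with the question."""
    words = set(sub_question.lower().split())

    def score(r: SearchResult) -> int:
        return len(words & set(f"{r.title} {r.snippet}".lower().split()))

    return sorted(results, key=score, reverse=True)[:k]


def hierarchical_retrieve(sub_question: str, k: int = 1) -> list[str]:
    queries = rewrite_queries(sub_question)
    merged = merge_results([search_api(q) for q in queries])
    # Only the selected URLs would be fetched and read in full in a real system.
    return [r.url for r in select_pages(sub_question, merged, k)]


print(hierarchical_retrieve("population of Paris"))  # ['https://example.com/a']
```

Deduplicating by URL before selection keeps pages returned by several query variants from being read twice, which matters when the context window is the bottleneck.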
Main Findings and Results
MindSearch has been evaluated on both closed-set and open-set question answering (QA) tasks, showing significant improvements in response quality:
- Depth and Breadth: MindSearch provides deeper, more comprehensive answers compared to existing solutions like ChatGPT-Web and Perplexity.ai.
- Efficiency: It integrates information from more than 300 web pages in about 3 minutes, compared with an estimated 3 hours of human effort.
In subjective evaluations, human judges preferred responses from MindSearch (built on InternLM2.5-7B) over those from ChatGPT-Web and Perplexity.ai, highlighting its capability to deliver detailed and nuanced answers.
Performance Highlights
- Closed-Set QA: MindSearch significantly outperformed baseline methods on benchmarks such as Bamboogle, MuSiQue, and HotpotQA.
- Open-Set QA: Human evaluators rated MindSearch responses higher in terms of depth and breadth, although improvements in factual accuracy are still needed.
Implications and Applications
MindSearch is particularly relevant to fields that require deep and comprehensive information retrieval:
- Research and Academia: Facilitating more efficient literature reviews and data gathering.
- Business Intelligence: Enhancing market analysis and competitive intelligence by aggregating and analyzing vast amounts of data.
- Healthcare: Assisting medical professionals in integrating research findings with clinical data for better decision-making.
While MindSearch is already delivering promising results, there are areas for future research. Enhancing the factual accuracy of responses and further refining the multi-agent framework could unlock even greater potential.
Conclusion
MindSearch represents a significant leap forward in AI-driven search technology, combining the reasoning capabilities of LLMs with the extensive data retrieval power of search engines.
By mimicking human cognitive processes, it provides a more nuanced and efficient approach to information seeking and integration.
As this technology continues to evolve, it promises to transform how we access and utilize information in myriad fields.
With innovative frameworks like MindSearch, we are one step closer to creating AI systems that truly understand and cater to complex human queries, making the vast resources of the web more accessible and useful than ever before.