Original Paper: https://arxiv.org/abs/2409.07429
By: Zora Zhiruo Wang, Jiayuan Mao, Daniel Fried, Graham Neubig
Abstract:
Despite the potential of language model-based agents to solve real-world tasks such as web navigation, current methods still struggle with long-horizon tasks with complex action trajectories. In contrast, humans can flexibly solve complex tasks by learning reusable task workflows from past experiences and using them to guide future actions. To build agents that can similarly benefit from this process, we introduce Agent Workflow Memory (AWM), a method for inducing commonly reused routines, i.e., workflows, and selectively providing workflows to the agent to guide subsequent generations. AWM flexibly applies to both offline and online scenarios, where agents induce workflows from training examples beforehand or from test queries on the fly. We experiment on two major web navigation benchmarks -- Mind2Web and WebArena -- that collectively cover 1000+ tasks from 200+ domains across travel, shopping, and social media, among others. AWM substantially improves the baseline results by 24.6% and 51.1% relative success rate on Mind2Web and WebArena while reducing the number of steps taken to solve WebArena tasks successfully. Furthermore, online AWM robustly generalizes in cross-task, website, and domain evaluations, surpassing baselines from 8.9 to 14.0 absolute points as train-test task distribution gaps widen.
Summary Notes
Figure: AWM enables agents to continuously induce and apply workflows to improve performance, compared to stagnant baselines. We show results by AWM on the WebArena map split as an example.
Introduction
Language models have advanced rapidly, yet enabling agents to navigate the web efficiently remains one of their hardest challenges. Current agents still struggle with long-horizon tasks that require complex action sequences. Agent Workflow Memory (AWM) addresses this by letting agents learn reusable workflows from past experience and apply them to future tasks, much as humans do. This post covers the research behind AWM: its methodology, main findings, and potential applications.
Key Methodologies
The core idea of AWM is to induce commonly reused routines (workflows) from past experience and selectively provide them to the agent to guide future actions. The key components are:
- Workflow Induction:
- Offline Induction: When annotated training examples are available, AWM induces reusable workflows from them before test time.
- Online Induction: When annotated examples are not available, AWM induces workflows on the fly from test queries, using only its own predictions that an evaluator module judges correct.
- Workflow Representation:
- Workflows are represented as a combination of a textual description and a series of steps (actions and observations), making them easy for the agent to understand and execute.
- Integration with Agent Memory:
- Induced workflows are integrated into the agent's memory, allowing the agent to recall and apply these workflows in future tasks.
- Evaluation:
- AWM was evaluated on two major web navigation benchmarks, WebArena and Mind2Web, which together cover more than 1000 tasks from over 200 domains, including travel, shopping, and social media.
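The induction, representation, and memory components above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Step`, `Workflow`, `AgentMemory`, `induce_offline`, and `run_online` are hypothetical, and in AWM the acting, evaluating, and inducing steps are performed by language models. Here they are replaced by the deterministic stubs `toy_act`, `toy_evaluate`, and `toy_induce` (also hypothetical) so the control flow runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    observation: str  # natural-language description of the current page state
    action: str       # action taken in that state, e.g. "click(submit)"


@dataclass
class Workflow:
    description: str                      # the goal this routine accomplishes
    steps: List[Step] = field(default_factory=list)

    def to_text(self) -> str:
        # Render as plain text so the workflow can be placed in an LM prompt.
        lines = [f"Workflow: {self.description}"]
        for step in self.steps:
            lines.append(f"  Observation: {step.observation}")
            lines.append(f"  Action: {step.action}")
        return "\n".join(lines)


@dataclass
class AgentMemory:
    base: str                                        # fixed agent instructions
    workflows: List[Workflow] = field(default_factory=list)

    def add(self, new_workflows: List[Workflow]) -> None:
        self.workflows.extend(new_workflows)

    def render(self) -> str:
        # The agent's context: base instructions followed by all induced workflows.
        return "\n\n".join([self.base] + [w.to_text() for w in self.workflows])


Trajectory = List[Step]
InduceFn = Callable[[str, Trajectory], List[Workflow]]


def induce_offline(memory: AgentMemory,
                   examples: List[Tuple[str, Trajectory]],
                   induce: InduceFn) -> None:
    # Offline: induce workflows once from annotated (query, trajectory) pairs.
    for query, trajectory in examples:
        memory.add(induce(query, trajectory))


def run_online(memory: AgentMemory,
               queries: List[str],
               act: Callable[[str, str], Trajectory],
               evaluate: Callable[[str, Trajectory], bool],
               induce: InduceFn) -> List[bool]:
    # Online: solve test queries in sequence; only trajectories the evaluator
    # judges successful are induced into workflows for later queries.
    outcomes = []
    for query in queries:
        trajectory = act(query, memory.render())
        success = evaluate(query, trajectory)
        if success:
            memory.add(induce(query, trajectory))
        outcomes.append(success)
    return outcomes


# Hypothetical toy stand-ins for the LM-driven components.
def toy_act(query: str, context: str) -> Trajectory:
    # A real agent would condition on `context`; this stub ignores it.
    return [Step("search page", f"type(search_box, '{query}')"),
            Step("results page", "click(first_result)")]


def toy_evaluate(query: str, trajectory: Trajectory) -> bool:
    # Arbitrary rule for the demo: "buy" tasks are judged failures.
    return len(trajectory) > 0 and "buy" not in query


def toy_induce(query: str, trajectory: Trajectory) -> List[Workflow]:
    # An LM-based inducer could abstract away query-specific values;
    # this stub simply copies the steps.
    return [Workflow(f"routine learned from '{query}'", list(trajectory))]


memory = AgentMemory(base="You are a web navigation agent.")
outcomes = run_online(memory, ["find a flight", "buy shoes"],
                      toy_act, toy_evaluate, toy_induce)
print(outcomes)               # [True, False]
print(len(memory.workflows))  # 1
```

In this sketch, online induction learns only from trajectories the evaluator accepts, so the failed second query adds nothing to memory, while any workflow that is added becomes part of the context for every later query.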
Main Findings and Results
AWM improves over the baselines on both benchmarks:
- WebArena:
- AWM achieved a 51.1% relative increase in task success rate over the top autonomous method, and also outperformed a method that relies on human-engineered workflows.
- It improved performance on every tested website, by 11.8 to 30.7 absolute points.
- Mind2Web:
- AWM showed a 24.6% relative increase in step-wise success rate in cross-task evaluations.
- In cross-task, cross-website, and cross-domain evaluations, online AWM surpassed baseline methods by 8.9 to 14.0 absolute points, with larger margins as the gap between training and test task distributions widened.
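The results above mix two kinds of improvement: relative gains (a percentage of the baseline) and absolute gains (percentage points). The short Python sketch below shows the difference. The helper names `absolute_gain` and `relative_gain` are our own, and the success rates are hypothetical illustration values, not figures from the paper.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Improvement in percentage points, e.g. 20.0 -> 30.0 is +10.0 points."""
    return new - baseline


def relative_gain(baseline: float, new: float) -> float:
    """Improvement as a percentage of the baseline, e.g. 20.0 -> 30.0 is +50%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive to compute a relative gain")
    return (new - baseline) / baseline * 100.0


# Hypothetical success rates in percent, chosen only for illustration.
print(absolute_gain(20.0, 30.0))  # 10.0
print(relative_gain(20.0, 30.0))  # 50.0
print(relative_gain(40.0, 50.0))  # 25.0
```

The same +10-point absolute gain is a 50% relative gain over a 20% baseline but only 25% over a 40% baseline, so relative figures look larger when baselines are low.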
Implications and Potential Applications
AWM's results suggest several practical implications:
- Enhanced Web Navigation:
- By reusing workflows, AWM enables more efficient and accurate web navigation, which could benefit applications such as automated customer service, online shopping assistants, and digital personal assistants.
- Scalability and Adaptability:
- Because AWM can induce workflows online, it can adapt to new tasks and environments without retraining the underlying model or requiring annotated examples, which makes it scalable and adaptable.
- Improved User Experience:
- As AWM-driven agents become more proficient at web navigation and task execution, they could provide a smoother and more intuitive user experience.
Conclusion
Agent Workflow Memory is a substantial step forward for AI-driven web navigation. By letting agents learn and reuse workflows, AWM improves task success rates and generalizes across tasks, websites, and domains. The approach could change how AI agents interact with the web, enabling more efficient, scalable, and user-friendly applications.
Future research could focus on refining workflow induction methods, improving generalization to more diverse tasks, and integrating real-time state access for more dynamic and flexible agent behavior. The work is at an early stage, but it points to a promising direction for AI web navigation.
Athina AI is a collaborative IDE for AI development.