Original Paper: https://arxiv.org/abs/2309.04379
By: Dongming Wu, Wencheng Han, Tiancai Wang, Yingfei Liu, Xiangyu Zhang, Jianbing Shen
Abstract:
A new trend in the computer vision community is to capture objects of interest following flexible human command represented by a natural language prompt. However, the progress of using language prompts in driving scenarios is stuck in a bottleneck due to the scarcity of paired prompt-instance data. To address this challenge, we propose the first object-centric language prompt set for driving scenes within 3D, multi-view, and multi-frame space, named NuPrompt. It expands Nuscenes dataset by constructing a total of 35,367 language descriptions, each referring to an average of 5.3 object tracks. Based on the object-text pairs from the new benchmark, we formulate a new prompt-based driving task, i.e., employing a language prompt to predict the described object trajectory across views and frames. Furthermore, we provide a simple end-to-end baseline model based on Transformer, named PromptTrack. Experiments show that our PromptTrack achieves impressive performance on NuPrompt. We hope this work can provide more new insights for the autonomous driving community. Dataset and Code will be made public at \href{
Summary Notes
Enhancing Autonomous Driving with Language Prompts: Exploring NuPrompt Dataset and PromptTrack Model
The integration of natural language processing (NLP) and computer vision in autonomous driving takes a concrete step forward with the NuPrompt dataset and the PromptTrack model, which let a natural-language prompt specify which objects a perception system should track.
This post examines these advancements and their impact on AI engineering in the automotive industry.
NuPrompt Dataset: Elevating Data for Autonomous Driving
The NuPrompt dataset targets a key bottleneck in current autonomous driving datasets: the scarcity of paired prompt-instance data, that is, language descriptions linked to the specific objects they describe.
This new dataset extends the nuScenes dataset and includes:
- 35,367 language descriptions, each referring to an average of 5.3 object tracks, covering object interactions in 3D, multi-view, and multi-frame space.
Source and Content
NuPrompt enriches the nuScenes dataset with object-centric language descriptions, giving a language-grounded view of dynamic driving environments.
Compared to Other Datasets
Compared with similar datasets, NuPrompt stands out for:
- Multiple object annotations per prompt.
- Annotations that capture dynamic interactions across frames, more closely reflecting real-world scenarios.
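To make "multiple objects per prompt, tracked across frames" concrete, here is a minimal sketch of how one prompt-to-tracks pair could be represented. The `PromptRecord` class, its fields, and the toy scene and track IDs are hypothetical illustrations, not NuPrompt's actual file format.

```python
from dataclasses import dataclass, field


@dataclass
class PromptRecord:
    """Hypothetical record pairing one language prompt with the object tracks it refers to."""
    scene_token: str
    prompt: str
    # track_id -> frame indices in which that referred object is annotated
    tracks: dict[str, list[int]] = field(default_factory=dict)

    def num_tracks(self) -> int:
        """Number of distinct object tracks this prompt refers to."""
        return len(self.tracks)

    def objects_in_frame(self, frame: int) -> list[str]:
        """IDs of referred objects that are annotated in the given frame."""
        return sorted(tid for tid, frames in self.tracks.items() if frame in frames)


def average_tracks_per_prompt(records: list[PromptRecord]) -> float:
    """Mean number of referred tracks per prompt (NuPrompt reports about 5.3)."""
    return sum(r.num_tracks() for r in records) / len(records)


# Toy data (hypothetical IDs), one prompt may refer to several moving objects.
records = [
    PromptRecord("scene-0001", "cars turning left", {"t1": [0, 1, 2], "t2": [1, 2, 3]}),
    PromptRecord("scene-0001", "pedestrians crossing the road",
                 {"t3": [2, 3], "t4": [0], "t5": [3, 4]}),
]
print(average_tracks_per_prompt(records))  # 2.5
print(records[1].objects_in_frame(3))      # ['t3', 't5']
```

Storing per-track frame lists, rather than a single box per prompt, is what lets one prompt describe objects whose visibility changes over time.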
Advantages
Key benefits of NuPrompt include:
- Richer, language-grounded training data for perception models.
- A new benchmark for language prompt-based tasks in autonomous driving.
Building the PromptTrack Model
PromptTrack is a simple end-to-end baseline model based on the Transformer, designed to make use of NuPrompt's data. Key aspects of the work include:
Data Annotation Process
Annotations combine human labeling with the generative abilities of GPT-3.5: human annotators supply object-level language information, and GPT-3.5 helps turn it into diverse, natural scenario descriptions.
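One way to picture a human-plus-LLM pipeline like this is sketched below, assuming that annotators tag short language elements (for example "car" or "moving") with the track IDs they apply to, and that composite prompts refer to the objects satisfying all combined elements. The element names, the intersection rule, and `draft_description` are hypothetical; the LLM rewriting step is replaced by a plain template, and the paper's exact procedure may differ.

```python
from itertools import combinations

# Hypothetical human-annotated language elements: element text -> track IDs it describes.
elements = {
    "car": {"t1", "t2", "t3"},
    "moving": {"t1", "t3", "t7"},
    "in the left lane": {"t1", "t4"},
}


def compose(elements: dict[str, set[str]], k: int = 2) -> list[tuple[str, set[str]]]:
    """Combine k elements; the referred objects are those that satisfy every element."""
    prompts = []
    for combo in combinations(sorted(elements), k):
        objs = set.intersection(*(elements[e] for e in combo))
        if objs:  # drop combinations that no longer refer to any object
            prompts.append((" + ".join(combo), objs))
    return prompts


def draft_description(combo_text: str) -> str:
    """Placeholder for the LLM step (GPT-3.5 in the paper); a real pipeline would
    ask the model to rewrite the combined elements as a fluent description."""
    return f"the objects that are: {combo_text}"


for text, objs in compose(elements):
    print(draft_description(text), sorted(objs))
```

Because the object set is computed from the elements rather than from the generated sentence, the language model can vary the wording without changing which tracks a prompt is paired with.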
Model Architecture
PromptTrack is a Transformer-based model that adds a prompt reasoning branch, which uses the language prompt to predict which tracked objects the prompt refers to, and therefore which trajectories to output.
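A prompt reasoning branch of this kind can be sketched as a per-object binary classifier: each object query (already fused with the prompt) gets a probability of being referred to. The two-layer MLP, the 0.5 threshold, and the function names below are illustrative assumptions, not PromptTrack's published head.

```python
import numpy as np


def prompt_scores(query_feats, w1, b1, w2, b2):
    """Map fused query features (N, D) to one probability per query:
    'is this object referred to by the prompt?'"""
    hidden = np.maximum(query_feats @ w1 + b1, 0.0)  # ReLU, shape (N, H)
    logits = hidden @ w2 + b2                        # shape (N,)
    return 1.0 / (1.0 + np.exp(-logits))             # sigmoid


def select_referred(probs, threshold=0.5):
    """Indices of queries whose tracks would be output for this prompt."""
    return np.nonzero(probs > threshold)[0]


# Usage with random, untrained weights (shapes only; values are meaningless).
rng = np.random.default_rng(0)
N, D, H = 5, 16, 32
feats = rng.normal(size=(N, D))
probs = prompt_scores(feats, rng.normal(size=(D, H)), np.zeros(H), rng.normal(size=H), 0.0)
print(probs.shape, select_referred(probs))
```

In training, such a branch would typically be supervised with binary labels indicating whether each matched track belongs to the prompt's object set; that supervision scheme is likewise an assumption here.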
Cross-Modal Feature Integration
This step fuses language features from the prompt with visual features from the scene, so that PromptTrack can interpret the prompt in its visual context. It moves toward autonomous driving systems that respond to intuitive human commands.
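A common way to fuse the two modalities is cross-attention, in which visual object queries attend to the prompt's token embeddings. The sketch below is a generic single-head scaled dot-product cross-attention with a residual connection; the projection matrices and shapes are hypothetical, and PromptTrack's actual fusion module may be structured differently.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attend(queries, text_tokens, wq, wk, wv):
    """Object queries (N, D) attend over prompt token embeddings (T, D).
    Returns language-conditioned queries (N, D) and attention weights (N, T)."""
    q = queries @ wq      # (N, dk)
    k = text_tokens @ wk  # (T, dk)
    v = text_tokens @ wv  # (T, D)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # each row sums to 1
    return queries + attn @ v, attn  # residual keeps the original visual content


# Usage with random features: 3 object queries, a 6-token prompt, width 8.
rng = np.random.default_rng(0)
D, dk = 8, 4
fused, attn = cross_attend(rng.normal(size=(3, D)), rng.normal(size=(6, D)),
                           rng.normal(size=(D, dk)), rng.normal(size=(D, dk)),
                           rng.normal(size=(D, D)))
print(fused.shape, attn.shape)  # (3, 8) (3, 6)
```

The residual connection means each query keeps its visual evidence while gaining a summary of the prompt tokens it attends to most.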
Key Contributions
NuPrompt and PromptTrack offer:
- A benchmark for language prompt tasks.
- Improved object tracking and prediction grounded in language understanding.
Experimental Results
PromptTrack is evaluated with Average Multi-Object Tracking Accuracy (AMOTA) and other key tracking metrics. The authors report that it outperforms the compared models, which supports the effectiveness of its prompt reasoning.
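AMOTA summarizes tracking quality across recall levels. The sketch below follows the nuScenes-style definition: at each sampled recall r, MOTAR = max(0, 1 - (IDS_r + FP_r + FN_r - (1 - r) * P) / (r * P)), where P is the number of ground-truth objects, and AMOTA is the mean MOTAR over the sampled recalls. The per-recall counts in the example are made up, and how NuPrompt aggregates the metric over prompts is not covered here.

```python
def motar(r: float, ids: int, fp: int, fn: int, num_gt: int) -> float:
    """Recall-normalized MOTA at recall r (nuScenes-style definition)."""
    return max(0.0, 1.0 - (ids + fp + fn - (1.0 - r) * num_gt) / (r * num_gt))


def amota(per_recall: list[tuple[float, int, int, int]], num_gt: int) -> float:
    """Average MOTAR over sampled recall thresholds.
    per_recall: one (recall, id_switches, false_positives, false_negatives) tuple per threshold."""
    return sum(motar(r, ids, fp, fn, num_gt) for r, ids, fp, fn in per_recall) / len(per_recall)


# Made-up counts for 100 ground-truth objects at two recall thresholds.
# r=0.5: 1 - (2 + 10 + 50 - 50) / 50 = 0.76;  r=1.0: 1 - (5 + 20 + 0) / 100 = 0.75
print(amota([(0.5, 2, 10, 50), (1.0, 5, 20, 0)], num_gt=100))  # ~0.755
```

Subtracting (1 - r) * P removes the false negatives that are unavoidable at recall r, so a tracker is not penalized twice for operating at a lower recall threshold.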
Conclusion: Advancing Toward Intuitive Autonomous Driving
The NuPrompt dataset and PromptTrack model advance autonomous driving technology and human-machine interaction. By merging NLP with visual recognition, they lay the groundwork for vehicles that can act on natural-language instructions about their surroundings.
Looking Ahead
Future directions include developing algorithms for better temporal and cross-modal reasoning.
The NuPrompt dataset and PromptTrack model are available for AI engineers and researchers on GitHub, providing a foundation for further innovation in autonomous driving.
In summary, bringing language prompts into autonomous driving through the NuPrompt dataset and PromptTrack model opens new avenues in vehicle intelligence and human-machine communication.