DoorDash’s Approach to Building a Product Knowledge Graph

Introduction

For an online marketplace, accurate and complete product information is critical to a smooth shopping experience.

DoorDash needed an efficient way to manage and enrich its retail catalog data, which included millions of SKUs (stock-keeping units) from a wide range of merchants.

To address this, DoorDash applied large language models (LLMs) to build out its product knowledge graph, which improved the quality of its product data and, with it, the customer experience.

Background on DoorDash

DoorDash, a leading food delivery and logistics platform, has expanded beyond restaurants into new verticals such as grocery and convenience stores.

This expansion introduced the challenge of managing a vast and varied retail catalog, where product data quality directly impacts customer satisfaction.

Ensuring accurate and enriched SKU information became a priority to maintain a top-tier shopping experience.

The Challenge: Standardizing and Enriching SKU Data

With the onboarding of new merchants, DoorDash had to integrate SKU data that often varied in format and quality.

Traditionally, this process involved manual enrichment of product attributes, which was time-consuming, costly, and prone to errors.

As the catalog grew, it became clear that a scalable, automated solution was needed to standardize and enrich this data efficiently.

Innovative Solution: Leveraging Large Language Models

DoorDash turned to LLMs, such as OpenAI’s GPT-4, to automate and enhance SKU data processing.

These models perform well at named-entity recognition (NER) and attribute extraction, two tasks that are central to turning unstructured product titles and descriptions into structured data.
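To make the extraction step concrete, here is a minimal Python sketch of prompt-based attribute extraction. It assumes a generic `call_llm` function that takes a prompt string and returns the model's text reply; the attribute schema (`ATTRIBUTE_KEYS`), the prompt wording, and the `fake_llm` stub are all hypothetical and are not DoorDash's actual prompts or schema. The point it illustrates is asking the model for JSON and then validating the reply rather than trusting it blindly.

```python
import json
from typing import Callable

# Hypothetical attribute schema; a real catalog would define many more keys.
ATTRIBUTE_KEYS = ("brand", "product_type", "size", "flavor")


def build_prompt(raw_title: str) -> str:
    """Ask the model for a JSON object with a fixed set of keys."""
    keys = ", ".join(ATTRIBUTE_KEYS)
    return (
        "Extract the following attributes from the product title. "
        f"Return only a JSON object with the keys: {keys}. "
        "Use null for any attribute that is not stated.\n"
        f"Title: {raw_title}"
    )


def extract_attributes(raw_title: str, call_llm: Callable[[str], str]) -> dict:
    """Call the model, then validate its reply instead of trusting it blindly."""
    empty = {key: None for key in ATTRIBUTE_KEYS}
    try:
        data = json.loads(call_llm(build_prompt(raw_title)))
    except json.JSONDecodeError:
        return empty  # unparseable reply: treat every attribute as missing
    if not isinstance(data, dict):
        return empty  # valid JSON, but not an object
    # Keep only the expected keys; anything extra the model returned is dropped.
    return {key: data.get(key) for key in ATTRIBUTE_KEYS}


# Hypothetical stand-in for a real LLM client, returning a canned reply.
def fake_llm(prompt: str) -> str:
    return ('{"brand": "Chobani", "product_type": "greek yogurt", '
            '"size": "5.3 oz", "flavor": "strawberry", "color": "pink"}')


print(extract_attributes("Chobani Greek Yogurt Strawberry 5.3oz", fake_llm))
# {'brand': 'Chobani', 'product_type': 'greek yogurt', 'size': '5.3 oz', 'flavor': 'strawberry'}
```

Constraining the output to a fixed key set makes downstream steps, such as writing attributes into a knowledge graph, independent of whatever extra fields the model happens to produce.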

The LLM-powered system was implemented across three key projects:

Brand Extraction: DoorDash developed an LLM-based pipeline to automatically identify and categorize new brands from product descriptions. This proactive approach not only improved the accuracy of brand tagging but also streamlined the ingestion of new brands into their knowledge graph.
Organic Product Labeling: Recognizing the growing consumer demand for organic products, DoorDash built an LLM-driven pipeline to label items as organic. It combined string matching, LLM reasoning, and external data searches, significantly increasing both the accuracy and the coverage of organic product labels.
Generalized Attribute Extraction: For entity resolution, DoorDash combined LLMs with Retrieval-Augmented Generation (RAG) to extract accurate product attributes. This enabled faster and more reliable entity matching, which is essential for building a unified global catalog.
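The brand extraction project implies a step that decides whether a brand the LLM pulled out of a product description is genuinely new or already in the knowledge graph. The sketch below shows one assumed way to do that check: normalize names, then use the standard library's `difflib.get_close_matches` to catch spelling variants before a brand is treated as new. The function names, the known-brand set, and the 0.9 similarity cutoff are illustrative assumptions, not DoorDash's published method.

```python
import difflib
import re


def normalize_brand(name: str) -> str:
    """Lowercase, turn punctuation into spaces, and collapse runs of whitespace."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def classify_brand(candidate: str, known_brands: set[str], cutoff: float = 0.9) -> tuple[str, str]:
    """Return ('existing' | 'near_duplicate' | 'new', brand name)."""
    # Map normalized form -> canonical name (rebuilt per call for simplicity).
    by_key = {normalize_brand(b): b for b in known_brands}
    key = normalize_brand(candidate)
    if key in by_key:
        return "existing", by_key[key]
    # Fuzzy match catches typos and spelling variants of a known brand.
    close = difflib.get_close_matches(key, list(by_key), n=1, cutoff=cutoff)
    if close:
        return "near_duplicate", by_key[close[0]]
    return "new", candidate  # queue for review before it enters the graph


known = {"Coca-Cola", "Chobani", "Ben & Jerry's"}
print(classify_brand("Coca Cola", known))  # ('existing', 'Coca-Cola')
print(classify_brand("Chobanni", known))   # ('near_duplicate', 'Chobani')
print(classify_brand("Oatly", known))      # ('new', 'Oatly')
```

A high cutoff keeps false merges rare; brands that fall just below it end up classified as new, which is the safer failure mode if new brands go through review.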
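The organic labeling project combines three signals: string matching, LLM reasoning, and external data searches. One plausible way to combine them, sketched below under stated assumptions, is a cascade that runs the cheap, precise check first and only falls back to costlier tiers when earlier ones cannot decide. The `llm_judge` and `external_lookup` callables are hypothetical placeholders for an LLM call and a search step, and the stubs in the usage example are invented for illustration.

```python
import re
from typing import Callable, Optional

# Tier 1 patterns: an explicit "organic" claim, and an explicit negation of it.
ORGANIC = re.compile(r"\borganic\b", re.IGNORECASE)
NOT_ORGANIC = re.compile(r"\bnon[-\s]?organic\b", re.IGNORECASE)

Verdict = Optional[bool]  # True / False, or None when a tier cannot decide


def label_organic(
    text: str,
    llm_judge: Callable[[str], Verdict],
    external_lookup: Callable[[str], Verdict],
) -> tuple[Verdict, str]:
    """Try cheap, precise checks first; fall back to costlier ones only when needed."""
    # Tier 1: string matching. Check the negation first, since "non-organic"
    # also contains the word "organic".
    if NOT_ORGANIC.search(text):
        return False, "string_match"
    if ORGANIC.search(text):
        return True, "string_match"
    # Tier 2: LLM reasoning over the available product text.
    verdict = llm_judge(text)
    if verdict is not None:
        return verdict, "llm"
    # Tier 3: search external data sources.
    verdict = external_lookup(text)
    if verdict is not None:
        return verdict, "external_search"
    return None, "unknown"  # leave unlabeled rather than guess


# Hypothetical stand-ins for the LLM and the external search.
def stub_llm(text: str) -> Verdict:
    return False if "conventional" in text.lower() else None


def stub_lookup(text: str) -> Verdict:
    return True if "farmstand" in text.lower() else None


print(label_organic("Acme Organic Baby Spinach 5 oz", stub_llm, stub_lookup))  # (True, 'string_match')
print(label_organic("Non-Organic Bananas, 1 lb", stub_llm, stub_lookup))       # (False, 'string_match')
print(label_organic("Bananas, conventionally grown", stub_llm, stub_lookup))   # (False, 'llm')
print(label_organic("Acme Farmstand Kale", stub_llm, stub_lookup))             # (True, 'external_search')
print(label_organic("Acme Kale", stub_llm, stub_lookup))                       # (None, 'unknown')
```

Recording which tier produced each label makes it easy to audit the costlier, less certain tiers separately from the string matches.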
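For the generalized attribute extraction project, the retrieval half of RAG can be illustrated with a small sketch. It assumes a pool of already-labeled SKUs and ranks them by token Jaccard similarity, a deliberately simple stand-in for the embedding-based vector search a production RAG system would more typically use. The top matches are inserted into the prompt as few-shot examples. All names and the sample data are hypothetical.

```python
import json


def tokens(text: str) -> set[str]:
    return set(text.lower().split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve_examples(query: str, labeled: list[tuple[str, dict]], k: int = 2) -> list[tuple[str, dict]]:
    """Return the k labeled SKUs whose titles are most similar to the query."""
    q = tokens(query)
    return sorted(labeled, key=lambda ex: jaccard(q, tokens(ex[0])), reverse=True)[:k]


def build_rag_prompt(query: str, labeled: list[tuple[str, dict]], k: int = 2) -> str:
    """Prepend the retrieved examples as few-shot demonstrations."""
    parts = ["Extract product attributes as JSON, following these examples."]
    for title, attrs in retrieve_examples(query, labeled, k):
        parts.append(f"Title: {title}\nAttributes: {json.dumps(attrs)}")
    parts.append(f"Title: {query}\nAttributes:")
    return "\n\n".join(parts)


# Hypothetical labeled pool; in practice this would come from the existing catalog.
labeled = [
    ("Coca-Cola Classic 12 fl oz can", {"brand": "Coca-Cola", "size": "12 fl oz", "container": "can"}),
    ("Acme Classic Potato Chips 8 oz", {"brand": "Acme", "size": "8 oz", "container": "bag"}),
    ("Acme Cola 12 fl oz can", {"brand": "Acme", "size": "12 fl oz", "container": "can"}),
]
print(build_rag_prompt("Coca-Cola Zero Sugar 12 fl oz can", labeled))
```

Here the two beverage-can examples are retrieved and the potato chips are left out, so the model sees demonstrations that resemble the product it is labeling. In a setup like this, the reply would then be parsed and validated, and the extracted attributes compared across merchants' listings to decide whether two SKUs refer to the same product.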

Results and Impact

Adopting LLMs led to a marked improvement in the quality and completeness of DoorDash’s product catalog.

The automated attribute extraction process reduced the time and cost associated with manual data entry, while also increasing the accuracy of product information.

This, in turn, enhanced personalized recommendations, improved substitution options during checkout, and boosted customer satisfaction metrics.

Key Takeaways

DoorDash’s innovative use of LLMs to manage and enhance its product catalog showcases the transformative potential of AI in retail.

By automating the extraction and enrichment of product attributes, DoorDash not only improved operational efficiency but also delivered a better shopping experience for its customers.

DoorDash is now exploring multimodal LLMs and working to democratize AI usage across the company.


Reference: DoorDash Engineering Blog