Medical Graph RAG: Towards Safe Medical Large Language Model via Graph Retrieval-Augmented Generation

Original Paper: https://arxiv.org/abs/2408.04187

By: Junde Wu, Jiayuan Zhu, Yunli Qi

Abstract:

We introduce a novel graph-based Retrieval-Augmented Generation (RAG) framework specifically designed for the medical domain, called MedGraphRAG, aimed at enhancing Large Language Model (LLM) capabilities and generating evidence-based results, thereby improving safety and reliability when handling private medical data.

Our comprehensive pipeline begins with a hybrid static-semantic approach to document chunking, significantly improving context capture over traditional methods.

Extracted entities are used to create a three-tier hierarchical graph structure, linking entities to foundational medical knowledge sourced from medical papers and dictionaries.

These entities are then interconnected to form meta-graphs, which are merged based on semantic similarities to develop a comprehensive global graph.

This structure supports precise information retrieval and response generation. The retrieval process employs a U-retrieve method to balance global awareness and indexing efficiency of the LLM.

Our approach is validated through a comprehensive ablation study comparing various methods for document chunking, graph construction, and information retrieval.

The results not only demonstrate that our hierarchical graph construction method consistently outperforms state-of-the-art models on multiple medical Q&A benchmarks, but also confirms that the responses generated include source documentation, significantly enhancing the reliability of medical LLMs in practical applications.

Summary Notes


Figure 1: The MedGraphRAG framework.

The rapid development of Large Language Models (LLMs) like OpenAI's ChatGPT has revolutionized natural language processing, enabling a myriad of AI-driven applications. However, these models face significant challenges in specialized fields such as medicine, where precision and evidence-based outputs are crucial. Enter MedGraphRAG, a novel method designed to enhance LLM capabilities in the medical domain by incorporating a graph-based Retrieval-Augmented Generation (RAG) framework. This approach promises to boost the safety, reliability, and interpretability of LLMs when handling sensitive medical information.

Introduction to MedGraphRAG

MedGraphRAG aims to address two critical challenges faced by LLMs in the medical field:

  1. Hallucinations: LLMs may produce outputs that appear accurate but are factually incorrect, which can lead to dangerous conclusions in medical contexts.
  2. Simplistic Reasoning: LLMs often fail to provide the deep, nuanced reasoning required for complex medical queries.

MedGraphRAG enhances LLMs by generating evidence-based responses with grounded source citations and clear interpretations of medical terminology.

This method involves constructing a three-tier hierarchical graph that links entities extracted from user-provided documents to foundational medical knowledge, ensuring precise information retrieval and response generation.

Methodology Overview

1. Document Chunking

Traditional methods of segmenting medical documents, such as chunking based on token size, often fail to capture the intended context accurately. MedGraphRAG employs a mixed method of character separation and topic-based segmentation to improve context capture. This approach involves:

  • Character Separation: Using line break symbols to isolate individual paragraphs.
  • Semantic Chunking: Employing proposition transfer to transform paragraphs into self-sustaining statements, which are then analyzed sequentially to decide their inclusion in data chunks.
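
To make this flow concrete, here is a minimal Python sketch of the hybrid static-semantic chunking. The propositionize and belongs_to_chunk helpers are hypothetical placeholders for the LLM-driven proposition transfer and inclusion decisions; only the overall control flow is meant to mirror the method described above.

```python
# Minimal sketch of hybrid static-semantic chunking (illustration only).
from typing import List


def propositionize(paragraph: str) -> List[str]:
    # Placeholder: the paper uses an LLM to rewrite a paragraph into
    # self-contained propositions; here we simply split on sentence ends.
    return [s.strip() for s in paragraph.split(".") if s.strip()]


def belongs_to_chunk(chunk: List[str], proposition: str, max_words: int = 256) -> bool:
    # Placeholder for the LLM judgment "does this proposition continue the
    # current chunk's topic?"; here we only enforce a rough size budget.
    current = sum(len(p.split()) for p in chunk)
    return current + len(proposition.split()) <= max_words


def hybrid_chunk(document: str) -> List[List[str]]:
    """Split on line breaks first (static), then group propositions (semantic)."""
    paragraphs = [p for p in document.split("\n") if p.strip()]
    chunks: List[List[str]] = []
    current: List[str] = []
    for paragraph in paragraphs:
        for prop in propositionize(paragraph):
            if current and not belongs_to_chunk(current, prop):
                chunks.append(current)
                current = []
            current.append(prop)
    if current:
        chunks.append(current)
    return chunks
```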

2. Graph Construction

MedGraphRAG constructs a three-tier hierarchical graph:

  • First Level: Entities are extracted from user-provided medical documents.
  • Second Level: These entities are linked to more basic entities derived from credible medical books and papers.
  • Third Level: Further connections are made to a fundamental medical dictionary graph, such as the Unified Medical Language System (UMLS), which provides detailed explanations and semantic relationships.
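
The tiers can be pictured as a small directed graph. The sketch below (using networkx) hard-codes a handful of illustrative entities and relations; in MedGraphRAG itself, entity extraction and cross-tier linking are performed by an LLM over the user documents, the literature, and UMLS.

```python
# Illustrative three-tier hierarchical graph; the entities, relation names,
# and attributes below are made up for the example.
import networkx as nx

graph = nx.DiGraph()

# Tier 1: entities extracted from a user-provided document (e.g. a clinical note).
graph.add_node("memory loss (patient note)", tier=1)

# Tier 2: entities drawn from credible medical books and papers.
graph.add_node("Alzheimer's disease (literature)", tier=2)

# Tier 3: fundamental dictionary concepts, e.g. UMLS entries carrying
# definitions and semantic relationships.
graph.add_node("Alzheimer's Disease (UMLS)", tier=3,
               definition="A progressive neurodegenerative disorder ...")

# Cross-tier links ground each entity in progressively more foundational knowledge.
graph.add_edge("memory loss (patient note)", "Alzheimer's disease (literature)",
               relation="symptom_of")
graph.add_edge("Alzheimer's disease (literature)", "Alzheimer's Disease (UMLS)",
               relation="defined_by")

# Entities from the same document are also interlinked into a meta-graph;
# meta-graphs are later merged by semantic similarity into the global graph.
```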

3. Information Retrieval

To address user queries, MedGraphRAG implements a U-retrieve strategy that combines top-down retrieval with bottom-up response generation. This involves:

  • Top-Down Retrieval: Structuring the query using predefined medical tags and indexing them through the graph layers.
  • Bottom-Up Generation: Generating responses from meta-graphs and summarizing the information into a detailed response.
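
The sketch below illustrates this U-shaped flow under simplifying assumptions: the tag tree, the overlap test, and the drafting step are stand-ins for the paper's LLM-generated medical tags and LLM-written intermediate answers.

```python
# Simplified U-retrieve: descend the tag layers top-down, then build the
# answer bottom-up from the retrieved meta-graphs. Not the paper's code.
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class TagNode:
    tags: Set[str]                               # medical tags summarizing this graph region
    children: List["TagNode"] = field(default_factory=list)
    meta_graph: str = ""                         # leaves hold a meta-graph's content


def top_down(node: TagNode, query_tags: Set[str]) -> List[TagNode]:
    """Descend only into regions whose tags overlap the tagged query."""
    if not node.tags & query_tags:
        return []
    if not node.children:
        return [node]
    hits: List[TagNode] = []
    for child in node.children:
        hits.extend(top_down(child, query_tags))
    return hits


def bottom_up(leaves: List[TagNode], query: str) -> str:
    """Draft an answer per retrieved meta-graph, then merge the drafts."""
    # In the paper both steps are LLM calls; here we just concatenate.
    drafts = [f"[{leaf.meta_graph}] relevant evidence ..." for leaf in leaves]
    return f"Answer to '{query}':\n" + "\n".join(drafts)


# Tiny example: a two-layer tag tree with two meta-graphs at the leaves.
root = TagNode(tags={"dementia", "neurology"}, children=[
    TagNode(tags={"alzheimer"}, meta_graph="meta-graph #1"),
    TagNode(tags={"vascular", "dementia"}, meta_graph="meta-graph #2"),
])
print(bottom_up(top_down(root, {"vascular", "dementia"}),
                "What causes vascular dementia?"))
```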

Key Findings and Results

The efficacy of MedGraphRAG was demonstrated through rigorous testing on multiple medical Q&A benchmarks: PubMedQA, MedMCQA, and MedQA (USMLE-style questions). The results show that MedGraphRAG significantly enhances the performance of both open-source and closed-source LLMs, achieving state-of-the-art (SOTA) results.

Performance Boost with MedGraphRAG

Table 1 below reports accuracy (%) for various LLMs, with and without MedGraphRAG, across the three benchmarks.

| Model | Size | Open-sourced | MedQA | MedMCQA | PubMedQA |
| --- | --- | --- | --- | --- | --- |
| LLaMA2 | 13B | yes | 42.7 | 37.4 | 68.0 |
| LLaMA2-MedGraphRAG | 13B | yes | 65.5 | 51.4 | 73.2 |
| LLaMA2 | 70B | yes | 43.7 | 35.0 | 74.3 |
| LLaMA2-MedGraphRAG | 70B | yes | 69.2 | 58.7 | 76.0 |
| LLaMA3 | 8B | yes | 59.8 | 57.3 | 75.2 |
| LLaMA3-MedGraphRAG | 8B | yes | 74.2 | 61.6 | 77.8 |
| LLaMA3 | 70B | yes | 72.1 | 65.5 | 77.5 |
| LLaMA3-MedGraphRAG | 70B | yes | 88.4 | 79.1 | 83.8 |
| Gemini-pro | - | no | 59.0 | 54.8 | 69.8 |
| Gemini-MedGraphRAG | - | no | 72.6 | 62.0 | 76.2 |
| GPT-4 | - | no | 81.7 | 72.4 | 75.2 |
| GPT-4 MedGraphRAG | - | no | 91.3 | 81.5 | 83.3 |
| Human (expert) | - | - | 87.0 | 90.0 | 78.0 |

Evidence-Based Responses

By linking responses to credible sources, MedGraphRAG ensures that each answer is verifiable, enhancing trustworthiness for clinicians.

For example, MedGraphRAG can accurately differentiate between similar conditions like Alzheimer's and Vascular Dementia, providing detailed explanations supported by authentic citations.
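
One simple way to picture this grounding (an assumption for illustration, not the paper's implementation) is a response object that carries the graph sources it was built from, so each claim can be traced back:

```python
# Hypothetical response container: the answer text travels together with
# the tier-2/3 sources it was grounded on.
from dataclasses import dataclass
from typing import List


@dataclass
class GroundedAnswer:
    text: str
    sources: List[str]  # e.g. UMLS concepts or paper references from the lower tiers

    def render(self) -> str:
        return f"{self.text}\n\nSources: " + "; ".join(self.sources)


answer = GroundedAnswer(
    text=("Vascular dementia typically shows a stepwise decline, whereas "
          "Alzheimer's disease tends to progress gradually."),
    sources=["UMLS: Alzheimer's Disease", "UMLS: Vascular Dementia"],
)
print(answer.render())
```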

Comparison with SOTA Models

MedGraphRAG outperforms previous state-of-the-art models, including fine-tuned and non-fine-tuned models, on the MedQA benchmark.

When applied to powerful LLMs like GPT-4, MedGraphRAG achieves SOTA results, surpassing even human expert performance on MedQA.

Conclusion

MedGraphRAG represents a significant advancement in the application of LLMs to the medical domain.

By integrating a graph-based RAG framework, it enhances the accuracy, safety, and interpretability of medical responses.

Future work aims to expand this framework to include more diverse datasets and explore its potential in real-time clinical settings.

Potential Applications

  • Clinical Decision Support: Providing clinicians with evidence-based recommendations.
  • Medical Education: Assisting in the training of medical students and professionals.
  • Research: Facilitating the synthesis of new medical insights from large datasets.

MedGraphRAG is a promising step towards safer and more reliable AI applications in medicine, ensuring that LLMs can be trusted with sensitive and critical information.
