Context Precision

Context Precision measures how well a retrieval system ranks relevant chunks of information above irrelevant ones, judged against the ground truth.

This metric is calculated using the query, ground truth, and context. The scores range from 0 to 1, with higher scores showing better precision.

Formula:

\text{Precision@k} = { \text{true positive@k} \over \text{(true positive@k + false positive@k)} }

\text{Context Precision@K} = \frac{\sum_{k=1}^{K} \left( \text{Precision@k} \times v_k \right)}{\text{Total number of relevant items in the top } K \text{ results}}

k = the rank position of a retrieved chunk (1 ≤ k ≤ K)

v_k = the relevance indicator at rank k: 1 if the chunk at rank k is relevant, 0 otherwise

K = the total number of chunks in the retrieved contexts
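
The two formulas above can be sketched in plain Python. This is a minimal illustration, not the RAGAS implementation: the function names and the relevance lists are hypothetical, and returning 0 when no retrieved chunk is relevant is an assumption made here to avoid dividing by zero.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[int], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are relevant (k is 1-indexed)."""
    top_k = relevance[:k]
    return sum(top_k) / len(top_k)


def context_precision_at_K(relevance: Sequence[int]) -> float:
    """Sum of Precision@k * v_k over all ranks, divided by the number of relevant chunks."""
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0  # assumption: no relevant chunk retrieved scores 0
    weighted = sum(
        precision_at_k(relevance, k) * v_k
        for k, v_k in enumerate(relevance, start=1)
    )
    return weighted / total_relevant


# Hypothetical relevance labels (v_k) for three retrieved chunks, in rank order.
good_ranking = context_precision_at_K([1, 0, 1])  # relevant chunks at ranks 1 and 3
poor_ranking = context_precision_at_K([0, 1, 1])  # relevant chunks at ranks 2 and 3
print(round(good_ranking, 3), round(poor_ranking, 3))  # 0.833 0.583
```

Both lists contain two relevant chunks, but the first places one at rank 1 and so scores higher, which is the ranking sensitivity the metric is meant to capture.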

Example:

Consider three examples, given as lists:

Questions = [What is SpaceX?, Who founded it?, What exactly does SpaceX do?]

Answers = [It is an American aerospace company], [SpaceX founded by Elon Musk], [SpaceX produces and operates the Falcon 9 and Falcon rockets]

Contexts = [SpaceX is an American aerospace company founded in 2002], [SpaceX, founded by Elon Musk, is worth nearly $210 billion], [The full form of SpaceX is Space Exploration Technologies Corporation]

Ground Truth = [SpaceX is an American aerospace company], [Founded by Elon Musk], [SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets]

Solution:

For Question 1: The retrieved context is relevant to the ground truth, so it counts as a true positive (TP), and there are no false positives (FP). The context precision is therefore 1.

\text{Precision@1} = { \text{1} \over \text{(1 + 0)} } = 1

Similarly, for Question 2, the context precision is 1.

For Question 3, however, the retrieved context is not relevant to the ground truth: it is a false positive (FP), and there are no true positives (TP). The context precision is therefore 0.

\text{Precision@1} = { \text{0} \over \text{(0 + 1)} } = 0

Averaging the context precision over the three questions (K = 3), with equal weights v_k = 1:

\text{Context Precision@K} = { \text{(1 + 1 + 0) }\times 1 \over \text{3} } \approx 0.67
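
The averaging step in this worked example can be checked with a few lines of Python. The list name is illustrative; the values are the three per-question scores derived in the example.

```python
# Per-question context precision from the worked example:
# Questions 1 and 2 retrieved a relevant chunk (1.0), Question 3 did not (0.0).
per_question_precision = [1.0, 1.0, 0.0]

# Equal weights, so the overall score is the plain mean.
average_precision = sum(per_question_precision) / len(per_question_precision)
print(round(average_precision, 2))  # 0.67
```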

Code:

Context Precision using RAGAS:

from datasets import Dataset
from ragas.metrics import context_precision
from ragas import evaluate

data_samples = {
    'question': ['What is SpaceX?', 'Who founded it?', 'What exactly does SpaceX do?'],
    # RAGAS expects the generated responses under the 'answer' key
    'answer': ['It is an American aerospace company', 'SpaceX founded by Elon Musk', 'SpaceX produces and operates the Falcon 9 and Falcon rockets'],
    # One list of retrieved chunks per question
    'contexts': [
        ['SpaceX is an American aerospace company founded in 2002'],
        ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        ['The full form of SpaceX is Space Exploration Technologies Corporation'],
    ],
    'ground_truth': ['SpaceX is an American aerospace company', 'Founded by Elon Musk', 'SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets']
}

dataset = Dataset.from_dict(data_samples)
score = evaluate(dataset, metrics=[context_precision])
score.to_pandas()

Context Precision using Athina AI:

First, install the athina package:

pip install --upgrade athina

Then, set your API keys:

import os

from athina.keys import AthinaApiKey, OpenAiApiKey

OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))

Finally, we can run evals like this:

from athina.loaders import Loader
from athina.evals import RagasContextPrecision

data = [
    {
        "query": "What is SpaceX?",
        "context": ['SpaceX is an American aerospace company founded in 2002'],
        "expected_response": "SpaceX is an American aerospace company"
    },
    {
        "query": "Who founded it?",
        "context": ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        "expected_response": "Founded by Elon Musk."
    },
    {
        "query": "What exactly does SpaceX do?",
        "context": ['The full form of SpaceX is Space Exploration Technologies Corporation'],
        "expected_response": "SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets"
    },
]

# Load the data from CSV, JSON, Athina or Dictionary
# (uses the Loader class imported above)
dataset = Loader().load_dict(data)

eval_model = "gpt-3.5-turbo"
RagasContextPrecision(model=eval_model).run_batch(data=dataset).to_df()
