Context Precision evaluates how well a retrieval system ranks the relevant pieces (chunks) of information, judged against the ground truth.
The metric is calculated from the query, the ground truth, and the retrieved context. Scores range from 0 to 1, with higher scores indicating better precision.
Formula:
k = retrieved chunks (or contexts) that are relevant to the task
K = the total number of chunks in the retrieved contexts
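Assuming the Ragas definition of this metric is the one intended here, the score can be written as below. In that definition k runs over the rank positions 1 to K of the retrieved chunks, K is the total number of retrieved chunks, and v_k is 1 if the chunk at rank k is relevant and 0 otherwise. TP@k and FP@k are the counts of relevant and irrelevant chunks within the top k positions. This is a reference sketch of the commonly documented form, not necessarily the exact expression the author used.

```latex
\[
\text{Context Precision@}K =
  \frac{\sum_{k=1}^{K} \left(\text{Precision@}k \times v_k\right)}
       {\text{total number of relevant chunks in the top } K \text{ results}}
\]
\[
\text{Precision@}k = \frac{\text{TP@}k}{\text{TP@}k + \text{FP@}k}
\]
```

With a single retrieved chunk per question (K = 1), this reduces to 1 when the chunk is relevant and 0 when it is not.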
Example:
Consider three examples, given in list format:
Questions = ["What is SpaceX?", "Who founded it?", "What exactly does SpaceX do?"]
Answers = ["It is an American aerospace company", "SpaceX founded by Elon Musk", "SpaceX produces and operates the Falcon 9 and Falcon rockets"]
Contexts = [["SpaceX is an American aerospace company founded in 2002"], ["SpaceX, founded by Elon Musk, is worth nearly $210 billion"], ["The full form of SpaceX is Space Exploration Technologies Corporation"]]
Ground Truth = ["SpaceX is an American aerospace company", "Founded by Elon Musk", "SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets"]
Solution:
For Question 1: The context contains the information stated in the ground truth, so the retrieved chunk is relevant. This counts as a true positive (TP) with no false positives (FP), so the context precision is 1.
Similarly, for Question 2, the context precision is 1.
For Question 3, however, the context does not contain the information stated in the ground truth, so the retrieved chunk is not relevant. This counts as a false positive (FP) with no true positives (TP), so the context precision is 0.
For the average context precision over the K = 3 questions, we assume equal weights (Vk = 1) and take the mean of the three scores: (1 + 1 + 0) / 3 ≈ 0.67.
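The hand calculation can be reproduced with a few lines of Python. This is a minimal sketch, not the Ragas implementation: the function name `context_precision_at_k` and the hand-assigned relevance labels are illustrative assumptions, and the function uses the rank-weighted form of the metric (the sum of Precision@k times the relevance flag at rank k, divided by the number of relevant chunks).

```python
from typing import List


def context_precision_at_k(relevance: List[int]) -> float:
    """Rank-weighted context precision for one query.

    relevance[i] is 1 if the chunk at rank i + 1 is relevant, else 0.
    Returns 0.0 when no retrieved chunk is relevant.
    """
    total_relevant = sum(relevance)
    if total_relevant == 0:
        return 0.0

    score = 0.0
    hits = 0  # relevant chunks seen so far (true positives at this rank)
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        precision_at_rank = hits / rank
        # Only ranks that hold a relevant chunk contribute to the sum.
        score += precision_at_rank * is_relevant
    return score / total_relevant


# Hand-assigned labels from the worked example: one chunk per question,
# relevant for Questions 1 and 2, not relevant for Question 3.
relevance_per_question = [[1], [1], [0]]

per_question = [context_precision_at_k(r) for r in relevance_per_question]
average = sum(per_question) / len(per_question)

print(per_question)       # [1.0, 1.0, 0.0]
print(round(average, 2))  # 0.67
```

With more than one chunk per question the ranking starts to matter: `context_precision_at_k([1, 0])` gives 1.0, while `context_precision_at_k([0, 1])` gives 0.5, because the relevant chunk appears lower in the list.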
Code:
Context Precision using RAGAS:
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision

# One entry per question; 'contexts' holds a list of retrieved chunks per question.
data_samples = {
    'question': ['What is SpaceX?', 'Who founded it?', 'What exactly does SpaceX do?'],
    # Generated answers ('answer' is the column name Ragas conventionally uses).
    'answer': ['It is an American aerospace company', 'SpaceX founded by Elon Musk', 'SpaceX produces and operates the Falcon 9 and Falcon rockets'],
    'contexts': [
        ['SpaceX is an American aerospace company founded in 2002'],
        ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        ['The full form of SpaceX is Space Exploration Technologies Corporation'],
    ],
    'ground_truth': ['SpaceX is an American aerospace company', 'Founded by Elon Musk', 'SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets'],
}

# The metric is judged by an LLM, so the corresponding API key must be configured first.
dataset = Dataset.from_dict(data_samples)
score = evaluate(dataset, metrics=[context_precision])
score.to_pandas()
Context Precision using Athina AI:
First, install the athina package:
pip install --upgrade athina
Then, set your API keys:
import os

from athina.keys import AthinaApiKey, OpenAiApiKey

# Read both keys from environment variables rather than hard-coding them.
OpenAiApiKey.set_key(os.getenv('OPENAI_API_KEY'))
AthinaApiKey.set_key(os.getenv('ATHINA_API_KEY'))
Finally, we can run evals like this:
from athina.loaders import Loader
from athina.evals import RagasContextPrecision

# Each record holds the query, the retrieved context chunks, and the expected (ground-truth) response.
data = [
    {
        "query": "What is SpaceX?",
        "context": ['SpaceX is an American aerospace company founded in 2002'],
        "expected_response": "SpaceX is an American aerospace company"
    },
    {
        "query": "Who founded it?",
        "context": ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        "expected_response": "Founded by Elon Musk."
    },
    {
        "query": "What exactly does SpaceX do?",
        "context": ['The full form of SpaceX is Space Exploration Technologies Corporation'],
        "expected_response": "SpaceX produces and operates the Falcon 9 and Falcon Heavy rockets"
    },
]

# Load the data from CSV, JSON, Athina or Dictionary, using the Loader imported above
dataset = Loader().load_dict(data)

# Run the evaluation with the chosen judge model and return the results as a DataFrame
eval_model = "gpt-3.5-turbo"
RagasContextPrecision(model=eval_model).run_batch(data=dataset).to_df()