Answer Relevancy measures how well the answer from the RAG model addresses the original question. It is calculated as the average cosine similarity between the original question and a set of artificially generated questions, which are reverse-engineered from the answer.
Lower scores are assigned to incomplete or irrelevant answers, and higher scores indicate better relevancy.
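Formula:
$$\text{Answer Relevancy} = \frac{1}{N} \sum_{i=1}^{N} \cos(A_{g_i}, A_o) = \frac{1}{N} \sum_{i=1}^{N} \frac{A_{g_i} \cdot A_o}{\lVert A_{g_i} \rVert \, \lVert A_o \rVert}$$
Where:
- $A_{g_i}$ = embedding of the i-th generated question.
- $A_o$ = embedding of the original question.
- $N$ = number of generated questions.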
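The formula can be sketched in a few lines of plain Python. This is a minimal illustration, not the RAGAS implementation: the function names `cosine_similarity` and `mean_cosine_relevancy` are hypothetical, the vectors are made-up 3-dimensional embeddings, and returning 0 for a zero vector is an assumption made here to avoid division by zero.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector has no similarity to anything
    return dot / (norm_a * norm_b)


def mean_cosine_relevancy(
    generated: Sequence[Sequence[float]], original: Sequence[float]
) -> float:
    """Average cosine similarity between each generated-question embedding
    (A_g) and the original-question embedding (A_o)."""
    if not generated:
        raise ValueError("need at least one generated-question embedding")
    total = sum(cosine_similarity(g, original) for g in generated)
    return total / len(generated)


# Made-up embeddings, for illustration only.
A_o = [1.0, 0.0, 0.0]
A_g = [
    [1.0, 0.0, 0.0],  # same direction as A_o, similarity 1
    [0.0, 1.0, 0.0],  # orthogonal to A_o, similarity 0
]
score = mean_cosine_relevancy(A_g, A_o)
print(score)  # 0.5
```

With one identical vector and one orthogonal vector, the two similarities are 1 and 0, so the mean is 0.5.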
Example:
Question: What is SpaceX?
Answer: It is an American aerospace company founded in 2002.
Solution:
To calculate the relevance of an answer to the given question, the process involves the following steps:
Step 1: Reverse-engineer "n" variations of the question from the provided answer using a Large Language Model (LLM). This involves generating alternative versions of the original question based on the information contained in the answer.
For example:
- Question 1: "Which company was founded in 2002 and operates in aerospace?"
- Question 2: "What American company in aerospace was established in 2002?"
- Question 3: "Which U.S. company focused on aerospace was started in 2002?"
Step 2: The answer relevancy metric then calculates the mean cosine similarity between the generated questions and the original question.
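The two steps can be sketched end to end as follows. This is a toy illustration under stated assumptions: the generated questions are hard-coded from the example above rather than produced by an LLM, and `bag_of_words` is a hypothetical stand-in for a real embedding model. Because word counts only capture exact word overlap, the resulting score is far lower than a semantic embedding model would give; the point is the shape of the computation, not the number.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


original_question = "What is SpaceX?"

# Step 1: questions reverse-engineered from the answer.
# Hard-coded here; in practice an LLM generates them.
generated_questions = [
    "Which company was founded in 2002 and operates in aerospace?",
    "What American company in aerospace was established in 2002?",
    "Which U.S. company focused on aerospace was started in 2002?",
]

# Step 2: mean cosine similarity to the original question.
original_vec = bag_of_words(original_question)
similarities = [cosine(bag_of_words(q), original_vec) for q in generated_questions]
relevancy = sum(similarities) / len(similarities)

for question, sim in zip(generated_questions, similarities):
    print(f"{sim:.3f}  {question}")
print(f"Answer relevancy (toy): {relevancy:.3f}")
```

Swapping `bag_of_words` for a real sentence-embedding model is the only change needed to make the score reflect meaning rather than shared words.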
Code:
Answer Relevancy using RAGAS:
from datasets import Dataset
from ragas.metrics import answer_relevancy
from ragas import evaluate

# Each position across the three lists describes one sample.
data_samples = {
    'question': ['What is SpaceX?', 'Who found it?', 'What exactly does SpaceX do?'],
    'answer': ['It is an American aerospace company', 'SpaceX founded by Elon Musk', 'SpaceX produces and operates the Falcon 9 and Falcon rockets'],
    'contexts': [
        ['SpaceX is an American aerospace company founded in 2002'],
        ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        ['The full form of SpaceX is Space Exploration Technologies Corporation'],
    ],
}

dataset = Dataset.from_dict(data_samples)
# Score every sample with the answer relevancy metric.
score = evaluate(dataset, metrics=[answer_relevancy])
score.to_pandas()
Answer Relevancy using Athina AI:
from athina.evals import RagasAnswerRelevancy
from athina.loaders import Loader  # needed for Loader().load_dict below

data = [
    {
        "query": "What is SpaceX?",
        "context": ['SpaceX is an American aerospace company founded in 2002'],
        "response": "It is an American aerospace company"
    },
    {
        "query": "Who found it?",
        "context": ['SpaceX, founded by Elon Musk, is worth nearly $210 billion'],
        "response": "SpaceX founded by Elon Musk"
    },
    {
        "query": "What exactly does SpaceX do?",
        "context": ['The full form of SpaceX is Space Exploration Technologies Corporation'],
        "response": "SpaceX produces and operates the Falcon 9 and Falcon rockets"
    },
]

# Convert the raw dictionaries into the dataset format the evaluator expects.
dataset = Loader().load_dict(data)
eval_model = "gpt-3.5-turbo"
# Run the metric over every sample and return the results as a DataFrame.
RagasAnswerRelevancy(model=eval_model).run_batch(data=dataset).to_df()