All eval templates are tested against golden datasets that ship with the LLM eval library's benchmarked data, targeting precision of 70-90% and F1 of 70-85%.
1. Hallucination Eval
Detects hallucinations in answers over public and private data
Tested on:
Hallucination QA Dataset, Hallucination RAG Dataset
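The precision and F1 targets above can be scored with a small helper once an eval template's predictions are compared against golden labels. The sketch below is illustrative: the `"hallucinated"`/`"factual"` label names and the `precision_f1` helper are assumptions, not part of any specific library's API.

```python
# Sketch: scoring an eval template's predictions against golden labels.
# The label strings and helper name are illustrative assumptions.

def precision_f1(predictions, golden, positive="hallucinated"):
    """Compute precision and F1 for the positive class."""
    tp = sum(1 for p, g in zip(predictions, golden) if p == positive and g == positive)
    fp = sum(1 for p, g in zip(predictions, golden) if p == positive and g != positive)
    fn = sum(1 for p, g in zip(predictions, golden) if p != positive and g == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, f1

# Hypothetical golden labels and template predictions:
golden = ["hallucinated", "factual", "hallucinated", "factual"]
preds = ["hallucinated", "factual", "factual", "factual"]
p, f1 = precision_f1(preds, golden)
# One true positive, no false positives, one false negative:
# precision = 1.0, F1 = 2/3
```

A template "passes" the benchmark when its scores land inside the stated 70-90% precision and 70-85% F1 bands on the golden dataset.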

