- not grounded in context
- repetitive, repetitive, repetitive
- grammatically incorrect
- excessively lengthy and characterized by an overabundance of words
- incoherent
How It Works
Here’s the step-by-step process for using an LLM as a judge:

1. Identify Evaluation Criteria
First, determine what you want to evaluate, be it hallucination, toxicity, accuracy, or another characteristic. See our pre-built evaluators for examples of what can be assessed.

2. Craft Your Evaluation Prompt
Write a prompt template that will guide the evaluation. This template should clearly define what variables are needed from both the initial prompt and the LLM’s response to effectively assess the output.

3. Select an Evaluation LLM
Choose the most suitable LLM from our available options for conducting your specific evaluations.

4. Generate Evaluations and View Results
Execute the evaluations across your data. This process allows for comprehensive testing without the need for manual annotation, enabling you to iterate quickly and refine your LLM’s prompts.
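
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual API: `EVAL_PROMPT`, `EvalResult`, `parse_verdict`, `run_evals`, and `stub_judge` are all hypothetical names, and `stub_judge` is a deterministic stand-in for a real call to whichever evaluation LLM you select. The criterion shown is groundedness (whether the response is supported by the context).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Steps 1 and 2: pick a criterion and write a prompt template.
# {context} and {response} are the variables this (hypothetical) template needs.
EVAL_PROMPT = (
    "You are grading an LLM response.\n"
    "Criterion: is the response grounded in the context?\n"
    "Context: {context}\n"
    "Response: {response}\n"
    "Answer with exactly one word: PASS or FAIL."
)


@dataclass
class EvalResult:
    response: str
    passed: bool
    raw_verdict: str


def parse_verdict(text: str) -> bool:
    """Treat any verdict that starts with PASS (case-insensitive) as a pass."""
    return text.strip().upper().startswith("PASS")


def run_evals(
    rows: List[Dict[str, str]], judge: Callable[[str], str]
) -> List[EvalResult]:
    """Step 4: fill the template for every row and ask the judge for a verdict."""
    results = []
    for row in rows:
        prompt = EVAL_PROMPT.format(context=row["context"], response=row["response"])
        verdict = judge(prompt)
        results.append(EvalResult(row["response"], parse_verdict(verdict), verdict))
    return results


def stub_judge(prompt: str) -> str:
    """Step 3 placeholder (assumption): replace with a real model call.

    Passes only when the response text appears verbatim in the context.
    """
    context = prompt.split("Context: ", 1)[1].split("\nResponse: ", 1)[0]
    response = prompt.split("Response: ", 1)[1].split("\nAnswer", 1)[0]
    return "PASS" if response.lower() in context.lower() else "FAIL"


rows = [
    {"context": "Paris is the capital of France.",
     "response": "Paris is the capital of France."},
    {"context": "Paris is the capital of France.",
     "response": "Berlin is the capital of France."},
]
results = run_evals(rows, stub_judge)
pass_rate = sum(r.passed for r in results) / len(results)
print(f"pass rate: {pass_rate:.0%}")
```

Passing the judge in as a plain callable keeps the template and the loop unchanged when you switch evaluation models, so only the function that talks to the model needs to be rewritten.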

