Evaluator Traces

Phoenix Evals automatically traces every evaluation run, so you can see exactly how your evaluators reach their decisions. This visibility is essential for aligning evaluators with human judgment and for building trust in your evaluation results.

Why Tracing Matters for Human Alignment

LLM evaluations are only as good as their alignment with human judgment. To achieve this alignment, you need to:

Inspect Evaluator Reasoning: See exactly how the evaluator LLM interpreted your prompt and reached its decision
Debug Evaluation Logic: Identify when evaluators misunderstand instructions or make inconsistent judgments
Validate Prompt Engineering: Verify that your evaluation prompts are working as intended across different examples
Build Confidence: Provide stakeholders with transparent evidence of evaluation quality

What Gets Traced

Every evaluation execution captures:

Input Data: The original content being evaluated
Evaluation Prompts: The exact prompts sent to evaluator LLMs
Model Responses: The evaluator LLM’s full output, including its reasoning and final decision
Final Scores: Structured evaluation results and metadata
Execution Details: Timing, retries, and performance metrics
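To make these five categories concrete, here is a minimal Python sketch of what one traced evaluation might record. It is illustrative only: `EvalTraceRecord`, `traced_eval`, and the `call_llm` callable are hypothetical names, not the Phoenix Evals API, and Phoenix stores this information as trace spans rather than as a Python object. The sketch also assumes the evaluator writes its label on the last line of its response.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalTraceRecord:
    """Hypothetical record mirroring the five traced categories (not Phoenix's schema)."""
    input_data: str                  # Input Data: the content being evaluated
    prompt: str                      # Evaluation Prompt: exact text sent to the evaluator LLM
    response: str = ""               # Model Response: raw output, including any reasoning
    label: Optional[str] = None      # Final Score: label parsed from the response
    score: Optional[float] = None    # Final Score: numeric form of the label
    latency_s: float = 0.0           # Execution Details: wall-clock time, including retries
    retries: int = 0                 # Execution Details: failed attempts before success


def traced_eval(
    input_data: str,
    template: str,
    call_llm: Callable[[str], str],
    positive_label: str = "correct",
    max_retries: int = 2,
) -> EvalTraceRecord:
    """Run a hypothetical evaluator once and capture everything about the run."""
    prompt = template.format(input=input_data)
    record = EvalTraceRecord(input_data=input_data, prompt=prompt)
    start = time.perf_counter()
    for attempt in range(max_retries + 1):
        try:
            record.response = call_llm(prompt)
            break
        except Exception:
            record.retries = attempt + 1
            if attempt == max_retries:
                raise  # give up after the last allowed attempt
    record.latency_s = time.perf_counter() - start
    # Assumption: the evaluator puts its label on the final line of its response.
    lines = record.response.strip().splitlines()
    record.label = lines[-1].strip().lower() if lines else None
    record.score = None if record.label is None else float(record.label == positive_label)
    return record


# Demo: a stand-in LLM that times out once, then answers.
attempts = {"n": 0}


def flaky_llm(prompt: str) -> str:
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise TimeoutError("simulated timeout")
    return "The answer matches the reference.\ncorrect"


rec = traced_eval("Paris is the capital of France.", "Is this statement correct?\n{input}", flaky_llm)
print(rec.label, rec.score, rec.retries)  # correct 1.0 1
```

Keeping the prompt, the raw response, and the parsed label side by side is what lets you tell a bad prompt apart from a bad parse when a score looks wrong.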

Transparency by Design

Phoenix Evals follows the Transparency pillar: nothing is abstracted away. You can inspect every step of the evaluation process, from the raw prompts to the model’s step-by-step reasoning. This transparency lets you:

Tune evaluation prompts for better human alignment
Identify systematic biases or errors in evaluation logic
Provide evidence-based justification for evaluation results
Continuously improve evaluator performance through data-driven insights

Use Phoenix’s trace viewer to explore evaluation traces and ensure your evaluators are making decisions that align with human judgment.
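One way to decide which traces to open first is to compare the evaluator’s labels with a small set of human labels for the same examples. The Python sketch below assumes you have already collected both sets of labels as parallel lists; `alignment_report` is a hypothetical helper, not part of Phoenix. It reports raw agreement, Cohen’s kappa (agreement corrected for chance), and the indices where the two disagree, which are the traces worth inspecting.

```python
from collections import Counter
from typing import Dict, List, Sequence


def alignment_report(human: Sequence[str], evaluator: Sequence[str]) -> Dict[str, object]:
    """Compare evaluator labels with human labels for the same examples (hypothetical helper)."""
    if len(human) != len(evaluator) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    disagreements: List[int] = [i for i, (h, e) in enumerate(zip(human, evaluator)) if h != e]
    observed = 1 - len(disagreements) / n  # raw agreement rate
    # Chance agreement: probability both raters pick the same label independently.
    h_counts, e_counts = Counter(human), Counter(evaluator)
    expected = sum(h_counts[label] * e_counts[label] for label in h_counts) / (n * n)
    # Kappa is undefined when chance agreement is 1 (both raters used one identical label);
    # treat that case as perfect agreement.
    kappa = 1.0 if expected == 1 else (observed - expected) / (1 - expected)
    return {"agreement": observed, "kappa": kappa, "disagreements": disagreements}


human_labels = ["correct", "incorrect", "correct", "correct"]
eval_labels = ["correct", "correct", "correct", "incorrect"]
report = alignment_report(human_labels, eval_labels)
print(report["agreement"], report["disagreements"])  # 0.5 [1, 3]
```

In this toy example kappa is about -0.33: the evaluator agrees with the human labels less often than chance alone would predict, so examples 1 and 3 are the first traces to review before tuning the prompt.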
