1. Set environment variables to connect to your Phoenix instance:
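A minimal sketch of the connection setup; the endpoint and key values below are placeholders for your own Phoenix Cloud or self-hosted instance:

```python
import os

# Placeholder values; point these at your own Phoenix instance.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"
os.environ["PHOENIX_API_KEY"] = "your-phoenix-api-key"
```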
2. You’ll need to install the evals library that’s part of Phoenix.
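For Python, the evals package is published separately from the core Phoenix package, and we’ll also need the OpenAI SDK for the judge model used later:

```
pip install arize-phoenix arize-phoenix-evals openai
```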
3. Since we are running our evaluations on the trace data from our first project, we’ll need to pull that data into our code.
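A sketch using the Phoenix client’s span query DSL; the project name "default" is an assumption, so substitute the project you traced earlier:

```python
import phoenix as px
from phoenix.trace.dsl import SpanQuery

# Pull the LLM spans from the traced project and expose their
# input/output attributes as plain "input" and "output" columns.
query = (
    SpanQuery()
    .where("span_kind == 'LLM'")
    .select(input="input.value", output="output.value")
)
spans_df = px.Client().query_spans(query, project_name="default")
spans_df.head()
```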
4. In this example, we will define, create, and run our own evaluator. There are a number of different evaluators you can run, but this quickstart will walk through an LLM-as-a-Judge model.

1) Define your LLM Judge Model

We’ll use OpenAI as our evaluation model for this example, but Phoenix supports virtually any model. Make sure your OPENAI_API_KEY environment variable is set, then create the LLM judge:
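A minimal sketch using the OpenAI wrapper from phoenix.evals; the choice of GPT-4o as the judge is an assumption:

```python
from phoenix.evals import OpenAIModel

# Reads OPENAI_API_KEY from the environment for authentication.
eval_model = OpenAIModel(model="gpt-4o")
```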
2) Define your Evaluators

We will set up a Q&A correctness evaluator with the LLM of choice. First, define the LLM-as-a-Judge prompt template. Most LLM-as-a-judge evaluations can be framed as a classification task where the output is one of two or more categorical labels.
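A sketch of a custom classification template; the wording is illustrative, and the {input} and {output} placeholders are assumed to match the column names in the spans dataframe pulled above:

```python
QA_CORRECTNESS_TEMPLATE = """
You are evaluating whether an answer correctly and completely responds to a question.

[BEGIN DATA]
[Question]: {input}
[Answer]: {output}
[END DATA]

Respond with a single word: "correct" or "incorrect".
"""

# The rails constrain the judge's output to these categorical labels.
rails = ["correct", "incorrect"]
```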
3) Create your Classification Evaluator
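One way to bundle the template and its rails into a single evaluator definition is the ClassificationTemplate helper from phoenix.evals; this is a sketch, and you can also pass the raw template string directly to the classification call in the next step:

```python
from phoenix.evals import ClassificationTemplate

# Package the prompt template and its allowed labels together.
qa_correctness_evaluator = ClassificationTemplate(
    rails=rails,
    template=QA_CORRECTNESS_TEMPLATE,
)
```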
5. Now that we have defined our evaluator, we’re ready to evaluate our traces.
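A sketch using llm_classify from phoenix.evals to run the judge over every pulled span; provide_explanation asks the judge to justify each label (check the parameter names against your installed version):

```python
from phoenix.evals import llm_classify

# Run the Q&A correctness judge over each span in the dataframe.
eval_results = llm_classify(
    spans_df,
    model=eval_model,
    template=qa_correctness_evaluator,
    rails=rails,
    provide_explanation=True,
)
eval_results.head()
```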
6. You can now log your evaluations to Phoenix and view them in your project.
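A sketch using the client’s log_evaluations method with SpanEvaluations; it assumes the results dataframe is still indexed by span ID, which is how query_spans returns it:

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# Attach the judge's labels, scores, and explanations to their source spans.
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=eval_results)
)
```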
Next Steps
- LLM as a Judge: Learn how LLM-based evaluation works and best practices
- Pre-built Evaluators: Use pre-tested evaluators for hallucinations, relevance, toxicity, and more
- Custom Evaluators: Build custom evaluators tailored to your use case
- Datasets & Experiments: Run evaluations systematically with datasets and experiments

