1. Launch Phoenix
Before running evals, make sure Phoenix is running and you have sent traces to your project. For more step-by-step instructions, check out the Get Started guide and the Get Started with Tracing guide.
Phoenix Cloud
Log in, create a space, navigate to the Settings page in your space, and create your API keys. In your code, set these as environment variables. You can find your collector endpoint on the same page:
Your Collector Endpoint is: https://app.phoenix.arize.com/s/ + your space name.
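As a minimal sketch, the environment variables can be set directly in code before any Phoenix calls. The placeholder key and space name below are illustrative; substitute the values from your own Settings page.

```python
import os

# API key created on your space's Settings page (placeholder value)
os.environ["PHOENIX_API_KEY"] = "your-phoenix-api-key"

# Collector endpoint: https://app.phoenix.arize.com/s/ + your space name
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space-name"
```

Setting these before importing or calling Phoenix ensures the client and tracer pick them up automatically.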

2. Install Phoenix Evals
You’ll need to install the evals library that’s part of Phoenix. For the most recent features, install version 2.0 or above.
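For example, using pip with a version specifier to pin the 2.x line (package name as published on PyPI):

```shell
pip install "arize-phoenix-evals>=2.0"
```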
3. Pull down your Trace Data
Since we are running our evaluations on the trace data from our first project, we’ll need to pull that data into our code.
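A sketch of pulling spans into a dataframe with the Phoenix client, assuming Phoenix is reachable via the endpoint set earlier. The project name "default" is a placeholder; use the name of the project you sent traces to.

```python
import phoenix as px

# Connects using PHOENIX_COLLECTOR_ENDPOINT / PHOENIX_API_KEY from the environment
client = px.Client()

# Pull the spans from your tracing project into a pandas DataFrame
# ("default" is a placeholder project name)
spans_df = client.get_spans_dataframe(project_name="default")
print(spans_df.head())
```

Each row of the resulting dataframe is a span, which is the unit the evaluators below will score.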
4. Set Up Evaluations
In this example, we will define, create, and run our own evaluator. There are a number of different evaluators you can run, but this quickstart will walk through an LLM-as-a-Judge evaluator. Now we want to define our Classification Evaluator.
1) Define your LLM Judge Model
We’ll use OpenAI as our evaluation model for this example, but Phoenix also supports a number of other models. If you haven’t yet set your OpenAI API key from the previous step, first add it to your environment.

2) Define your Evaluators
We will set up a Q&A correctness evaluator with the LLM of our choice. First, we define the LLM-as-a-Judge prompt template. Most LLM-as-a-Judge evaluations can be framed as a classification task where the output is one of two or more categorical labels.
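The two sub-steps above can be sketched as follows, assuming the 2.x evals API (`LLM` and `create_classifier`); the prompt template, model name, and score mapping are illustrative, so check the Phoenix evals reference for your installed version.

```python
import os
from phoenix.evals import create_classifier
from phoenix.evals.llm import LLM

# 1) Define your LLM judge model (set your key if you haven't already)
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"  # placeholder
llm = LLM(provider="openai", model="gpt-4o")

# 2) Define your evaluator: an LLM-as-a-Judge prompt framed as a
# binary classification task over two categorical labels
CORRECTNESS_TEMPLATE = """
You are given a question, a reference answer, and a generated answer.
Decide whether the generated answer is correct with respect to the
reference answer. Respond with exactly one word: "correct" or "incorrect".

Question: {input}
Reference answer: {reference}
Generated answer: {output}
"""

correctness_evaluator = create_classifier(
    name="correctness",
    prompt_template=CORRECTNESS_TEMPLATE,
    llm=llm,
    choices={"correct": 1.0, "incorrect": 0.0},
)
```

The `choices` mapping turns each categorical label into a numeric score, which makes the results easy to aggregate in Phoenix.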
5. Run Evaluation
Now that we have defined our evaluator, we’re ready to evaluate our traces.
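A sketch of running the evaluator over the pulled spans, assuming the 2.x `evaluate_dataframe` helper and the `spans_df` and `correctness_evaluator` objects from the previous steps:

```python
from phoenix.evals import evaluate_dataframe

# Applies each evaluator to every row of the spans dataframe and
# appends the resulting labels/scores as new columns
results_df = evaluate_dataframe(
    dataframe=spans_df,
    evaluators=[correctness_evaluator],
)
print(results_df.head())
```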
6. Log results to Visualize in Phoenix
You can now log your evaluations to Phoenix and view them alongside your traces in your project view.
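One way to log the results back, assuming the `SpanEvaluations` payload from `phoenix.trace` and a results dataframe indexed by span ID with label/score columns (the eval name below is illustrative):

```python
import phoenix as px
from phoenix.trace import SpanEvaluations

# Attach the evaluation results to their spans in Phoenix; the
# dataframe index must be the span IDs the scores belong to
px.Client().log_evaluations(
    SpanEvaluations(eval_name="Q&A Correctness", dataframe=results_df),
)
```

Once logged, the labels and scores appear on each trace in the project view, where you can filter and aggregate them.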

