Before running evals, make sure Phoenix is running and you have sent traces to your project. For more step-by-step instructions, check out the Get Started guide and the Get Started with Tracing guide.
- Phoenix Cloud: Log in, create a space, navigate to the settings page in your space, and create your API keys. Then set your environment variables, as sketched below. Your collector endpoint is `https://app.phoenix.arize.com/s/` followed by your space name.
- Local (Self-hosted): if you are running Phoenix locally, the default collector endpoint is `http://localhost:6006`.
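A minimal sketch of setting these variables, assuming the standard Phoenix environment variables `PHOENIX_COLLECTOR_ENDPOINT` and `PHOENIX_API_KEY` (the space name and key below are placeholders):

```python
import os

# Phoenix Cloud: point the client at your space and authenticate.
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space-name"
os.environ["PHOENIX_API_KEY"] = "your-phoenix-api-key"

# Local (self-hosted): use the default local endpoint instead, e.g.
# os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "http://localhost:6006"
```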

Since we are running our evaluations on the trace data from our first project, we'll need to pull that data into our code, as shown below.
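One way to do this, sketched with Phoenix's `get_qa_with_reference` helper, which assembles a DataFrame of question, answer, and reference-context columns from your traced spans (this assumes your traces live in the default project):

```python
import phoenix as px
from phoenix.session.evaluation import get_qa_with_reference

# Connect to the running Phoenix instance and pull Q&A spans
# (input, output, and retrieved reference text) into a DataFrame.
client = px.Client()
queries_df = get_qa_with_reference(client)
queries_df.head()
```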
In this example, we will define, create, and run our own evaluator. There are a number of different evaluators you can run, but this quickstart will walk through an LLM-as-a-Judge evaluator.

1) Define your LLM Judge Model

We'll use OpenAI as our evaluation model for this example, but Phoenix also supports a number of other models. If you haven't yet set your OpenAI API key from the previous step, first add it to your environment.
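A minimal sketch, assuming the `OpenAIModel` wrapper from `phoenix.evals`; the specific model name is a placeholder you can swap for your preferred model:

```python
import os
from getpass import getpass

from phoenix.evals import OpenAIModel

# Add your OpenAI API key to the environment if it isn't already set.
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

# The judge model that will grade each example.
eval_model = OpenAIModel(model="gpt-4o")
```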
2) Define your Evaluators

We will set up a Q&A correctness evaluator with the LLM of our choice. First, define the LLM-as-a-Judge prompt template: most LLM-as-a-Judge evaluations can be framed as a classification task where the output is one of two or more categorical labels. Then define the classification evaluator itself.
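Sketched below with Phoenix's `llm_classify` helper: the template wording is illustrative, the rails restrict the judge to the two allowed labels, and `queries_df` / `eval_model` refer to the objects created above.

```python
from phoenix.evals import llm_classify

# An illustrative Q&A correctness template. llm_classify fills each
# {braced} variable from the matching DataFrame column.
QA_CORRECTNESS_TEMPLATE = """
You are given a question, an answer, and reference text. Determine
whether the answer correctly answers the question based on the
reference text.

[Question]: {input}
[Reference]: {reference}
[Answer]: {output}

Respond with exactly one word: "correct" or "incorrect".
"""

# The categorical labels the judge is allowed to output.
rails = ["correct", "incorrect"]

# Run the classification evaluator over the spans pulled earlier.
qa_correctness_df = llm_classify(
    dataframe=queries_df,
    template=QA_CORRECTNESS_TEMPLATE,
    model=eval_model,
    rails=rails,
    provide_explanation=True,  # also return the judge's reasoning
)
qa_correctness_df.head()
```

With `provide_explanation=True`, the returned DataFrame includes both a label and the judge's explanation for each row.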
