Install dependencies & Set environment variables
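A minimal setup sketch. The package names (`arize-phoenix`, `openai`, `openinference-instrumentation-openai`) and the `PHOENIX_CLIENT_HEADERS` / `PHOENIX_COLLECTOR_ENDPOINT` environment variables follow Phoenix's documented conventions; the placeholder values are yours to fill in:

```shell
# Install Phoenix, the OpenAI SDK, and the OpenInference auto-instrumentor
pip install arize-phoenix openai openinference-instrumentation-openai

# Point at a hosted Phoenix instance (see the self-hosting note below)
export PHOENIX_CLIENT_HEADERS="api_key=<your-phoenix-api-key>"
export PHOENIX_COLLECTOR_ENDPOINT="https://app.phoenix.arize.com"
export OPENAI_API_KEY="<your-openai-api-key>"
```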
Connect to Phoenix
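One way to connect, sketched with the `phoenix.otel` convenience API; the project name here is an arbitrary example, and the endpoint and headers are read from the environment variables set above:

```python
from phoenix.otel import register
from openinference.instrumentation.openai import OpenAIInstrumentor

# Registers an OTLP tracer provider pointed at Phoenix; picks up
# PHOENIX_COLLECTOR_ENDPOINT / PHOENIX_CLIENT_HEADERS from the environment
tracer_provider = register(project_name="joke-generator")  # example project name

# Route OpenAI SDK calls through that tracer provider so they appear as traces
OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
```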
Note: if you’re self-hosting Phoenix, swap your collector endpoint variable in the snippet below, and remove the Phoenix Client Headers variable.

Prepare trace dataset
For the sake of making this guide fully runnable, we’ll briefly generate some traces and track them in Phoenix. Typically, you would have already captured traces in Phoenix and would skip to “Download trace dataset from Phoenix”.

Download trace dataset from Phoenix
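A sketch of pulling the captured spans back out as a DataFrame, assuming the `px.Client` span-dataframe API and the example project name used when registering the tracer; this requires a reachable Phoenix instance:

```python
import phoenix as px

# One row per span; the DataFrame is indexed by each span's context.span_id
spans_df = px.Client().get_spans_dataframe(project_name="joke-generator")
print(spans_df.head())
```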
Generate evaluations
Now that we have our trace dataset, we can generate evaluations for each trace. Evaluations can be generated in many different ways. Ultimately, we want to end up with a set of labels and/or scores for our traces. You can generate evaluations using:

- Plain code
- The Phoenix evals library, which supports both built-in and custom evaluators.
- Other evaluation packages
Code Eval Example
Let’s start with a simple example of generating evaluations using plain code. OpenAI has a habit of repeating jokes, so we’ll generate evaluations to label whether a joke is a repeat of a previous joke.

Upload evaluations to Phoenix
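A minimal sketch of the repeat-joke check described above, producing the `evals_df` we upload. The span IDs and joke strings are made-up stand-ins for the `output` column of a real spans DataFrame:

```python
import pandas as pd

# Hypothetical trace output: one row per joke-generation span
spans_df = pd.DataFrame(
    {
        "context.span_id": ["span-1", "span-2", "span-3"],
        "output": [
            "Why did the chicken cross the road?",
            "Parallel lines have so much in common.",
            "Why did the chicken cross the road?",
        ],
    }
)

# A joke is a "repeat" if the same text appeared in an earlier span
is_repeat = spans_df["output"].duplicated()

# Index by span_id so each evaluation can be joined back to its trace;
# "label" and "score" are the column names Phoenix looks for in the UI
evals_df = pd.DataFrame(
    {
        "label": is_repeat.map({True: "repeat", False: "original"}).to_numpy(),
        "score": (~is_repeat).astype(int).to_numpy(),  # 1 = original, 0 = repeat
    },
    index=pd.Index(spans_df["context.span_id"], name="context.span_id"),
)
print(evals_df)
```

With a running Phoenix instance, these rows could then be attached to their spans with something like `px.Client().log_evaluations(SpanEvaluations(eval_name="Repeat Joke", dataframe=evals_df))`, where `SpanEvaluations` comes from `phoenix.trace`.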
Our evals_df has a column for the span_id and a column for the evaluation result. The span_id is what allows us to connect the evaluation to the correct trace in Phoenix. Phoenix will also automatically look for columns named “label” and “score” to display in the UI.

LLM Eval Example
Let’s use the Phoenix Evals library to define an LLM-as-a-judge evaluator that classifies jokes as either “nerdy” or “not nerdy.”

- If you’re interested in more complex evaluation and evaluators, start with how to use LLM as a Judge evaluators
- If you’re ready to start testing your application in a more rigorous manner, check out how to run structured experiments
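The nerdy-joke judge described above can be sketched with `llm_classify` from `phoenix.evals`. The prompt template, rails, and model choice below are illustrative assumptions, not a built-in evaluator, and running it requires an OpenAI API key:

```python
import pandas as pd
from phoenix.evals import OpenAIModel, llm_classify

# Illustrative judge prompt; {output} is filled from the dataframe column
NERDY_TEMPLATE = """You are judging whether a joke is nerdy.
[Joke]: {output}
Respond with a single word: "nerdy" or "not_nerdy"."""

# Hypothetical input: one row per joke span
jokes_df = pd.DataFrame(
    {"output": ["Why do programmers prefer dark mode? Because light attracts bugs."]}
)

nerdy_evals_df = llm_classify(
    dataframe=jokes_df,
    template=NERDY_TEMPLATE,
    model=OpenAIModel(model="gpt-4o-mini"),  # example model choice
    rails=["nerdy", "not_nerdy"],            # constrain the judge's answers
)
```

The resulting dataframe carries a `label` column, so it can be uploaded back to Phoenix the same way as the plain-code evaluations.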

