@arizeai/phoenix-evals evaluators.
Relevant Source Files
- `src/experiments/runExperiment.ts` for the task execution flow and return shape
- `src/experiments/helpers/getExperimentEvaluators.ts` for evaluator normalization
- `src/experiments/helpers/fromPhoenixLLMEvaluator.ts` for the phoenix-evals bridge
- `src/experiments/getExperimentRuns.ts` for reading runs back after execution
- `src/types/experiments.ts` for `EvaluatorParams`, including `traceId`
- `src/spans/getSpans.ts` for fetching spans by trace ID and span kind
Two Common Patterns
Use `asExperimentEvaluator()` when your evaluation logic is plain TypeScript.
Use `@arizeai/phoenix-evals` evaluators directly when you want model-backed judging.
Code-Based Example
If you just want to compare task output against a reference answer or apply deterministic checks, use `asExperimentEvaluator()`. Reach for it when:
- you already know the exact correctness rule
- you want fast, deterministic evaluation
- you do not want to call another model during evaluation
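The bullets above can be sketched as a simple exact-match evaluator. The `evaluate` body below uses the normalized evaluator field names documented on this page; the `EvaluatorParams` type is a local stand-in, and the option names shown for wrapping with `asExperimentEvaluator()` (in the trailing comment) are assumptions — check `src/experiments/helpers/asExperimentEvaluator.ts` for the real signature.

```typescript
// A minimal code-based evaluator sketch: deterministic exact-match against
// the dataset example's reference answer. Local stand-in type; the real
// EvaluatorParams lives in src/types/experiments.ts.
type EvaluatorParams = {
  input: Record<string, unknown>;
  output: unknown;
  expected?: Record<string, unknown> | null;
  metadata?: Record<string, unknown>;
  traceId?: string | null;
};

function exactMatch({ output, expected }: EvaluatorParams) {
  const matched =
    String(output).trim() === String(expected?.answer ?? "").trim();
  return { score: matched ? 1 : 0, label: matched ? "correct" : "incorrect" };
}

// Hypothetical wrapping for runExperiment() — option names are assumptions:
// const evaluator = asExperimentEvaluator({ name: "exact-match", evaluate: exactMatch });
```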
Model-Backed Example
If you want a model-backed experiment with automatic tracing and an LLM-as-a-judge evaluator, this is the core pattern.

What This Example Shows
- `createOrGetDataset()` creates or reuses the dataset the experiment will run against
- `task` receives the full dataset example object
- `generateText()` emits traces that Phoenix can attach to the experiment when telemetry is enabled
- `createClassificationEvaluator()` from `@arizeai/phoenix-evals` can be passed directly to `runExperiment()`
- `runExperiment()` records both task runs and evaluation runs in Phoenix
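A hedged sketch of the pattern described above. The function names come from this page, but the import paths and option shapes (dataset fields, evaluator options, model wiring) are assumptions, not verified signatures — check `src/experiments/runExperiment.ts` and the `@arizeai/phoenix-evals` docs before copying. It calls external services, so it is a wiring sketch rather than a runnable test.

```typescript
import { createOrGetDataset } from "@arizeai/phoenix-client/datasets";
import { runExperiment } from "@arizeai/phoenix-client/experiments";
import { createClassificationEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";
import { generateText } from "ai";

async function main() {
  // Create (or reuse) the dataset the experiment runs against.
  const dataset = await createOrGetDataset({
    name: "qa-smoke-test", // hypothetical dataset name
    examples: [
      {
        input: { question: "What is Phoenix?" },
        output: { answer: "An open-source LLM observability platform" },
      },
    ],
  });

  // LLM-as-a-judge evaluator from @arizeai/phoenix-evals, passed directly
  // to runExperiment(). Option names here are assumptions.
  const relevance = createClassificationEvaluator({
    model: openai("gpt-4o-mini"),
    promptTemplate:
      "Question: {{input.question}}\nAnswer: {{output}}\n" +
      "Is the answer relevant? Respond with 'relevant' or 'irrelevant'.",
    choices: { relevant: 1, irrelevant: 0 },
  });

  const experiment = await runExperiment({
    experimentName: "qa-relevance", // hypothetical experiment name
    dataset,
    // The task receives the FULL dataset example, not just example.input.
    task: async (example) => {
      const { text } = await generateText({
        model: openai("gpt-4o-mini"),
        prompt: String(example.input.question),
      });
      return text;
    },
    evaluators: [relevance],
    setGlobalTracerProvider: true, // let AI SDK spans land in the task trace
  });

  console.log(experiment.id, experiment.projectName);
}

main();
```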
Task Inputs
`runExperiment()` calls your task with the full dataset example, not just `example.input`.
That means your task should usually read:
- `example.input` for the task inputs
- `example.output` for any reference answer
- `example.metadata` for additional context
A question-answering task, for example, might read `example.input.question` and `example.input.context` before generating a response.
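A stub illustrating that shape — the `Example` type and field handling below are local sketches for demonstration, not the client's exported types:

```typescript
// Local stand-in for a dataset example; the real type comes from the client.
type Example = {
  input: Record<string, unknown>;
  output?: Record<string, unknown>;
  metadata?: Record<string, unknown>;
};

// The task gets the whole example; pull out what you need explicitly.
async function task(example: Example): Promise<string> {
  const question = String(example.input.question ?? "");
  const context = String(example.input.context ?? "");
  const reference = example.output?.answer; // reference answer, if any
  // A real task would call a model here (e.g. the AI SDK's generateText());
  // this stub just echoes the inputs so the wiring is visible.
  return `answering "${question}" with ${context.length} chars of context` +
    (reference ? " (reference available)" : "");
}
```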
Evaluator Inputs
When an evaluator runs, it receives a normalized object with these fields:

| Field | Description |
|---|---|
| `input` | The dataset example’s input object |
| `output` | The task output for that run |
| `expected` | The dataset example’s output object |
| `metadata` | The dataset example’s metadata object |
| `traceId` | The OpenTelemetry trace ID of the task run (optional, `string \| null`) |
For model-backed evaluators, a `createClassificationEvaluator()` prompt can reference `{{input.question}}` and `{{output}}`.
For code-based evaluators created with `asExperimentEvaluator()`, those same fields are available inside `evaluate({ input, output, expected, metadata, traceId })`.
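To make the field mapping concrete, here is a toy template-filling function. The real templating lives inside `@arizeai/phoenix-evals`; this local `fillTemplate()` is only an illustration of how `{{input.question}}` and `{{output}}` resolve against the normalized evaluator inputs.

```typescript
// Toy stand-in for the mustache-style substitution phoenix-evals performs;
// dotted paths resolve into the normalized evaluator inputs.
function fillTemplate(
  template: string,
  params: Record<string, unknown>,
): string {
  return template.replace(/\{\{([\w.]+)\}\}/g, (_, path: string) =>
    String(
      path
        .split(".")
        .reduce<unknown>(
          (obj, key) => (obj as Record<string, unknown> | undefined)?.[key],
          params,
        ) ?? "",
    ),
  );
}

const prompt = fillTemplate(
  "Question: {{input.question}}\nAnswer: {{output}}",
  { input: { question: "What is Phoenix?" }, output: "An observability tool" },
);
// prompt is now "Question: What is Phoenix?\nAnswer: An observability tool"
```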
Trace-Based Evaluation
Each task run captures an OpenTelemetry trace ID. Evaluators can use `traceId` to fetch the task’s spans from Phoenix and evaluate the execution trajectory — for example, verifying that specific tool calls were made or inspecting intermediate steps.
This pattern works best with `evaluateExperiment()` as a separate step after `runExperiment()`, so that all task spans are ingested into Phoenix before the evaluator queries them.
- Use `setGlobalTracerProvider: true` on `runExperiment()` so that child spans from `traceTool` or other OTel instrumentation land in the same trace as the task
- Use `evaluateExperiment()` as a separate step so spans are ingested before querying
- Use `getSpans()` with `traceIds` and `spanKind` filters to fetch specific spans from the task trace
- `traceId` is `null` in dry-run mode since no real traces are recorded
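A sketch of a trace-based evaluator, assuming `getSpans()` accepts `traceIds` and `spanKind` filters as described above — the import path, exact argument shape, and return shape are all assumptions; check `src/spans/getSpans.ts` for the real signature. It queries a live Phoenix instance, so it is shown as a wiring sketch.

```typescript
import { getSpans } from "@arizeai/phoenix-client/spans";

// Trace-based evaluator sketch: fetch the task run's TOOL spans and check
// that a specific tool was called. Filter names (traceIds, spanKind) follow
// this page; the getSpans() signature and span shape are assumptions.
async function evaluateToolUse({ traceId }: { traceId?: string | null }) {
  if (!traceId) {
    // traceId is null in dry-run mode, so there is nothing to inspect.
    return { score: null, label: "no-trace" };
  }
  const { spans } = await getSpans({
    project: { projectName: "my-experiment" }, // hypothetical project name
    traceIds: [traceId],
    spanKind: "TOOL",
  });
  const usedSearch = spans.some((span) => span.name === "search"); // hypothetical tool name
  return {
    score: usedSearch ? 1 : 0,
    label: usedSearch ? "called-search" : "missed-search",
  };
}
```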
What runExperiment() Returns
The returned object includes the experiment metadata plus the task and evaluation results from the run.
- `experiment.id` is the experiment ID in Phoenix
- `experiment.projectName` is the Phoenix project that received the task traces
- `experiment.runs` is a map of run IDs to task run objects
- `experiment.evaluationRuns` contains evaluator results when evaluators are provided
Follow-Up Helpers
Use these exports for follow-up workflows: `createExperiment`, `getExperiment`, `getExperimentInfo`, `getExperimentRuns`, `listExperiments`, `resumeExperiment`, `resumeEvaluation`, `deleteExperiment`.
Tracing Behavior
`runExperiment()` can register a tracer provider for the task run so that task spans and evaluator spans show up in Phoenix during the experiment. This is why tasks that call the AI SDK can still emit traces to Phoenix when global tracing is enabled.
Source Map
- `src/experiments/runExperiment.ts`
- `src/experiments/createExperiment.ts`
- `src/experiments/getExperiment.ts`
- `src/experiments/getExperimentRuns.ts`
- `src/experiments/helpers/getExperimentEvaluators.ts`
- `src/experiments/helpers/fromPhoenixLLMEvaluator.ts`
- `src/experiments/helpers/asExperimentEvaluator.ts`

