07.09.2025

Baseline for Experiment Comparisons

You can now set a baseline run when comparing multiple experiments. This is especially useful when one run represents a known-good output (e.g. a previous model version or a CI-approved run) and you want to evaluate changes relative to it.

For example, in an evaluation like accuracy, you can easily see where a value flipped from correct → incorrect or incorrect → correct between your baseline and the current comparison, helping you quickly spot regressions or improvements.

This feature makes it easier to isolate the impact of changes such as a new prompt, model, or dataset.
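The flip detection described above can be sketched in a few lines. The snippet below is an illustrative example, not the Phoenix API: given per-example correctness for a baseline run and a comparison run, it classifies each example as a regression (correct → incorrect), an improvement (incorrect → correct), or unchanged. The function name `classify_flips` and the dict-based inputs are assumptions made for illustration.

```python
def classify_flips(
    baseline: dict[str, bool], comparison: dict[str, bool]
) -> dict[str, str]:
    """Label each example by how its correctness changed vs. the baseline."""
    flips = {}
    for example_id, base_correct in baseline.items():
        comp_correct = comparison.get(example_id)
        if comp_correct is None:
            continue  # example not present in the comparison run
        if base_correct and not comp_correct:
            flips[example_id] = "regression"   # correct -> incorrect
        elif not base_correct and comp_correct:
            flips[example_id] = "improvement"  # incorrect -> correct
        else:
            flips[example_id] = "unchanged"
    return flips


baseline = {"ex1": True, "ex2": False, "ex3": True}
comparison = {"ex1": False, "ex2": True, "ex3": True}
print(classify_flips(baseline, comparison))
# {'ex1': 'regression', 'ex2': 'improvement', 'ex3': 'unchanged'}
```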

feat(experiments): add baseline to compare experiments page by axiomofjoy · Pull Request #8461 · Arize-ai/phoenix