
Releases · Arize-ai/phoenix

01.31.2026

01.31.2026: Tool Selection and Tool Invocation Evaluators

Available in arize-phoenix-evals 0.16.0+ (Python) and @arizeai/phoenix-evals 1.3.0+ (TypeScript)
Phoenix now provides two specialized evaluators for assessing AI agent tool usage. The Tool Selection Evaluator judges whether an agent correctly chose the most appropriate tool from its available toolkit to answer a user’s question, without evaluating the parameters passed. The Tool Invocation Evaluator assesses whether the agent correctly invoked a tool with proper parameters, JSON formatting, and safe values.
These evaluators help developers:
  • Identify tool selection errors where agents choose suboptimal or incorrect tools
  • Debug parameter issues including hallucinated fields, malformed JSON, and incorrect values
  • Improve tool descriptions and agent prompts based on systematic evaluation
  • Validate multi-tool and multi-turn interactions across complex agent workflows
Both evaluators are available as ToolSelectionEvaluator and ToolInvocationEvaluator in Python’s phoenix.evals.metrics module, and as createToolSelectionEvaluator and createToolInvocationEvaluator in TypeScript.
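The invocation checks described above (malformed JSON, hallucinated fields) can be sketched as a toy validator. This is only an illustration of the class of checks the Tool Invocation Evaluator performs, not Phoenix’s implementation; check_tool_invocation is a hypothetical helper:

```python
import json

def check_tool_invocation(tool_call_json: str, allowed_params: set) -> list:
    """Return a list of problems found in a raw tool-call argument string."""
    problems = []
    try:
        args = json.loads(tool_call_json)
    except json.JSONDecodeError:
        return ["malformed JSON"]
    if not isinstance(args, dict):
        return ["arguments are not a JSON object"]
    # Hallucinated fields: parameters not declared in the tool's schema.
    extra = set(args) - allowed_params
    problems.extend(f"hallucinated field: {name}" for name in sorted(extra))
    return problems

# A well-formed call passes; a call with an invented field is flagged.
print(check_tool_invocation('{"city": "Paris"}', {"city", "units"}))          # []
print(check_tool_invocation('{"city": "Paris", "mood": "ok"}', {"city", "units"}))  # ['hallucinated field: mood']
```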
01.28.2026

01.28.2026: Configurable Email Extraction for OAuth2 Providers

Available in Phoenix 12.33.1+
Phoenix now supports custom email extraction from OAuth2 identity providers through the PHOENIX_OAUTH2_{IDP}_EMAIL_ATTRIBUTE_PATH environment variable. This solves authentication issues with providers like Azure AD/Entra ID, where the standard email claim may be null but alternative claims like preferred_username contain the user’s identity.
Configure email extraction using JMESPath expressions. For example, to extract from the Azure AD preferred_username claim:
PHOENIX_OAUTH2_AZURE_AD_EMAIL_ATTRIBUTE_PATH=preferred_username
To extract from nested claims:
PHOENIX_OAUTH2_CUSTOM_IDP_EMAIL_ATTRIBUTE_PATH=user.contact.email
The default behavior remains unchanged, using the standard OIDC email claim when no custom path is specified. JMESPath expressions are validated at startup for immediate feedback on configuration errors.
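A path expression such as user.contact.email simply walks the nested claims object. As an illustration of that lookup (Phoenix evaluates real JMESPath; this toy resolve_path helper handles only plain dot paths):

```python
def resolve_path(claims: dict, path: str):
    """Walk a dotted path (a tiny subset of JMESPath) through nested claims.

    Returns None when any segment is missing, so a null or absent claim
    can be detected and a fallback attribute path used instead.
    """
    value = claims
    for segment in path.split("."):
        if not isinstance(value, dict) or segment not in value:
            return None
        value = value[segment]
    return value

claims = {
    "email": None,  # Azure AD / Entra ID sometimes leaves this null
    "preferred_username": "jane@example.com",
    "user": {"contact": {"email": "jane@corp.example.com"}},
}
print(resolve_path(claims, "preferred_username"))  # jane@example.com
print(resolve_path(claims, "user.contact.email"))  # jane@corp.example.com
```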
01.22.2026

01.22.2026: CLI Commands for Prompts, Datasets, and Experiments

Available in @arizeai/phoenix-cli 0.4.0+
The Phoenix CLI now provides comprehensive commands for managing prompts, datasets, and experiments directly from your terminal. Access version-controlled prompts, create evaluation datasets, and run experiments—all without leaving your development environment.
Prompt Management:
  • List and view prompts with px prompts and px prompt <name>
  • Pipe prompts to AI assistants for optimization and analysis
  • Text format output with XML-style role tags for LLM consumption
Dataset Operations:
  • Create and manage datasets with px datasets and px dataset <name>
  • Add examples and query dataset contents
  • Export datasets for offline analysis
Experiment Workflows:
  • Run experiments and compare results across configurations
  • View experiment details and performance metrics
  • Track changes across prompt and model variations
These commands integrate seamlessly with AI coding assistants and enable systematic testing of LLM applications through terminal-based workflows.
01.23.2026

01.23.2026: CLI Authentication Configuration

Available in @arizeai/phoenix-cli 0.4.0+
The Phoenix CLI now includes enhanced authentication configuration commands, resolving database race conditions and improving connection reliability. Users can configure authentication settings directly through the CLI for more predictable and stable connections to Phoenix servers.
01.21.2026

01.21.2026: Create Datasets from Traces with Span Associations

Available in arize-phoenix-client 1.28.0+ (Python) and @arizeai/phoenix-client 2.0.0+ (TypeScript)
Phoenix now enables converting production traces into curated datasets while preserving bidirectional links back to source spans. Use the new span_id_key parameter to maintain traceability from evaluation examples to their original production executions.
Python Example:
from phoenix.client import Client

client = Client()
dataset = client.datasets.create_dataset(
    name="production-queries",
    dataframe=spans_df,
    input_keys=["input"],
    output_keys=["output"],
    span_id_key="context.span_id"  # Links examples to spans
)
TypeScript Example:
import { createClient } from '@arizeai/phoenix-client';

const client = createClient();
await client.createDataset({
    name: "production-queries",
    examples: examples.map(ex => ({
        input: ex.input,
        output: ex.output,
        spanId: ex.spanId  // Preserves trace links
    }))
});
Key capabilities:
  • Batch resolution of span IDs for optimal performance
  • Graceful fallback when span IDs are missing or invalid
  • Backwards compatible with existing dataset creation workflows
  • Bidirectional navigation between evaluation results and production traces
01.19.2026

01.19.2026: Export Annotations with Traces

Available in @arizeai/phoenix-cli 0.3.0+
The Phoenix CLI now supports exporting annotations alongside traces using the --include-annotations flag. Annotations—including manual labels, LLM evaluation scores, and programmatic feedback—are now preserved when exporting traces for offline analysis, backup, or migration workflows.
px traces export --include-annotations > traces_with_feedback.jsonl
This enables teams to maintain complete evaluation history when moving data between environments or conducting retrospective analysis of model performance.
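Because the export is JSONL, it is easy to post-process with standard tools. A minimal sketch, assuming a hypothetical record shape with span_id and annotations fields (not Phoenix’s actual export schema):

```python
import io
import json

# Hypothetical export: each JSONL line is one trace record. The field names
# ("span_id", "annotations", ...) are illustrative, not Phoenix's schema.
exported = io.StringIO(
    '{"span_id": "a1", "annotations": [{"name": "quality", "label": "good"}]}\n'
    '{"span_id": "b2", "annotations": []}\n'
)

# Collect the IDs of spans that carry at least one annotation.
annotated = [
    record["span_id"]
    for record in map(json.loads, exported)
    if record["annotations"]
]
print(annotated)  # ['a1']
```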
01.22.2026

01.22.2026: CLI Prompt Commands: Pipe Prompts to AI Assistants 📝

Available in @arizeai/phoenix-cli 0.4.0+
Phoenix CLI now supports prompt introspection with px prompts and px prompt. List prompts, view their content, and pipe them directly to AI assistants like Claude Code for optimization suggestions. The --format text option outputs prompts with XML-style role tags, ideal for analysis workflows.
01.21.2026

01.21.2026: Create Datasets from Traces with Span Associations 🔗

Available in arize-phoenix-client 1.28.0+ (Python) and @arizeai/phoenix-client 2.0.0+ (TypeScript)
The Phoenix client now enables converting production traces into curated datasets while preserving associations back to source spans. Query spans using client methods, then create datasets with span associations to maintain bidirectional links. Use this to build golden datasets from validated interactions, curate edge cases from failed traces, or create regression test suites from critical user flows.
01.21.2026

01.21.2026: Phoenix CLI: Datasets, Experiments & Annotations 🧪

Available in @arizeai/phoenix-cli 0.2.0+
The Phoenix CLI now supports datasets, experiments, and annotations. Pull evaluation data, export experiment results, and access human feedback directly from the terminal. Works well with AI coding assistants for analyzing test cases and reviewing results.
01.17.2026

01.17.2026: Phoenix CLI: Terminal Access for AI Coding Assistants 🖥️

Available in @arizeai/phoenix-cli 0.1.0+
AI coding assistants operate through terminals and files—they run shell commands, read output, and process data. The new Phoenix CLI makes trace data accessible through these interfaces, enabling tools like Claude Code, Cursor, and Windsurf to query your Phoenix instance directly. Export traces to JSON, pipe to jq, or save to disk for analysis.
12.20.2025

12.20.2025: Improved User Preferences ⚙️

Available in Phoenix 12.27+
Phoenix now offers enhanced user preference settings, giving you more control over your experience. This update includes theme selection in viewer preferences and programming language preference.
12.12.2025

12.12.2025: Support for Gemini Tool Calls 🤖

Available in Phoenix 12.25+
Phoenix now supports Gemini tool calls, enabling enhanced integration capabilities with Google’s Gemini models. This update allows for more robust and feature-complete interactions with Gemini, including improved request/response translation and advanced conversation handling with tool calls.
12.09.2025

12.09.2025: Span Notes API 📝

Available in Phoenix 12.21+
New dedicated endpoints for span notes enable open coding and seamless annotation integrations. Add notes to spans programmatically using the Phoenix client in both Python and TypeScript—perfect for debugging sessions, human feedback, and building custom annotation pipelines.
12.06.2025

12.06.2025: LDAP Authentication Support 🔐

Available in Phoenix 12.20+
Phoenix now supports authentication against LDAP directories, enabling integration with enterprise identity infrastructure including Microsoft Active Directory, OpenLDAP, and any LDAP v3 compliant directory. Key features include group-based role mapping, multi-server failover, TLS encryption, and automatic user provisioning.
12.04.2025

12.04.2025: Evaluator Message Formats 💬

Available in phoenix-evals 0.22+ (Python) and @arizeai/phoenix-evals 2.0+ (TypeScript)
Phoenix evaluators now support flexible prompt formats including simple string templates and OpenAI-style message arrays for multi-turn prompts. Python supports both f-string and mustache syntax, while TypeScript uses mustache syntax. Adapters handle provider-specific transformations automatically.
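The two template styles differ only in delimiter syntax. A minimal sketch of that difference, using a toy mustache substitution rather than phoenix-evals’ actual template engine:

```python
import re

template_fstring = "Is the answer '{answer}' grounded in the context?"
template_mustache = "Is the answer '{{answer}}' grounded in the context?"

variables = {"answer": "42"}

# f-string style templates render with str.format.
rendered_f = template_fstring.format(**variables)

# Toy mustache renderer: replace each {{name}} with its variable's value.
rendered_m = re.sub(
    r"\{\{\s*(\w+)\s*\}\}",
    lambda m: str(variables[m.group(1)]),
    template_mustache,
)

# Both syntaxes produce the same rendered prompt.
assert rendered_f == rendered_m
print(rendered_f)  # Is the answer '42' grounded in the context?
```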
12.03.2025

12.03.2025: TypeScript createEvaluator 🧪

Available in @arizeai/phoenix-evals 2.0+
The createEvaluator utility provides a type-safe way to build custom code evaluators for experiments in TypeScript. Define evaluators with full type inference, access input, output, expected, and metadata parameters, and integrate seamlessly with runExperiment.
12.01.2025

12.01.2025: Splits on Experiments Table 📊

Available in Phoenix 12.20+
You can now view and filter experiment results by data splits directly in the experiments table. This enhancement makes it easier to analyze performance across different data subsets (such as train, validation, and test) and compare how your models perform on each split.
11.29.2025

11.29.2025: Claude Opus 4.5 Support 🤖

Available in Phoenix 12.18+
Claude Opus 4.5 support
Phoenix now supports Claude Opus 4 and 4.5 as models you can invoke from the Playground.
11.27.2025

11.27.2025: Show Server Credential Setup in Playground API Keys 🔐

Available in Phoenix 12.18+
Playground server credential setup
The Playground now clearly indicates when server credentials are configured.
11.25.2025

11.25.2025: Split Assignments When Uploading a Dataset 🗂️

Available in Phoenix 12.18+
You can now assign data splits (ex: train/test/validation) directly when uploading a dataset into Arize Phoenix.
11.23.2025

11.23.2025: Repetitions for Manual Playground Invocations 🛝

Available in Phoenix 12.17+
Playground repetitions feature
This update adds an easy way to run several repetitions of the same prompt directly from the Playground.
11.14.2025

11.14.2025: Expanded Provider Support with OpenAI 5.1 + Gemini 3 🔧

Available in Phoenix 12.15+
This update enhances LLM provider support by adding OpenAI v5.1 compatibility (including reasoning capabilities), expanding support for Google DeepMind/Gemini models, and introducing the gemini-3 model variant.
11.12.2025

11.12.2025: Updated Anthropic Model List 🧠

Available in Phoenix 12.15+
This update enhances the Anthropic model registrations in Arize Phoenix by adding support for the 4.5 Sonnet/Haiku variants and removing several legacy 3.x Sonnet/Opus entries.
11.09.2025

11.09.2025: OpenInference TypeScript 2.0 💻

traced agent
  • Added easy manual instrumentation with the same decorators, wrappers, and attribute helpers found in the Python openinference-instrumentation package.
  • Introduced function tracing utilities that automatically create spans for sync/async function execution, including specialized wrappers for chains, agents, and tools.
  • Added decorator-based method tracing, enabling automatic span creation on class methods via the @observe decorator.
  • Expanded attribute helper utilities for standardized OpenTelemetry metadata creation, including helpers for inputs/outputs, LLM operations, embeddings, retrievers, and tool definitions.
  • Overall, tracing workflows, agent behavior, and external tool calls is now significantly simpler and more consistent across languages.
11.07.2025

11.07.2025: Timezone Preference 🌍

Available in Phoenix 12.11+
timezone preferences
This update adds a new display timezone preference feature for users: you can now specify how timestamps are shown across the UI, making time-based data more intuitive and aligned with your locale.
11.05.2025

11.05.2025: Metadata for Prompts 🗂️

Available in Phoenix 12.10+
metadata for prompts
Added full prompt-level metadata support across API, UI, and clients: you can now create, clone, patch, and display a JSON metadata field for prompts.
11.03.2025

11.03.2025: Playground Dataset Label Display 🏷️

Available in Phoenix 12.10+
dataset labels in playground
You can now view dataset labels as you load datasets into the Playground. This enhancement makes it easier to identify and select your desired dataset.
11.01.2025

11.01.2025: Resume Experiments and Evaluations 🔄

Available in Phoenix 12.10+
This release allows you to resume your experiments and evaluations at your convenience. If certain examples fail, there is no need to repeat an entire task you already completed. This feature provides you with new management capabilities across servers and clients. It’s designed to save effort, making your experimentation workflow more flexible.
10.30.2025

10.30.2025: Metadata Support for Experiment Run Annotations 🧩

Available in Phoenix 12.9+
metadata support for experiment run annotations
Added metadata support for experiment run annotations, with GraphQL updates to fetch and expose this information. The annotation details view now displays formatted JSON metadata across both compare and example views for easier inspection and debugging.
10.28.2025

10.28.2025: Enable AWS IAM Auth for DB Configuration 🔐

Available in Phoenix 12.9+
Added support for AWS IAM–based authentication for PostgreSQL connections to AWS Aurora and RDS. This enhancement enables the use of short-lived IAM tokens instead of static passwords, improving security and compliance for database access.
10.26.2025

10.26.2025: Add Split Edit Menu to Examples

Available in Phoenix 12.8+
add splits to example
Added a new “Split” dropdown to single-example view on the dataset pages, allowing users to update the data split classification (e.g., train/validation/test) directly from the example level. This improvement makes it easier to correct or adjust split assignments dynamically.
10.24.2025

10.24.2025: Filter Prompts Page by Label 🏷️

Available in Phoenix 12.7+
filter prompts by label
Added filtering by label on the Prompts page—users can now pick one or more labels to narrow the prompts list.
10.20.2025

10.20.2025: Splits

Available in Phoenix 12.7+
In Arize Phoenix, splits let you categorize your dataset into distinct subsets—such as train, validation, or test—enabling structured workflows for experiments and evaluations. This capability offers more flexibility in how you organize, filter, and compare your data across different stages or experimental conditions.
10.18.2025

10.18.2025: Filter Annotations in Compare Experiments Slideover ✍️

Available in Phoenix 12.7+
filter annotations in compare experiments
Added filtering of annotations in the experiment compare slideover so that only annotations present on the selected experiment runs are displayed. This ensures a cleaner UI and avoids filters for annotations that don’t appear in the comparison set.
10.15.2025

10.15.2025: Enhanced Filtering for Examples Table 🔍

Available in Phoenix 12.5+
enhanced filtering for examples table
Added filtering capabilities to the Dataset Examples table, allowing users to search examples by text or split ID. Additionally, the split-management filter menu has been reorganized to separate filtering by splits from split management actions.
10.13.2025

10.13.2025: View Traces in Compare Experiments 🧪

Available in Phoenix 12.5+
view traces in compare experiments
We’ve added trace-links to the experiment compare slideover for runs and annotations. Clicking the new trace icons opens the Trace View.
10.10.2025

10.10.2025: Viewer Role 👀

Available in Phoenix 12.5+
Introduced a new VIEWER role with enforced read-only permissions across both GraphQL and REST APIs, improving access control and security.
10.08.2025

10.08.2025: Dataset Labels 🏷️

Available in Phoenix 12.3+
dataset labels
Added support for dataset labels — you can now label datasets and view these labels in a dedicated column on the dataset list page, making it easier to filter and group datasets. All dataset labels can also be managed and viewed in the “Datasets” tab on the Settings page.
10.06.2025

10.06.2025: Paginate Compare Experiments 📃

Available in Phoenix 12.3+
paginate compare experiments
We added pagination to the experiment comparison slideover on the list page for smoother navigation through results. We also introduced a new repetition number column, visible only when the base experiment includes multiple repetitions.
10.05.2025

10.05.2025: Load Prompt by Tag into Playground 🛝

Available in Phoenix 12.2+
prompt tags in playground
We have added support for selecting and loading prompts by tag in the Playground. Users can now open specific prompts tagged for easier comparison and reproducibility.
10.03.2025

10.03.2025: Prompt Version Editing in Playground 🛝

Available in Phoenix 12.2+
prompt versions in playground
We added support for prompt versioning in the Playground — users can now select, edit, and experiment with specific prompt versions directly. This update improves traceability and reproducibility for prompt iterations, making it easier to manage and compare different versions.
09.29.2025

09.29.2025: Day 0 support for Claude Sonnet 4.5

Available in Phoenix 12.1+
Day-0 support for Claude Sonnet 4.5.
09.27.2025

09.27.2025: Dataset Splits 📊

Available in Phoenix 12.0+
Added support for custom dataset splits to organize examples by category.
09.26.2025

09.26.2025: Session Annotations 🗂️

Available in Phoenix 12.0+
You can now annotate sessions with conversational evaluations like coherency and tone.
09.25.2025

09.25.2025: Repetitions 🔁

Available in Phoenix 11.38+
Support for repetitions is now enabled in Playground and SDK workflows.
09.24.2025

09.24.2025: Custom HTTP headers for requests in Playground 🛠️

Available in Phoenix 11.36+
You can now configure custom HTTP headers for Playground requests.
09.23.2025

09.23.2025: Repetitions in experiment compare slideover 🔄

Available in Phoenix 11.36+
Experiment repetitions are now shown as separate cards in the compare slideover.
09.17.2025

09.17.2025: Experiment compare details slideover in list view 🔍

Available in Phoenix 11.34+
Added a slideover in the experiments list view to show compare details inline.
09.15.2025

09.15.2025: Prompt Labels 🏷️

Available in Phoenix 11.33+
We’ve added support for labeling prompts so you can categorize them by use-case, provider, or any custom tag.
09.12.2025

09.12.2025: Enable Paging in Experiment Compare Details 📄

Available in Phoenix 11.33+
We’ve added paging functionality to the Experiment Compare details slide-over view, allowing users to navigate between individual examples using arrow buttons or keyboard shortcuts (J / K).
Pagination
09.08.2025

09.08.2025: Experiment Annotation Popover in Detail View 🔍

Available in Phoenix 11.33+
Added an annotation popover in the experiment detail view to reveal full annotation content without leaving the page.
09.04.2025

09.04.2025: Experiment Lists Page Frontend Enhancements 💻

Available in Phoenix 11.32+
In this update, the Experiment Lists page has received several user-facing enhancements to improve usability and responsiveness.
09.03.2025

09.03.2025: Add Methods to Log Document Annotations 📜

Available in Phoenix 11.31+
Added client-side support for logging document annotations with a new log_document_annotations(...) method, supporting both sync and async API calls.
08.28.2025

08.28.2025: New arize-phoenix-client Package 📦

arize-phoenix-client is a lightweight, fully-featured package for interacting with Phoenix. It lets you manage datasets, experiments, prompts, spans, annotations, and projects, without needing a local Phoenix installation.
08.22.2025

08.22.2025: New Trace Timeline View 🔭

Available in Phoenix 11.26+
Easily spot timing bottlenecks with the new trace timeline visualization.
08.20.2025

08.20.2025: New Experiment and Annotation Quick Filters 🏎️

Available in Phoenix 11.25+
Quick filters in experiment views let you drill down by eval scores and labels to quickly spot regressions and outliers.
08.14.2025

08.14.2025: Trace Transfer for Long-Term Storage 📦

Available in Phoenix 11.23+
Transfer traces across projects for long-term storage while preserving annotations, dataset links, and full context.
08.12.2025

08.12.2025: UI Design Overhauls 🎨

Available in Phoenix 11.22+
The platform now features refreshed design elements including expandable navigation, an “Action” bar, and dynamic color contrast for clearer and more intuitive workflows.
08.07.2025

08.07.2025: Improved Error Handling in Prompt Playground ⚠️

Available in Phoenix 11.20+
Prompt Playground experiments now provide clearer error messages, listing valid options when an input is invalid.
08.06.2025

08.06.2025: Expanded Search Capabilities 🔍

Available in Phoenix 11.19+
Search functionality has been enhanced across the platform. Users can now search projects, prompts, and datasets, making it easier to quickly find and access the resources they need.
08.05.2025

08.05.2025: Claude Opus 4.1 Support 🤖

Available in Phoenix 11.19+
Support for Claude Opus 4.1 is now available, enabling teams to begin experimenting and evaluating with the new model from day 0.
08.04.2025

08.04.2025: Manual Project Creation & Trace Duplication 📂

Available in Phoenix 11.19+
You can now create projects manually in the UI and duplicate traces into other projects via the SDK, making it easier to organize evaluation data and streamline workflows.
08.03.2025

08.03.2025: Delete Spans via REST API 🧹

Available in Phoenix 11.18+
You can now delete spans using the REST API, enabling efficient data redaction and giving teams greater control over trace data.
