# @arizeai/phoenix-cli

Part of the Arize-ai/phoenix repository (`js/packages/phoenix-cli`).
Phoenix CLI is a command-line interface for your Phoenix projects. Fetch traces, list datasets, export experiment results, and access prompts directly from your terminal—or pipe them into AI coding agents like Claude Code, Cursor, Codex, and Gemini CLI. You can use Phoenix CLI for:
- **Immediate Debugging**: Fetch the most recent trace of a failed or unexpected run with a single command
- **Bulk Export**: Export large numbers of traces or experiment results to JSON files for offline analysis
- **Dataset & Experiment Access**: List datasets and retrieve full experiment data including runs, evaluations, and trace IDs
- **Prompt Introspection**: View and export prompt templates for analysis, optimization, or use with other tools
- **Terminal Workflows**: Integrate trace and experiment data into your existing tools, piping output to Unix utilities like `jq`
- **AI Coding Assistants**: Use with Claude Code, Cursor, Windsurf, or other AI-powered tools to analyze traces and experiments and optimize prompts
Don't see your use case covered? @arizeai/phoenix-cli is open source! Issues and PRs welcome.

## Installation

```shell
npm install -g @arizeai/phoenix-cli
```

Or run directly with npx:

```shell
npx @arizeai/phoenix-cli
```

## Quick Start

```shell
# Configure your Phoenix instance
export PHOENIX_HOST=http://localhost:6006
export PHOENIX_PROJECT=my-project
export PHOENIX_API_KEY=your-api-key  # if authentication is enabled

# Fetch the most recent trace
px trace list --limit 1

# Fetch a specific trace by ID
px trace get abc123def456

# Fetch recent LLM spans
px span list --span-kind LLM --limit 10

# Export traces to a directory
px trace list ./my-traces --limit 50
```

## Environment Variables

| Variable | Description |
| --- | --- |
| `PHOENIX_HOST` | Phoenix API endpoint (e.g., `http://localhost:6006`) |
| `PHOENIX_PROJECT` | Project name or ID |
| `PHOENIX_API_KEY` | API key for authentication (if required) |
| `PHOENIX_CLIENT_HEADERS` | Custom headers as a JSON string |

CLI flags take priority over environment variables.
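Since `PHOENIX_CLIENT_HEADERS` must be a valid JSON string, it can help to build the value programmatically rather than hand-quote it. A minimal sketch in Python (the header names are hypothetical):

```python
import json
import os

# Hypothetical custom headers; PHOENIX_CLIENT_HEADERS expects a JSON string
headers = {"x-team": "platform", "x-request-source": "ci"}
os.environ["PHOENIX_CLIENT_HEADERS"] = json.dumps(headers)

print(os.environ["PHOENIX_CLIENT_HEADERS"])  # {"x-team": "platform", "x-request-source": "ci"}
```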

## Commands

### `px project list`

List all available projects.

```shell
px project list
px project list --format raw  # JSON output for piping
```

| Option | Description | Default |
| --- | --- | --- |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | Output format: `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |
| `--limit <number>` | Maximum projects to fetch per page | `100` |

### `px trace list [directory]`

Fetch recent traces from the configured project.

```shell
px trace list --limit 10                          # Output to stdout
px trace list ./my-traces --limit 10              # Save to directory
px trace list --last-n-minutes 60 --limit 20      # Filter by time
px trace list --since 2026-01-13T10:00:00Z        # Since timestamp
px trace list --format raw --no-progress | jq     # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `[directory]` | Save traces as JSON files to directory | stdout |
| `-n, --limit <number>` | Number of traces to fetch (newest first) | `10` |
| `--last-n-minutes <number>` | Only fetch traces from the last N minutes | |
| `--since <timestamp>` | Fetch traces since ISO timestamp | |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--project <name>` | Project name or ID | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress output | |
| `--max-concurrent <number>` | Maximum concurrent fetches | `10` |

### `px trace get <trace-id>`

Fetch a specific trace by ID.

```shell
px trace get abc123def456
px trace get abc123def456 --file trace.json      # Save to file
px trace get abc123def456 --format raw | jq      # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `--file <path>` | Save to file instead of stdout | stdout |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--project <name>` | Project name or ID | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--no-progress` | Disable progress indicators | |

### `px span list [file]`

Fetch individual spans from the configured project with comprehensive filtering.

```shell
px span list --limit 20                                    # Recent spans (table view)
px span list --last-n-minutes 60 --limit 50                # Spans from last hour
px span list --span-kind LLM --limit 10                    # Only LLM spans
px span list --status-code ERROR --limit 20                # Only errored spans
px span list --span-kind LLM TOOL --status-code OK         # Combine filters
px span list --name chat_completion --limit 10             # Filter by span name
px span list --trace-id abc123 --format raw | jq           # All spans for a trace
px span list --include-annotations --limit 10              # Include annotation scores
px span list output.json --limit 100                       # Save to JSON file
px span list --format raw --no-progress | jq               # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `[file]` | Save spans as JSON to file | stdout |
| `-n, --limit <number>` | Maximum spans to fetch (newest first) | `100` |
| `--last-n-minutes <number>` | Only fetch spans from the last N minutes | |
| `--since <timestamp>` | Fetch spans since ISO timestamp | |
| `--span-kind <kinds...>` | Filter by span kind (`LLM`, `CHAIN`, `TOOL`, `RETRIEVER`, `EMBEDDING`, `AGENT`, `RERANKER`, `GUARDRAIL`, `EVALUATOR`, `UNKNOWN`) | |
| `--status-code <codes...>` | Filter by status code (`OK`, `ERROR`, `UNSET`) | |
| `--name <names...>` | Filter by span name(s) | |
| `--trace-id <ids...>` | Filter by trace ID(s) | |
| `--parent-id <id>` | Filter by parent span ID (use `"null"` for root spans) | |
| `--include-annotations` | Include span annotations in output | Off |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--project <name>` | Project name or ID | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |

### `px session list`

List sessions (multi-turn conversations) for a project.

```shell
px session list                                       # List recent sessions
px session list --limit 20                            # More sessions
px session list --order asc                           # Oldest first
px session list --format raw --no-progress | jq       # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `-n, --limit <number>` | Maximum number of sessions to return | `10` |
| `--order <order>` | Sort order: `asc` or `desc` | `desc` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--project <name>` | Project name or ID | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |

### `px session get <session-id>`

View a session's conversation flow, including all traces (turns) in the session.

```shell
px session get my-chat-session-001                              # By session_id
px session get UHJvamVjdFNlc3Npb24...                           # By GlobalID
px session get my-chat-session-001 --include-annotations        # With annotations
px session get my-chat-session-001 --file session.json          # Save to file
px session get my-chat-session-001 --format raw | jq            # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `--include-annotations` | Include session annotations | Off |
| `--file <path>` | Save to file instead of stdout | stdout |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--project <name>` | Project name or ID | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--no-progress` | Disable progress indicators | |

### `px dataset list`

List all available datasets.

```shell
px dataset list
px dataset list --format json                    # JSON output
px dataset list --format raw --no-progress | jq  # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |
| `--limit <number>` | Maximum number of datasets | |

### `px dataset get <dataset-identifier>`

Fetch examples from a dataset.

```shell
px dataset get query_response                        # Fetch all examples
px dataset get query_response --split train          # Filter by split
px dataset get query_response --split train --split test  # Multiple splits
px dataset get query_response --version <version-id> # Specific version
px dataset get query_response --file dataset.json    # Save to file
px dataset get query_response --format raw | jq '.examples[].input'
```

| Option | Description | Default |
| --- | --- | --- |
| `--split <name>` | Filter by split (can be used repeatedly) | |
| `--version <id>` | Fetch from a specific dataset version | latest |
| `--file <path>` | Save to file instead of stdout | stdout |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--no-progress` | Disable progress indicators | |

### `px experiment list --dataset <name-or-id>`

List experiments for a dataset, optionally exporting full data to files.

```shell
px experiment list --dataset my-dataset                 # List experiments
px experiment list --dataset my-dataset --format json   # JSON output
px experiment list --dataset my-dataset ./experiments   # Export to directory
```

| Option | Description | Default |
| --- | --- | --- |
| `--dataset <name-or-id>` | Dataset name or ID (required) | |
| `[directory]` | Export experiment JSON files to directory | stdout |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |
| `--limit <number>` | Maximum number of experiments | |

### `px experiment get <experiment-id>`

Fetch a single experiment with all run data, including inputs, outputs, evaluations, and trace IDs.

```shell
px experiment get RXhwZXJpbWVudDox
px experiment get RXhwZXJpbWVudDox --file exp.json   # Save to file
px experiment get RXhwZXJpbWVudDox --format json     # JSON output
```

| Option | Description | Default |
| --- | --- | --- |
| `--file <path>` | Save to file instead of stdout | stdout |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--no-progress` | Disable progress indicators | |

### `px prompt list`

List all available prompts.

```shell
px prompt list
px prompt list --format json                    # JSON output
px prompt list --format raw --no-progress | jq  # Pipe to jq
```

| Option | Description | Default |
| --- | --- | --- |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--format <format>` | `pretty`, `json`, or `raw` | `pretty` |
| `--no-progress` | Disable progress indicators | |
| `--limit <number>` | Maximum number of prompts | |

### `px prompt get <prompt_identifier>`

Show a Phoenix prompt. Supports multiple output formats, including a text format optimized for piping to AI coding assistants.

```shell
px prompt get my-assistant-prompt                    # Latest version (pretty)
px prompt get my-assistant-prompt --tag production   # Get by tag
px prompt get my-assistant-prompt --version abc123   # Specific version
px prompt get my-assistant-prompt --format json      # JSON output
px prompt get my-assistant-prompt --format text      # Plain text for piping
```

| Option | Description | Default |
| --- | --- | --- |
| `--tag <name>` | Get prompt version by tag name | |
| `--version <id>` | Get a specific prompt version by ID | latest |
| `--format <format>` | `pretty`, `json`, `raw`, or `text` | `pretty` |
| `--endpoint <url>` | Phoenix API endpoint | From env |
| `--api-key <key>` | Phoenix API key | From env |
| `--no-progress` | Disable progress indicators | |

The `text` format outputs prompt content with XML-style role tags, ideal for piping to AI assistants:

```
<system>You are a helpful assistant specialized in...</system>
<user>{{user_input}}</user>
```
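Because the text format uses simple XML-style role tags, it is easy to parse downstream. A minimal sketch in Python (the prompt content here is hypothetical, not real `px` output):

```python
import re

# Hypothetical output of `px prompt get ... --format text`
text = "<system>You are a helpful assistant.</system>\n<user>{{user_input}}</user>"

# Extract (role, content) pairs; the backreference \1 matches the closing tag
messages = re.findall(r"<(\w+)>(.*?)</\1>", text, flags=re.DOTALL)
print(messages)  # [('system', 'You are a helpful assistant.'), ('user', '{{user_input}}')]
```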

### `px api graphql <query>`

Make authenticated GraphQL queries against the Phoenix API. Output is `{"data": {...}}` JSON; pipe it through `jq '.data.<field>'` to extract values. Only queries are permitted; mutations and subscriptions are rejected before reaching the server.

```shell
px api graphql '<query>' [--endpoint <url>] [--api-key <key>]
```

| Argument/Option | Description | Default |
| --- | --- | --- |
| `<query>` | GraphQL query string | |
| `--endpoint <url>` | Phoenix API endpoint | `$PHOENIX_HOST` |
| `--api-key <key>` | Phoenix API key | `$PHOENIX_API_KEY` |

#### Discover the schema with introspection

Use introspection to explore which fields and types are available without leaving your terminal:

```shell
$ px api graphql '{ __schema { queryType { fields { name } } } }' | \
    jq '.data.__schema.queryType.fields[].name'
"projects"
"datasets"
"prompts"
"evaluators"
"projectCount"
"datasetCount"
"promptCount"
"evaluatorCount"
"serverStatus"
"viewer"
...

$ px api graphql '{ __type(name: "Experiment") { fields { name type { name } } } }' | \
    jq '.data.__type.fields[] | {name, type: .type.name}'
{"name": "id", "type": "ID"}
{"name": "name", "type": "String"}
{"name": "runCount", "type": "Int"}
{"name": "errorRate", "type": "Float"}
{"name": "averageRunLatencyMs", "type": "Float"}
```
#### Projects

```shell
$ px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }'
{
  "data": {
    "projects": {
      "edges": [
        { "node": { "name": "default", "traceCount": 1482, "tokenCountTotal": 219083 } }
      ]
    }
  }
}

$ px api graphql '{ projects { edges { node { name traceCount } } } }' | \
    jq '.data.projects.edges[].node'
{"name": "default", "traceCount": 1482}
```

Available fields: `id`, `name`, `traceCount`, `recordCount`, `tokenCountTotal`, `tokenCountPrompt`, `tokenCountCompletion`, `createdAt`, `updatedAt`.

#### Datasets

```shell
$ px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | \
    jq '.data.datasets.edges[].node'
{"name": "eval-golden-set", "exampleCount": 120, "experimentCount": 4}
{"name": "rag-test-cases", "exampleCount": 50, "experimentCount": 1}

$ px api graphql '{ datasetCount }' | jq '.data.datasetCount'
12
```

Available fields: `id`, `name`, `description`, `exampleCount`, `experimentCount`, `evaluatorCount`, `createdAt`, `updatedAt`.

#### Experiments

Experiments are nested under datasets in the GraphQL schema:

```shell
$ px api graphql '{
  datasets {
    edges {
      node {
        name
        experiments {
          edges {
            node { name runCount errorRate averageRunLatencyMs }
          }
        }
      }
    }
  }
}' | jq '.data.datasets.edges[].node | {dataset: .name, experiments: [.experiments.edges[].node]}'

# Find experiments with a non-zero error rate
$ px api graphql '{
  datasets { edges { node { name experiments { edges { node { name errorRate runCount } } } } } }
}' | jq '.. | objects | select(.errorRate? > 0)'
```

To inspect individual run outputs, errors, and trace IDs:

```shell
$ px api graphql '{
  datasets(first: 1) {
    edges { node { experiments(first: 1) { edges { node {
      name
      runs { edges { node { traceId output error latencyMs } } }
    } } } } }
  }
}' | jq '.data.datasets.edges[0].node.experiments.edges[0].node.runs.edges[].node'
{"traceId": "b696d0ac...", "output": {"answer": "Moore's Law is..."}, "error": null, "latencyMs": 1006}
```

Available run fields: `traceId`, `output`, `error`, `latencyMs`, `startTime`, `endTime`.
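Once you have extracted the run nodes, aggregating them takes only a few lines in any language. A minimal Python sketch computing an error rate and average latency (the run values below are hypothetical illustrations of the shape above):

```python
# Hypothetical runs, already extracted from a GraphQL response's runs.edges[].node
runs = [
    {"traceId": "a1", "error": None, "latencyMs": 1006},
    {"traceId": "b2", "error": "timeout", "latencyMs": 30000},
]

# A run failed if its `error` field is non-null
error_rate = sum(r["error"] is not None for r in runs) / len(runs)
avg_latency = sum(r["latencyMs"] for r in runs) / len(runs)
print(error_rate, avg_latency)  # 0.5 15503.0
```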

#### Evaluators

```shell
$ px api graphql '{ evaluators { edges { node { name kind description isBuiltin } } } }' | \
    jq '.data.evaluators.edges[].node'
{"name": "correctness", "kind": "LLM", "description": "Evaluates answer correctness", "isBuiltin": true}
```

#### Instance summary

```shell
$ px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
{
  "data": {
    "projectCount": 1,
    "datasetCount": 12,
    "promptCount": 3,
    "evaluatorCount": 2
  }
}
```

## Output Formats

**`pretty`** (default) — Human-readable tree view:

```
┌─ Trace: abc123def456
│
│ Input: What is the weather in San Francisco?
│ Output: The weather is currently sunny…
│
│ Spans:
│ └─ ✓ agent_run (CHAIN) - 1250ms
│    ├─ ✓ llm_call (LLM) - 800ms
│    └─ ✓ tool_execution (TOOL) - 400ms
└─
```

**`json`** — Formatted JSON with indentation.

**`raw`** — Compact JSON for piping to `jq` or other tools.

## JSON Structure

```json
{
  "traceId": "abc123def456",
  "spans": [
    {
      "name": "chat_completion",
      "context": {
        "trace_id": "abc123def456",
        "span_id": "span-1"
      },
      "span_kind": "LLM",
      "parent_id": null,
      "start_time": "2026-01-17T10:00:00.000Z",
      "end_time": "2026-01-17T10:00:01.250Z",
      "status_code": "OK",
      "attributes": {
        "llm.model_name": "gpt-4",
        "llm.token_count.prompt": 512,
        "llm.token_count.completion": 256,
        "input.value": "What is the weather?",
        "output.value": "The weather is sunny..."
      }
    }
  ],
  "rootSpan": { ... },
  "startTime": "2026-01-17T10:00:00.000Z",
  "endTime": "2026-01-17T10:00:01.250Z",
  "duration": 1250,
  "status": "OK"
}
```

Spans include OpenInference semantic attributes such as `llm.model_name`, `llm.token_count.*`, `input.value`, `output.value`, `tool.name`, and `exception.*`.
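Exported traces can be post-processed in any language. As a minimal sketch against the structure above (the span values are hypothetical), here is Python summing token counts across LLM spans:

```python
import json

# Hypothetical trace matching the documented JSON structure
trace = json.loads("""
{
  "traceId": "abc123def456",
  "spans": [
    {"name": "chat_completion", "span_kind": "LLM",
     "attributes": {"llm.token_count.prompt": 512, "llm.token_count.completion": 256}},
    {"name": "tool_execution", "span_kind": "TOOL", "attributes": {"tool.name": "get_weather"}}
  ],
  "duration": 1250
}
""")

# Sum prompt + completion tokens over LLM spans only; non-LLM spans lack these keys
total_tokens = sum(
    span["attributes"].get("llm.token_count.prompt", 0)
    + span["attributes"].get("llm.token_count.completion", 0)
    for span in trace["spans"]
    if span["span_kind"] == "LLM"
)
print(total_tokens)  # 768
```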

## Examples

### Debug failed traces

```shell
px trace list --limit 20 --format raw --no-progress | jq '.[] | select(.status == "ERROR")'
```

### Find slowest traces

```shell
px trace list --limit 10 --format raw --no-progress | jq 'sort_by(-.duration) | .[0:3]'
```

### Find errored spans

```shell
px span list --status-code ERROR --limit 50 --format raw --no-progress | jq '.[] | {name, status_message}'
```

### Inspect LLM spans with annotations

```shell
px span list --span-kind LLM --include-annotations --limit 20
```

### Extract LLM models used

```shell
px trace list --limit 50 --format raw --no-progress | \
  jq -r '.[].spans[] | select(.span_kind == "LLM") | .attributes["llm.model_name"]' | sort -u
```

### Count errors

```shell
px trace list --limit 100 --format raw --no-progress | jq '[.[] | select(.status == "ERROR")] | length'
```

### List datasets and experiments

```shell
# List all datasets
px dataset list --format raw --no-progress | jq '.[].name'
# Output: "query_response"

# List experiments for a dataset
px experiment list --dataset query_response --format raw --no-progress | \
  jq '.[] | {id, successful_run_count, failed_run_count}'
# Output: {"id":"RXhwZXJpbWVudDox","successful_run_count":249,"failed_run_count":1}

# Export all experiment data for a dataset to a directory
px experiment list --dataset query_response ./experiments/
```

### Analyze experiment results

```shell
# Get input queries and latency from an experiment
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '.[] | {query: .input.query, latency_ms, trace_id}'

# Find failed runs in an experiment
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '.[] | select(.error != null) | {query: .input.query, error}'
# Output: {"query":"looking for complex fodmap meal ideas","error":"peer closed connection..."}

# Calculate average latency across runs
px experiment get RXhwZXJpbWVudDox --format raw --no-progress | \
  jq '[.[].latency_ms] | add / length'
```

### Work with prompts

```shell
# List all prompts
px prompt list --format raw --no-progress | jq '.[].name'

# Get prompt template content
px prompt get my-evaluator --format text --no-progress

# View prompt with all metadata
px prompt get my-evaluator --format json --no-progress | jq '.template'

# Get a specific tagged version
px prompt get my-evaluator --tag production --format text --no-progress
```

### Query the GraphQL API directly

```shell
# Quick instance summary
$ px api graphql '{ projectCount datasetCount promptCount evaluatorCount }'
{"data": {"projectCount": 1, "datasetCount": 12, "promptCount": 3, "evaluatorCount": 2}}

# Discover all available query fields
$ px api graphql '{ __schema { queryType { fields { name } } } }' | \
    jq '.data.__schema.queryType.fields[].name'

# Projects with stats
$ px api graphql '{ projects { edges { node { name traceCount tokenCountTotal } } } }' | \
    jq '.data.projects.edges[].node'

# Datasets with counts
$ px api graphql '{ datasets { edges { node { name exampleCount experimentCount } } } }' | \
    jq '.data.datasets.edges[].node'

# Find experiments with errors
$ px api graphql '{
  datasets { edges { node { name experiments { edges { node { name errorRate runCount } } } } } }
}' | jq '.. | objects | select(.errorRate? > 0)'

# Drill into run outputs
$ px api graphql '{
  datasets(first: 1) { edges { node {
    experiments(first: 1) { edges { node {
      runs { edges { node { traceId output error latencyMs } } }
    } } }
  } } }
}' | jq '.data.datasets.edges[0].node.experiments.edges[0].node.runs.edges[].node'

# Get viewer info (authenticated instances)
$ px api graphql '{ viewer { username email } }'
```

## Use with AI Coding Assistants

Phoenix CLI is designed to work seamlessly with AI coding assistants like Claude Code, Cursor, and Windsurf.

### Claude Code

Ask Claude Code:

> Use px to fetch the last 3 traces from my Phoenix project and analyze them for potential improvements

Claude Code will discover the CLI via `px --help` and fetch your traces for analysis.

### Prompt Optimization with Claude Code

Pipe your Phoenix prompts directly to Claude Code for analysis and optimization suggestions:

```shell
# Get prompt optimization ideas
px prompt get my-evaluator --format text --no-progress | claude -p "Review this prompt and suggest improvements for clarity and effectiveness"

# Analyze prompt for edge cases
px prompt get my-assistant --format text --no-progress | claude -p "What edge cases might this prompt fail to handle?"

# Generate test cases for a prompt
px prompt get my-classifier --format text --no-progress | claude -p "Generate 5 diverse test inputs to evaluate this prompt"
```

You can also ask Claude Code to work with your prompts interactively:

> Fetch my "correctness-evaluator" prompt from Phoenix and suggest how to make the rubric more specific

### Cursor / Windsurf

Run the CLI in the terminal and ask the AI to interpret the output:

> Fetch my recent Phoenix traces using px and explain what my agent is doing

For prompt work:

> List my Phoenix prompts with px and help me improve the system prompt for my assistant

## See Also

- **Retrieve Traces via CLI**: user guide for fetching traces from the command line
- **@arizeai/phoenix-client**: TypeScript client for the Phoenix API

## License

Apache 2.0