Evaluator Message Formats

Phoenix evaluators now support flexible prompt formats in both Python and TypeScript, giving you full control over how you structure prompts for LLM-based evaluations.

Supported Formats

String Templates - Simple templates with variable placeholders:
from phoenix.evals import ClassificationEvaluator, LLM

evaluator = ClassificationEvaluator(
    name="sentiment",
    llm=LLM(provider="openai", model="gpt-4o-mini"),
    prompt_template="Classify the sentiment: {text}",
    choices=["positive", "negative", "neutral"]
)
Message Lists - OpenAI-style arrays with role and content fields for multi-turn prompts:
llm = LLM(provider="openai", model="gpt-4o-mini")

evaluator = ClassificationEvaluator(
    name="helpfulness",
    llm=llm,
    prompt_template=[
        {"role": "system", "content": "You evaluate response helpfulness."},
        {"role": "user", "content": "Question: {question}\nAnswer: {answer}"}
    ],
    choices=["helpful", "somewhat_helpful", "not_helpful"]
)

Template Variable Syntax

  • Python: Supports both f-string ({variable}) and mustache ({{variable}}) syntax, with the syntax auto-detected from the template (see the sketch after this list)
  • TypeScript: Uses mustache syntax ({{variable}})
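
For example, the sentiment evaluator above can be written with mustache-style placeholders instead of f-string ones; a minimal sketch, assuming the same provider and model:

evaluator = ClassificationEvaluator(
    name="sentiment",
    llm=LLM(provider="openai", model="gpt-4o-mini"),
    # Mustache-style placeholder; the syntax is detected from the template itself
    prompt_template="Classify the sentiment: {{text}}",
    choices=["positive", "negative", "neutral"]
)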

Provider Compatibility

Adapters handle provider-specific message transformations automatically:
  • OpenAI: System role converted to developer role for reasoning models
  • Anthropic: System messages extracted to the system parameter
  • Google GenAI: System messages passed via system_instruction
  • LiteLLM: Messages passed in OpenAI format (LiteLLM handles conversion)
  • LangChain: Converted to LangChain message objects
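
Because the adapters handle these conversions, the same message-list template can be reused across providers by swapping only the LLM. A hedged sketch, where the provider and model names are illustrative assumptions rather than values confirmed by this page:

# Only the LLM changes; the prompt_template stays the same.
# Provider and model names here are assumptions for illustration.
anthropic_llm = LLM(provider="anthropic", model="claude-3-5-sonnet-latest")

evaluator = ClassificationEvaluator(
    name="helpfulness",
    llm=anthropic_llm,
    prompt_template=[
        {"role": "system", "content": "You evaluate response helpfulness."},
        {"role": "user", "content": "Question: {question}\nAnswer: {answer}"}
    ],
    choices=["helpful", "somewhat_helpful", "not_helpful"]
)
# Per the table above, the system message is extracted into Anthropic's system parameter.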

More Information:

Prompt Formats Documentation