This page covers document annotations: per-document feedback on retriever spans, logged via @arizeai/phoenix-client/spans. See Annotations for the shared annotation model and concepts.
Relevant Source Files
- src/spans/addDocumentAnnotation.ts for the single-annotation API
- src/spans/logDocumentAnnotations.ts for batch logging
- src/spans/types.ts for the DocumentAnnotation interface
Why Document Annotations
When a retriever returns a ranked list of documents, you need to know:

- Were the right documents retrieved? (relevance)
- Were they ranked in the right order? (nDCG, MRR)
- Was at least one relevant document returned? (hit rate)
- How many of the top-K were relevant? (Precision@K)
How Document Annotations Work
Each document annotation targets a specific document by its position in the retriever span’s output. The documentPosition field is a 0-based index: if a retriever returns 5 documents, positions 0 through 4 are valid targets.
Document annotations share the same fields as span annotations (spanId, name, annotatorKind, label, score, explanation, metadata). The documentPosition tells Phoenix which retrieved document the feedback applies to.
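Put together, an annotation targeting one retrieved document might look like the sketch below. The interface is reconstructed from the fields listed on this page (see src/spans/types.ts for the authoritative definition), and the spanId is a placeholder:

```typescript
// Sketch of the DocumentAnnotation shape, reconstructed from the field
// table on this page (see src/spans/types.ts for the real interface).
interface DocumentAnnotation {
  spanId: string;                  // the retriever span being annotated
  documentPosition: number;        // 0-based index into its output
  name: string;
  annotatorKind?: "HUMAN" | "LLM" | "CODE";
  label?: string;
  score?: number;
  explanation?: string;
  metadata?: Record<string, unknown>;
}

// Feedback on the third document (position 2) of a 5-document retrieval.
// The spanId is a placeholder.
const annotation: DocumentAnnotation = {
  spanId: "7e2f08cb43bbbf92",
  documentPosition: 2,
  name: "relevance",
  annotatorKind: "LLM",
  label: "relevant",
  score: 1,
  explanation: "Directly answers the user's question.",
};
```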
Automatic Retrieval Metrics
Phoenix automatically computes nDCG, Precision@K, MRR, and Hit Rate from document annotations that have
annotatorKind: "LLM" and a numeric score. Annotations with annotatorKind: "HUMAN" or "CODE" are stored but do not feed into the auto-computed retrieval metrics. Set annotatorKind: "LLM" when logging relevance scores; this is the typical pattern when running an LLM-as-judge relevance evaluator over your retrieval results.
Score All Documents In A Retrieval
The most common pattern: after a retriever returns N documents, score each one for relevance. Use logDocumentAnnotations to send them in a single batch.
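A minimal sketch of the pattern, under two assumptions: the DocumentAnnotation shape is reconstructed from the field table on this page, and the logDocumentAnnotations call shape (an object with a documentAnnotations array) is assumed rather than confirmed; check src/spans/logDocumentAnnotations.ts for the exact signature. The spanId is a placeholder.

```typescript
// Build one relevance annotation per retrieved document, in rank order.
interface DocumentAnnotation {
  spanId: string;
  documentPosition: number;
  name: string;
  annotatorKind?: "HUMAN" | "LLM" | "CODE";
  label?: string;
  score?: number;
}

function buildRelevanceAnnotations(
  spanId: string,
  scores: number[], // one judge score per document, in retrieval order
): DocumentAnnotation[] {
  return scores.map((score, documentPosition) => ({
    spanId,
    documentPosition,     // position i scores the i-th retrieved document
    name: "relevance",
    annotatorKind: "LLM", // "LLM" feeds the auto-computed retrieval metrics
    score,
    label: score >= 0.5 ? "relevant" : "irrelevant",
  }));
}

// An LLM judge scored 5 retrieved documents (binary here; graded 0-1
// scores work the same way):
const annotations = buildRelevanceAnnotations("7e2f08cb43bbbf92", [1, 0, 1, 0, 0]);

// Then send them in one batch (import and call shape assumed):
// import { logDocumentAnnotations } from "@arizeai/phoenix-client/spans";
// await logDocumentAnnotations({ documentAnnotations: annotations });
```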
Binary Relevance Labeling
The simplest relevance scheme: each document is either relevant (1) or not (0). This is the most common input for hit rate and nDCG:

- Hit Rate = 1 if any document has score 1, else 0
- Precision@K = fraction of top-K documents with score 1
- MRR = 1 / (rank of first document with score 1)
- nDCG = normalized discounted cumulative gain across the ranked list
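To make the definitions above concrete, here is an illustrative re-implementation of the four metrics over a ranked list of binary scores. Phoenix computes these server-side from your annotations; this sketch only shows the arithmetic.

```typescript
// Compute the four retrieval metrics from binary relevance scores,
// ordered by rank (index 0 = top-ranked document).
function retrievalMetrics(scores: number[], k: number) {
  // Hit Rate: 1 if any document is relevant.
  const hitRate = scores.some((s) => s === 1) ? 1 : 0;

  // Precision@K: fraction of the top-K documents that are relevant.
  const precisionAtK = scores.slice(0, k).filter((s) => s === 1).length / k;

  // MRR: reciprocal of the 1-based rank of the first relevant document.
  const firstHit = scores.findIndex((s) => s === 1);
  const mrr = firstHit === -1 ? 0 : 1 / (firstHit + 1);

  // nDCG: DCG of the actual order, normalized by the ideal order's DCG.
  const dcg = (xs: number[]) =>
    xs.reduce((acc, rel, i) => acc + rel / Math.log2(i + 2), 0);
  const ideal = dcg([...scores].sort((a, b) => b - a));
  const ndcg = ideal === 0 ? 0 : dcg(scores) / ideal;

  return { hitRate, precisionAtK, mrr, ndcg };
}

// Relevant documents at positions 1 and 3 of a 5-document retrieval:
const m = retrievalMetrics([0, 1, 0, 1, 0], 3);
// m.hitRate === 1, m.precisionAtK === 1/3, m.mrr === 0.5
```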
Graded Relevance
For finer-grained evaluation, use continuous scores (e.g. 0–1) instead of binary. This gives nDCG more signal about how relevant each document is, not just whether it’s relevant at all.

Add A Single Document Annotation
For one-off annotations (e.g. a human reviewer flagging a specific document), use the single-annotation API, addDocumentAnnotation.

Multi-Dimensional Document Scoring
Score the same documents on multiple axes by using different annotation names. Each name creates a separate annotation series in the Phoenix UI.

Re-Ranking Evaluation
Document annotations are useful for evaluating re-rankers. Annotate the same retriever span before and after re-ranking to compare the quality of the original vs. re-ranked order.

Parameter Reference
DocumentAnnotation
| Field | Type | Required | Description |
|---|---|---|---|
| spanId | string | Yes | The retriever span’s OpenTelemetry ID |
| documentPosition | number | Yes | 0-based index of the document in retrieval results |
| name | string | Yes | Annotation name (e.g. "relevance") |
| annotatorKind | "HUMAN" \| "LLM" \| "CODE" | No | Defaults to "HUMAN". Use "LLM" for auto-computed retrieval metrics. |
| label | string | No* | Categorical label (e.g. "relevant", "irrelevant") |
| score | number | No* | Numeric relevance score (e.g. 0 or 1 for binary, 0–1 for graded) |
| explanation | string | No* | Free-text explanation |
| metadata | Record<string, unknown> | No | Arbitrary metadata |
*At least one of label, score, or explanation is required.
Document annotations are unique by (name, spanId, documentPosition). Unlike span annotations, the identifier field is not supported for document annotations.
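A sketch of what that uniqueness rule implies: two annotations on the same document coexist as long as any component of the (name, spanId, documentPosition) key differs. Here the name differs, so a relevance score and a freshness score can both live on document 0 of the same span. The spanId is a placeholder, and the update-on-same-key behavior noted in the comment is an assumption to verify against the server.

```typescript
// The identity key for a document annotation, per the rule above.
const key = (a: { name: string; spanId: string; documentPosition: number }) =>
  `${a.name}:${a.spanId}:${a.documentPosition}`;

const spanId = "7e2f08cb43bbbf92"; // placeholder retriever span ID

// Two axes of feedback on the same document (position 0):
const relevance = { spanId, documentPosition: 0, name: "relevance", score: 1.0 };
const freshness = { spanId, documentPosition: 0, name: "freshness", score: 0.4 };

// Different names -> different keys -> two distinct annotations.
// Logging again with an identical key would update the existing
// annotation rather than add a second one (assumed upsert semantics).
```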
Source Map
- src/spans/addDocumentAnnotation.ts
- src/spans/logDocumentAnnotations.ts
- src/spans/types.ts
- src/types/annotations.ts

