See the LlamaIndex notebook for more info.
Notebook Walkthrough
We will go through key code snippets on this page. To follow the full tutorial, check out the Colab notebook above.
Upload Dataset to Phoenix
Here, we will grab 7 examples from a Hugging Face dataset.
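As a rough illustration of the selection step only, the sketch below picks 7 rows from an in-memory list using the standard library. The `records` list, the field names, and the `sample_examples` helper are hypothetical stand-ins, not the notebook's code; whether the notebook samples randomly or takes the first 7 rows is not shown on this page, and the actual upload to Phoenix is left to the notebook.

```python
import random

# Hypothetical stand-in for rows loaded from a Hugging Face dataset.
records = [
    {"question": f"question {i}", "answer": f"answer {i}"} for i in range(100)
]


def sample_examples(rows, k=7, seed=42):
    """Return k rows chosen without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


examples = sample_examples(records)
print(len(examples))
```

Fixing the seed keeps the selected examples stable across reruns, which makes later experiment comparisons easier to interpret.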
Define Task Function
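A task takes a dataset example's input and returns an output. The following is a minimal sketch of the same task written in both synchronous and asynchronous form. The `answer_question` helper is a hypothetical placeholder for a real query-engine call, and the `input` parameter name and dictionary shape are assumptions made for illustration.

```python
import asyncio


def answer_question(question: str) -> str:
    """Hypothetical placeholder for a call to a real query engine."""
    return f"echo: {question}"


def sync_task(input: dict) -> str:
    # Synchronous variant: call the engine and return its answer directly.
    return answer_question(input["question"])


async def async_task(input: dict) -> str:
    # Asynchronous variant: yield to the event loop before answering,
    # standing in for an awaited network call.
    await asyncio.sleep(0)
    return answer_question(input["question"])


example_input = {"question": "What is Phoenix?"}
sync_output = sync_task(example_input)
async_output = asyncio.run(async_task(example_input))
```

Both variants produce the same output for the same input; the async form mainly helps when many examples are processed concurrently.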
The task function can be either sync or async.
Dry-Run Experiment
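Conceptually, a dry run executes the task on a small random subset of the dataset so that problems surface before a full run. The sketch below imitates that idea in plain Python and keeps the results in memory only. The `run_dry` helper, `toy_task`, and the example shape are hypothetical and are not the Phoenix API.

```python
import random


def run_dry(examples, task, n=3, seed=0):
    """Run `task` on n randomly chosen examples and return the results in memory."""
    rng = random.Random(seed)
    chosen = rng.sample(examples, k=min(n, len(examples)))
    return [{"input": ex["input"], "output": task(ex["input"])} for ex in chosen]


def toy_task(input):
    # Hypothetical task: upper-case the question text.
    return input["question"].upper()


examples = [{"input": {"question": f"q{i}"}} for i in range(7)]
results = run_dry(examples, toy_task)
print(len(results))
```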
Conduct a dry-run experiment on 3 randomly selected examples.
Define Evaluators For Each Experiment Run
Evaluators can be sync or async. Function arguments output and expected refer to the attributes of the same name in the ExperimentRun data structure shown above.
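To make the argument naming concrete, here is a minimal sketch of one sync and one async evaluator that each accept `output` and `expected`. Treating both values as plain strings is an assumption for illustration; in a real experiment they may be structured objects, and the scoring rules below are hypothetical examples rather than the notebook's evaluators.

```python
import asyncio


def exact_match(output: str, expected: str) -> float:
    """Sync evaluator: 1.0 if the texts match ignoring case and outer whitespace."""
    return float(output.strip().lower() == expected.strip().lower())


async def token_overlap(output: str, expected: str) -> float:
    """Async evaluator: fraction of expected tokens that appear in the output."""
    await asyncio.sleep(0)  # stands in for an awaited call, e.g. to an LLM judge
    out_tokens = set(output.lower().split())
    exp_tokens = set(expected.lower().split())
    if not exp_tokens:
        return 0.0
    return len(out_tokens & exp_tokens) / len(exp_tokens)


match_score = exact_match("Paris ", "paris")
overlap_score = asyncio.run(token_overlap("the capital is Paris", "Paris France"))
```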
The PairwiseEvaluator in LlamaIndex is used to compare two outputs side-by-side and determine which one is preferred.
This setup allows you to:
- Run automated A/B tests on different LlamaIndex query engine configurations
- Capture LLM-based preference data to guide iteration
- Aggregate pairwise win rates and qualitative feedback
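The aggregation step in the list above can be sketched as follows. This assumes each pairwise comparison has already been reduced to one of the labels "A", "B", or "tie"; those labels and the `win_rates` helper are hypothetical and do not reflect the evaluator's actual return type.

```python
from collections import Counter


def win_rates(preferences):
    """Return the share of comparisons won by A, won by B, and tied."""
    counts = Counter(preferences)
    total = sum(counts.values())
    if total == 0:
        return {"A": 0.0, "B": 0.0, "tie": 0.0}
    return {label: counts[label] / total for label in ("A", "B", "tie")}


# Hypothetical judge verdicts for five dataset examples.
verdicts = ["A", "B", "A", "tie", "A"]
rates = win_rates(verdicts)
print(rates)
```

Reporting ties separately avoids inflating either configuration's win rate when the judge finds no clear preference.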
View Results in Phoenix


