- Datasets let you curate and organize examples to test your application systematically.
- Experiments let you compare different model versions or configurations on the same dataset to see which performs best.
Datasets
1. Launch Phoenix
Before setting up your first dataset, make sure Phoenix is running. For more step-by-step instructions, check out this Get Started guide.
- Phoenix Cloud
- Local (Self-hosted)
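For the local (self-hosted) path, one minimal way to get Phoenix running, assuming the arize-phoenix Python distribution:

```shell
# Install the Phoenix package and start a local server.
# The UI is served at http://localhost:6006 by default.
pip install arize-phoenix
phoenix serve
```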
Log in, create a space, navigate to the settings page in your space, and create your API keys. In your code, set your environment variables. You can find your collector endpoint here:
Your collector endpoint is https://app.phoenix.arize.com/s/ followed by your space name.
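In code, that might look like the following sketch. The key and space name are placeholders, and on older Phoenix versions the header-style variable `PHOENIX_CLIENT_HEADERS="api_key=..."` is used instead of `PHOENIX_API_KEY`:

```python
import os

# Placeholder values: substitute your own API key and space name.
os.environ["PHOENIX_API_KEY"] = "your-api-key"
os.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com/s/your-space-name"
```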

2. Creating a Dataset
You can create a dataset either in the UI or via code. For this quickstart, you can download this sample.csv as a starter to walk through how datasets work.
- UI
- Python
- TS
In the UI, you can either create an empty dataset and populate it manually, or upload from a CSV. Once you've downloaded the CSV file above, you can follow the video below to upload your first dataset. That's it! You've now successfully created your first dataset.
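For the code path, uploading the CSV might look like this sketch. The column names `question` and `answer` are assumptions about sample.csv, and the Phoenix and pandas imports live inside the function so the snippet loads even without those packages installed:

```python
import csv

def load_examples(csv_path):
    """Read the sample CSV into a list of row dicts."""
    with open(csv_path, newline="") as f:
        return list(csv.DictReader(f))

def upload_dataset(csv_path, name="quickstart-dataset"):
    """Push the CSV to Phoenix as a dataset (requires arize-phoenix)."""
    import pandas as pd
    import phoenix as px

    df = pd.DataFrame(load_examples(csv_path))
    client = px.Client()  # uses the PHOENIX_* environment variables set earlier
    return client.upload_dataset(
        dataset_name=name,
        dataframe=df,
        input_keys=["question"],   # assumed input column in sample.csv
        output_keys=["answer"],    # assumed expected-output column in sample.csv
    )
```

Calling `upload_dataset("sample.csv")` requires a running Phoenix instance and the environment variables from step 1.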
Experiments
Once you have a dataset, you can run experiments. Experiments are made of tasks and, optionally, evals. While running evals is optional, they provide valuable metrics that help you compare experiments quickly, such as comparing models, catching regressions, and understanding which version performs best.
1. Load your Dataset in Code
The first step is to pull your dataset down into your code. If you made your dataset in the UI, you can follow this code snippet. To get the version_id of your dataset, navigate to the Versions tab and copy the version you want to run an experiment on. If you created your dataset programmatically, you should already have it available as an instance assigned to your dataset variable.
- Python
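Pulling a UI-created dataset down might look like this sketch. It assumes the `px.Client().get_dataset` API; the dataset name and version id are placeholders, and the import sits inside the function so the snippet loads without the package installed:

```python
def load_dataset(name="quickstart-dataset", version_id=None):
    """Fetch a dataset created in the UI (requires arize-phoenix and a running Phoenix)."""
    import phoenix as px

    client = px.Client()
    # The version_id comes from the dataset's Versions tab; omit it to get the latest.
    return client.get_dataset(name=name, version_id=version_id)
```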
2. Create your Task
Create a Task to evaluate.
- Python
- Typescript
Your task can be any function with any definition and does not have to use an LLM. However, for our experiment we want to run our list of input questions through a new prompt, and will need to start by setting our API keys:
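A sketch of that setup, with the OpenAI SDK as the example provider. The key value, model name, and prompt are placeholders, and the SDK import sits inside the task so the snippet loads without it installed:

```python
import os

# Placeholder key: replace with your real provider key.
os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key")

# The "new prompt" we want to test in this experiment (illustrative).
PROMPT = "Answer the following question concisely:\n{question}"

def task(input):
    """Run one dataset example's input question through the new prompt."""
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": PROMPT.format(question=input["question"])}],
    )
    return response.choices[0].message.content
```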
3. Create your Evaluator
The next step is to create your Evaluator. If you have already defined your Q&A Correctness eval from the last quickstart, you won't need to redefine it. If not, you can follow along with these code snippets. You can run multiple evaluators at once. Let's define a custom Completeness Eval.
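Phoenix evaluators can be plain functions whose arguments are drawn from the example (for instance `output` and `expected`). As a deterministic stand-in for an LLM-judged Completeness eval, here is a toy sketch that scores what fraction of the expected answer's words appear in the task output:

```python
def completeness(output, expected):
    """Toy completeness score in [0, 1]: the fraction of expected-answer
    words that show up in the task output (a stand-in for an LLM judge)."""
    expected_words = set(expected["answer"].lower().split())
    if not expected_words:
        return 0.0
    output_words = set(str(output).lower().split())
    return len(expected_words & output_words) / len(expected_words)
```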
4. Run your Experiment
Now that we have defined our task and our evaluators, we're ready to run our experiment. After running multiple experiments, you can compare the experiment outputs and evals side by side!
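Putting it together might look like this sketch. The experiment name is a placeholder, and the Phoenix import sits inside the function so the snippet loads without the package installed:

```python
def run_my_experiment(dataset, task, evaluators):
    """Run the task over each dataset example and score the results with the evaluators."""
    from phoenix.experiments import run_experiment  # requires arize-phoenix

    return run_experiment(
        dataset,
        task=task,
        evaluators=evaluators,
        experiment_name="prompt-v2",  # placeholder experiment name
    )
```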
Optional: if you want to run more evaluators after this experiment, you can do so with the following code:

- Python
- Typescript
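A sketch of attaching extra evaluators to an experiment that has already run, assuming the `evaluate_experiment` helper from `phoenix.experiments`:

```python
def add_more_evals(experiment, extra_evaluators):
    """Score an already-run experiment with additional evaluators."""
    from phoenix.experiments import evaluate_experiment  # requires arize-phoenix

    return evaluate_experiment(experiment, evaluators=extra_evaluators)
```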

