Follow along with code: This guide has a companion notebook with runnable code examples. Find it here, and go to Part 4: Optimize Prompts Automatically.
What is Prompt Learning?
Prompt learning is an iterative approach to optimizing LLM prompts by using feedback from evaluations to systematically improve prompt performance. Instead of manually tweaking prompts through trial and error, the SDK automates this process. The prompt learning process follows this workflow:
- Initial Prompt: Start with a baseline prompt that defines your task
- Generate Outputs: Use the prompt to generate responses on your dataset
- Evaluate Results: Run evaluators to assess output quality
- Optimize Prompt: Use feedback to generate an improved prompt
- Iterate: Repeat until performance meets your criteria (a schematic sketch of this loop follows the resource links below)
- Prompt Learning Blog Post
- Prompt Learning Video Overview
- Optimizing Claude Code with Prompt Learning
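In code, you can picture the loop like this. This is a schematic sketch only: `generate`, `evaluate`, and `optimize` are placeholder callables standing in for the real SDK calls covered in the sections below.

```python
def prompt_learning_loop(initial_prompt, dataset, generate, evaluate, optimize,
                         target_accuracy=0.9, max_rounds=5):
    """Schematic prompt learning loop: generate -> evaluate -> optimize -> repeat.

    `generate`, `evaluate`, and `optimize` are caller-supplied callables; they
    are placeholders for the real SDK calls shown later in this guide.
    """
    prompt = initial_prompt
    for _ in range(max_rounds):
        outputs = generate(prompt, dataset)        # run the prompt over the dataset
        evals = evaluate(outputs, dataset)         # score each output with your evaluators
        accuracy = sum(e["correct"] for e in evals) / len(evals)
        if accuracy >= target_accuracy:            # stop once performance meets your bar
            break
        prompt = optimize(prompt, outputs, evals)  # feed eval feedback into an improved prompt
    return prompt
```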
Install the Prompt Learning SDK
We’re now ready to put this into practice. Using the Prompt Learning SDK, we can take the evaluation data we’ve already collected - all those explanations, error types, and fix suggestions - and feed it back into an optimization loop. Instead of manually writing new instructions or tuning parameters, we’ll let the algorithm analyze our experiment results and generate an improved prompt automatically. Let’s install the SDK and use it to optimize our support query classifier.
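Installation in the companion notebook typically looks like the cell below. The Prompt Learning package name here is an assumption; copy the exact install command from the notebook if it differs.

```python
# Notebook install cell. NOTE: "prompt-learning" is an assumed package name for
# the Prompt Learning SDK -- use the companion notebook's install command if it differs.
%pip install -q arize-phoenix openai prompt-learning
```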
Load Experiment for Training
First, head to the experiment we ran for version 4 and copy the experiment ID. This experiment serves as our training data: we’ll use the outputs and evals we generated to train our new prompt version.
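A minimal sketch of loading that experiment, assuming the Phoenix client exposes a getter on its experiments resource like the one below; the exact retrieval call varies by Phoenix version, so defer to the companion notebook.

```python
from phoenix.client import Client

client = Client()  # assumes your Phoenix endpoint/API key are set via environment variables

# Paste the experiment ID copied from the Phoenix UI (the version 4 run).
EXPERIMENT_ID = "YOUR_EXPERIMENT_ID"

# Assumption: the client exposes a getter on its experiments resource.
# The returned experiment data (outputs plus eval labels and explanations)
# becomes the training data for prompt optimization.
experiment = client.experiments.get_experiment(experiment_id=EXPERIMENT_ID)
```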
Load Unoptimized Prompt
Let’s load our unoptimized prompt from Phoenix so that we can funnel it through Prompt Learning.
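For example, pulling the latest saved version of the classifier prompt might look like this; the prompt name `support-query-classifier` is an assumption, so use whatever name you saved the prompt under earlier in the walkthrough.

```python
from phoenix.client import Client

client = Client()

# Fetch the latest version of the classifier prompt stored in Phoenix.
# "support-query-classifier" is an assumed name -- use your own prompt's identifier.
unoptimized_prompt = client.prompts.get(prompt_identifier="support-query-classifier")
```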
Optimize Prompt (Version 5)
Now, let’s optimize our prompt and push the optimized version back to Phoenix.
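The sketch below shows roughly what this step looks like. The optimizer class and its arguments (`PromptLearningOptimizer`, `model_choice`, `feedback_columns`) are assumptions about the Prompt Learning SDK's interface rather than its confirmed API, and the prompt name carries over from the previous steps; follow the companion notebook for the exact calls.

```python
# Assumed import path and class name for the Prompt Learning SDK.
from prompt_learning import PromptLearningOptimizer

from phoenix.client import Client
from phoenix.client.types import PromptVersion

client = Client()

# 1. Optimize: let the algorithm rewrite the prompt using the experiment's
#    outputs and eval feedback (explanations, error types, fix suggestions).
#    Argument names here are assumptions about the SDK's interface.
optimizer = PromptLearningOptimizer(
    prompt=unoptimized_prompt,   # the prompt loaded in the previous step
    model_choice="gpt-4o",       # model used to generate the improved prompt (assumption)
)
optimized_prompt_text = optimizer.optimize(
    dataset=experiment,                # experiment data loaded earlier (outputs + evals)
    feedback_columns=["explanation"],  # eval feedback columns the optimizer learns from
)

# 2. Push: save the optimized text back to Phoenix as the next prompt version.
client.prompts.create(
    name="support-query-classifier",   # same assumed prompt name as before
    version=PromptVersion(
        [{"role": "system", "content": optimized_prompt_text}],
        model_name="gpt-4o",
    ),
)
```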
Measure New Prompt Version’s Performance
Now that we’ve used Prompt Learning to build a new, optimized prompt version, let’s see how it actually performs by running another Phoenix experiment on the support query dataset with our new prompt.
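A sketch of that experiment, assuming the dataset is named `support-queries` with `question` and `label` fields and reusing a simple exact-match evaluator; adjust these names to match what you created earlier in the walkthrough.

```python
import phoenix as px
from phoenix.experiments import run_experiment
from openai import OpenAI

openai_client = OpenAI()

# Load the same support-query dataset used for the earlier experiments.
# "support-queries" and the "question"/"label" fields are assumptions.
dataset = px.Client().get_dataset(name="support-queries")

def task(input):
    """Classify one support query with the optimized (version 5) prompt."""
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            # optimized_prompt_text is the version 5 prompt produced in the previous step
            {"role": "system", "content": optimized_prompt_text},
            {"role": "user", "content": input["question"]},
        ],
    )
    return response.choices[0].message.content

def exact_match(output, expected) -> float:
    """Score 1.0 when the predicted label matches the expected label."""
    return float(output.strip().lower() == expected["label"].strip().lower())

experiment = run_experiment(
    dataset,
    task,
    evaluators=[exact_match],
    experiment_name="support-query-classifier-v5",
)
```

Once the run completes, open the experiments view in Phoenix to compare version 5's scores against version 4.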

Summary
Congratulations! You’ve completed the Phoenix Prompts walkthrough! Across these modules, we’ve gone from identifying weak prompts to automatically optimizing them using real evaluation data. You’ve learned how to:
- Identify and edit prompts directly from traces to correct misclassifications.
- Test prompts at scale across datasets to measure accuracy and uncover systematic failure patterns.
- Compare prompt versions side by side to see which edits, parameters, or models lead to measurable gains.
- Automate prompt optimization with Prompt Learning, using English feedback from evaluations to train stronger prompts without manual rewriting.
- Improve accuracy by 30%!
- Track every iteration in Phoenix, from dataset creation and experiment runs to versioned prompts, creating a full feedback loop between your data, your LLM, and your application.

