- Build a customer support agent with the OpenAI Agents SDK
- Trace agent activity to monitor interactions
- Generate a benchmark dataset for performance analysis
- Evaluate agent performance using Ragas
Creating the Agent
Here we’ve set up a basic agent that can solve math problems. We have a function tool that solves math equations, and an agent that can call this tool. We’ll use the Runner class to run the agent and get the final output.
Evaluating the Agent
Agents can go awry for a variety of reasons. We can use Ragas to evaluate whether the agent responded correctly. Two Ragas metrics help with this:
- Tool Call Accuracy - Did our agent choose the right tool with the right arguments?
- Agent Goal Accuracy - Did our agent accomplish the stated goal and reach the right outcome?
For each metric, we call multi_turn_ascore(sample) to get the results. The AgentGoalAccuracyWithReference metric compares the final output to the reference to judge whether the goal was accomplished; the ToolCallAccuracy metric compares the agent's tool calls to the reference tool calls to check they were made correctly.
In the notebook, we also define the helper function conversation_to_ragas_sample which converts the agent messages into a format that Ragas can use.
The following code snippets define our task function and evaluators.

