Run this tutorial in Google Colab: colab.research.google.com
- Create a customer support agent using a router template
- Trace the agent activity, including function calling
- Create a dataset to benchmark performance
- Evaluate agent performance using code, human annotation, and LLM as a judge
- Experiment with different prompts and models
Notebook Walkthrough
We will go through key code snippets on this page. To follow the full tutorial, check out the Colab notebook above.
Customer Support Agent Architecture
We’ll be creating a customer support agent using function calling following the architecture below:
Create Tools and Agent
We have six functions that we will define for our agent:
- product_comparison
- product_search
- customer_support
- track_package
- product_details
- apply_discount_code
We also define run_prompt, which calls OpenAI's chat completions API with these functions.
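As a minimal sketch of this setup, the snippet below defines JSON-schema specs for two of the six tools (product_search and track_package, with stub implementations) and a run_prompt helper that dispatches any tool call the model makes. The tool bodies, schemas, and model name are assumptions for illustration, not the notebook's exact code.

```python
import json


def product_search(query: str) -> str:
    # Stub: a real implementation would query a product catalog.
    return json.dumps({"results": [f"item matching '{query}'"]})


def track_package(order_id: str) -> str:
    # Stub: a real implementation would hit a shipping API.
    return json.dumps({"order_id": order_id, "status": "in transit"})


# JSON-schema tool specs passed to the chat completions API.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "product_search",
            "description": "Search the product catalog.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "track_package",
            "description": "Look up shipping status for an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
]

# Maps tool names returned by the model to the Python functions above.
DISPATCH = {"product_search": product_search, "track_package": track_package}


def run_prompt(question: str, client) -> str:
    """Route a question through the model and execute any tool it selects."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; swap in whatever you are testing
        messages=[{"role": "user", "content": question}],
        tools=TOOLS,
    )
    message = response.choices[0].message
    if message.tool_calls:
        call = message.tool_calls[0]
        args = json.loads(call.function.arguments)
        return DISPATCH[call.function.name](**args)
    return message.content
```

Calling run_prompt("Where is order A1?", client) with an OpenAI client would let the model route the question to track_package.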
Generate Synthetic Dataset of Questions
Now that we have a basic agent, let's generate a dataset of questions and run the prompt against this dataset! Using the template below, we're going to generate a dataframe of 25 questions we can use to test our customer support agent.
Evaluating your Agent
Now that we have a set of test cases, we can create evaluators to measure performance, so we don't have to manually inspect every single trace to see if the LLM is doing the right thing. Here, we define evaluation templates to judge whether the router chose to call a function, whether it selected the right function, and whether it filled in the arguments correctly. We then run the llm_classify function over the responses dataframe we generated above!
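The LLM-as-a-judge step can be sketched as below, using Phoenix's llm_classify. The template text, column names, and judge model are assumptions for illustration; the notebook defines its own templates for routing, function choice, and argument filling.

```python
# Assumed judge template: the {question} and {tool_call} placeholders are
# filled from columns of the responses dataframe.
ROUTER_EVAL_TEMPLATE = """
You are evaluating whether a customer support router chose the correct
function for a user's question.

[Question]: {question}
[Function Called]: {tool_call}

Respond with a single word, "correct" or "incorrect".
"""

# Rails constrain the judge to these labels only.
RAILS = ["correct", "incorrect"]


def evaluate_responses(responses_df):
    # Assumes `arize-phoenix` is installed and OPENAI_API_KEY is set.
    from phoenix.evals import OpenAIModel, llm_classify

    return llm_classify(
        dataframe=responses_df,  # needs `question` and `tool_call` columns
        model=OpenAIModel(model="gpt-4o"),  # assumed judge model
        template=ROUTER_EVAL_TEMPLATE,
        rails=RAILS,
        provide_explanation=True,  # ask the judge to justify each label
    )
```

The returned dataframe has one label (and explanation) per row, which can be logged back to Phoenix as an evaluation.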
Create and Run an Experiment
With the dataset of questions we generated above, we can use the experiments feature to track changes across models, prompts, and parameters for our agent.
View Results
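The experiment run described above can be sketched as follows. This assumes a running Phoenix instance; the dataset name, experiment name, and the run_prompt call inside the task are hypothetical placeholders.

```python
def run_support_experiment(questions_df, run_prompt):
    # Assumes `arize-phoenix` is installed and a Phoenix server is running.
    import phoenix as px
    from phoenix.experiments import run_experiment

    # Upload the generated questions as a versioned dataset.
    dataset = px.Client().upload_dataset(
        dataset_name="support-agent-questions",  # assumed name
        dataframe=questions_df,
        input_keys=["question"],
    )

    # The task re-runs the agent on each example; swap the model or prompt
    # inside `run_prompt` and re-run to compare results across experiments.
    def task(example):
        return run_prompt(example.input["question"])

    return run_experiment(dataset, task, experiment_name="baseline")
```

Each re-run with a changed prompt or model produces a new experiment you can compare side by side in the Phoenix UI.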


