You’ll start by creating an experiment in Phoenix that can house the results of each of your resulting prompts. Next you’ll use a series of prompt optimization techniques to improve the performance of a jailbreak classification task. Each technique will be applied to the same base prompt, and the results will be compared using Phoenix.The techniques you’ll use are:
Few Shot Examples: Adding a few examples to the prompt to help the model understand the task.
Meta Prompting: Prompting a model to generate a better prompt based on previous inputs, outputs, and expected outputs.
Prompt Gradients: Using the gradient of the prompt to optimize individual components of the prompt using embeddings.
DSPy Prompt Tuning: Using DSPy, an automated prompt tuning library, to optimize the prompt.
Next you need to connect to Phoenix. The code below will connect you to a Phoenix Cloud instance. You can also connect to a self-hosted Phoenix instance if you’d prefer.
Copy
Ask AI
import osfrom getpass import getpassos.environ["PHOENIX_COLLECTOR_ENDPOINT"] = "https://app.phoenix.arize.com"if not os.environ.get("PHOENIX_CLIENT_HEADERS"): os.environ["PHOENIX_CLIENT_HEADERS"] = "api_key=" + getpass("Enter your Phoenix API key: ")if not os.environ.get("OPENAI_API_KEY"): os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")
Since we’ll be running a series of experiments, we’ll need a dataset of test cases that we can run each time. This dataset will be used to test the performance of each prompt optimization technique.
Copy
Ask AI
from datasets import load_datasetds = load_dataset("jackhhao/jailbreak-classification")["train"]ds = ds.to_pandas().sample(50)ds.head()
Copy
Ask AI
import uuidfrom phoenix.client import Clientunique_id = uuid.uuid4()# Upload the dataset to Phoenixpx_client = Client()dataset = px_client.datasets.create_dataset( dataframe=ds, input_keys=["prompt"], output_keys=["type"], name=f"jailbreak-classification-{unique_id}",)
Next, you can define a base template for the prompt. We’ll also save this template to Phoenix, so it can be tracked, versioned, and reused across experiments.
Copy
Ask AI
from openai import OpenAIfrom openai.types.chat.completion_create_params import CompletionCreateParamsBasefrom phoenix.client.types import PromptVersionparams = CompletionCreateParamsBase( model="gpt-3.5-turbo", temperature=0, messages=[ { "role": "system", "content": "You are an evaluator that decides whether a given prompt is a jailbreak risk. Only output 'benign' or 'jailbreak', no other words.", }, {"role": "user", "content": "{{prompt}}"}, ],)prompt_identifier = "jailbreak-classification"prompt = px_client.prompts.create( name=prompt_identifier, prompt_description="A prompt for classifying whether a given prompt is a jailbreak risk.", version=PromptVersion.from_openai(params),)
You should now see that prompt in Phoenix:
Next you’ll need a task and evaluator for the experiment. A task is a function that will be run across each example in the dataset. The task is also the piece of your code that you’ll change between each run of the experiment. To start off, the task is simply a call to GPT 3.5 Turbo with a basic prompt.You’ll also need an evaluator that will be used to test the performance of the task. The evaluator will be run across each example in the dataset after the task has been run. Here, because you have ground truth labels, you can use a simple function to check if the output of the task matches the expected output.
You can also instrument your code to send all models calls to Phoenix. This isn’t necessary for the experiment to run, but it does mean all your experiment task runs will be tracked in Phoenix. The overall experiment score and evaluator runs will be tracked regardless of whether you instrument your code or not.
Copy
Ask AI
from openinference.instrumentation.openai import OpenAIInstrumentorfrom phoenix.otel import registertracer_provider = register(project_name="prompt-optimization")OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
Now you can run the initial experiment. This will be the base prompt that you’ll be optimizing.
Prompt Optimization Technique #1: Few Shot Examples
One common prompt optimization technique is to use few shot examples to guide the model’s behavior.Here you can add few shot examples to the prompt to help improve performance. Conviently, the dataset you uploaded in the last step contains a test set that you can use for this purpose.
Copy
Ask AI
from datasets import load_datasetds_test = load_dataset("jackhhao/jailbreak-classification")[ "test"] # this time, load in the test set instead of the training setfew_shot_examples = ds_test.to_pandas().sample(10)
Define a new prompt that includes the few shot examples. Prompts in Phoenix are automatically versioned, so saving the prompt with the same name will create a new version that can be used.
Copy
Ask AI
few_shot_template = """You are an evaluator that decides whether a given prompt is a jailbreak risk. Only output "benign" or "jailbreak", no other words.Here are some examples of prompts and responses:{examples}"""params = CompletionCreateParamsBase( model="gpt-3.5-turbo", temperature=0, messages=[ {"role": "system", "content": few_shot_template.format(examples=few_shot_examples)}, {"role": "user", "content": "{{prompt}}"}, ],)few_shot_prompt = PhoenixClient().prompts.create( name=prompt_identifier, prompt_description="Few shot prompt", version=PromptVersion.from_openai(params),)
You’ll notice you now have a new version of the prompt in Phoenix:
Meta prompting involves prompting a model to generate a better prompt, based on previous inputs, outputs, and expected outputs.The experiment from round 1 serves as a great starting point for this technique, since it has each of those components.
Copy
Ask AI
# Access the experiment results from the first round as a dataframeground_truth_df = initial_experiment.as_dataframe()# Sample 10 examples to use as meta prompting examplesground_truth_df = ground_truth_df[:10]# Create a new column with the examples in a single stringground_truth_df["example"] = ground_truth_df.apply( lambda row: f"Input: {row['input']}\nOutput: {row['output']}\nExpected Output: {row['expected']}", axis=1,)ground_truth_df.head()
Now construct a new prompt that will be used to generate a new prompt.
Copy
Ask AI
meta_prompt = """You are an expert prompt engineer. You are given a prompt, and a list of examples.Your job is to generate a new prompt that will improve the performance of the model.Here are the examples:{examples}Here is the original prompt:{prompt}Here is the new prompt:"""original_base_prompt = ( prompt.format(variables={"prompt": "example prompt"}).get("messages")[0].get("content"))client = OpenAI()response = client.chat.completions.create( model="gpt-3.5-turbo", messages=[ { "role": "user", "content": meta_prompt.format( prompt=original_base_prompt, examples=ground_truth_df["example"].to_string() ), } ],)new_prompt = response.choices[0].message.content.strip()
Prompt gradient optimization is a technique that uses the gradient of the prompt to optimize individual components of the prompt using embeddings. It involves:
Converting the prompt into an embedding.
Comparing the outputs of successful and failed prompts to find the gradient direction.
Moving in the gradient direction to optimize the prompt.
Here you’ll define a function to get embeddings for prompts, and then use that function to calculate the gradient direction between successful and failed prompts.
Copy
Ask AI
import numpy as np# First we'll define a function to get embeddings for promptsdef get_embedding(text): client = OpenAI() response = client.embeddings.create(model="text-embedding-ada-002", input=text) return response.data[0].embedding# Function to calculate gradient direction between successful and failed promptsdef calculate_prompt_gradient(successful_prompts, failed_prompts): # Get embeddings for successful and failed prompts successful_embeddings = [get_embedding(p) for p in successful_prompts] failed_embeddings = [get_embedding(p) for p in failed_prompts] # Calculate average embeddings avg_successful = np.mean(successful_embeddings, axis=0) avg_failed = np.mean(failed_embeddings, axis=0) # Calculate gradient direction gradient = avg_successful - avg_failed return gradient / np.linalg.norm(gradient)# Get successful and failed examples from our datasetsuccessful_examples = ( ground_truth_df[ground_truth_df["output"] == ground_truth_df["expected"].get("type")]["input"] .apply(lambda x: x["prompt"]) .tolist())failed_examples = ( ground_truth_df[ground_truth_df["output"] != ground_truth_df["expected"].get("type")]["input"] .apply(lambda x: x["prompt"]) .tolist())# Calculate the gradient directiongradient = calculate_prompt_gradient(successful_examples[:5], failed_examples[:5])# Function to optimize a prompt using the gradientdef optimize_prompt(base_prompt, gradient, step_size=0.1): # Get base embedding base_embedding = get_embedding(base_prompt) # Move in gradient direction optimized_embedding = base_embedding + step_size * gradient # Use GPT to convert the optimized embedding back to text client = OpenAI() response = client.chat.completions.create( model="gpt-3.5-turbo", messages=[ { "role": "system", "content": "You are helping to optimize prompts. Given the original prompt and its embedding, generate a new version that maintains the core meaning but moves in the direction of the optimized embedding.", }, { "role": "user", "content": f"Original prompt: {base_prompt}\nOptimized embedding direction: {optimized_embedding[:10]}...\nPlease generate an improved version that moves in this embedding direction.", }, ], ) return response.choices[0].message.content.strip()# Test the gradient-based optimizationgradient_prompt = optimize_prompt(original_base_prompt, gradient)
Copy
Ask AI
gradient_prompt
Copy
Ask AI
if r"\{examples\}" in gradient_prompt: gradient_prompt = gradient_prompt.format(examples=few_shot_examples)params = CompletionCreateParamsBase( model="gpt-3.5-turbo", temperature=0, messages=[ { "role": "system", "content": gradient_prompt, }, # if your meta prompt includes few shot examples, make sure to include them here {"role": "user", "content": "{{prompt}}"}, ],)gradient_prompt = PhoenixClient().prompts.create( name=prompt_identifier, prompt_description="Gradient prompt result", version=PromptVersion.from_openai(params),)
DSPy makes a series of calls to optimize the prompt. It can be useful to see these calls in action. To do this, you can instrument the DSPy library using the OpenInference SDK, which will send all calls to Phoenix. This is optional, but it can be useful to have.
Copy
Ask AI
from openinference.instrumentation.dspy import DSPyInstrumentorDSPyInstrumentor().instrument(tracer_provider=tracer_provider)
Now you’ll setup the DSPy language model and define a prompt classification task.
Copy
Ask AI
# Import DSPy and set up the language modelimport dspy# Configure DSPy to use OpenAIturbo = dspy.LM(model="gpt-3.5-turbo")dspy.settings.configure(lm=turbo)# Define the prompt classification taskclass PromptClassifier(dspy.Signature): """Classify if a prompt is benign or jailbreak.""" prompt = dspy.InputField() label = dspy.OutputField(desc="either 'benign' or 'jailbreak'")# Create the basic classifierclassifier = dspy.Predict(PromptClassifier)
Your classifier can now be used to make predictions as you would a normal LLM. It will expect a prompt input and will output a label prediction.
Copy
Ask AI
classifier(prompt=ds.iloc[0].prompt)
However, DSPy really shines when it comes to optimizing prompts. By defining a metric to measure successful runs, along with a training set of examples, you can use one of many different optimizers built into the library.In this case, you’ll use the MIPROv2 optimizer to find the best prompt for your task.
Copy
Ask AI
def validate_classification(example, prediction, trace=None): return example["label"] == prediction["label"]# Prepare training data from previous examplestrain_data = []for _, row in ground_truth_df.iterrows(): example = dspy.Example( prompt=row["input"]["prompt"], label=row["expected"]["type"] ).with_inputs("prompt") train_data.append(example)tp = dspy.MIPROv2(metric=validate_classification, auto="light")optimized_classifier = tp.compile(classifier, trainset=train_data)
DSPy takes care of our prompts in this case, however you could still save the resulting prompt value in Phoenix:
Copy
Ask AI
params = CompletionCreateParamsBase( model="gpt-3.5-turbo", temperature=0, messages=[ { "role": "system", "content": optimized_classifier.signature.instructions, }, # if your meta prompt includes few shot examples, make sure to include them here {"role": "user", "content": "{{prompt}}"}, ],)dspy_prompt = PhoenixClient().prompts.create( name=prompt_identifier, prompt_description="DSPy prompt result", version=PromptVersion.from_openai(params),)
# Create evaluation function using optimized classifierdef test_dspy_prompt(input): result = optimized_classifier(prompt=input["prompt"]) return result.label
Prompt Optimization Technique #5: DSPy with GPT-4o
In the last example, you used GPT-3.5 Turbo to both run your pipeline, and optimize the prompt. However, you can also use a different model to optimize the prompt, and a different model to run your pipeline.It can be useful to use a more powerful model for your optimization step, and a cheaper or faster model for your pipeline.Here you’ll use GPT-4o to optimize the prompt, and keep GPT-3.5 Turbo as your pipeline model.
Run experiment with DSPy-optimized classifier using GPT-4o
Redefine the task, using the new prompt.
Copy
Ask AI
# Create evaluation function using optimized classifierdef test_dspy_prompt(input): result = optimized_classifier_using_gpt_4o(prompt=input["prompt"]) return result.label
Copy
Ask AI
# Run experiment with DSPy-optimized classifierdspy_experiment_using_gpt_4o = run_experiment( dataset, task=test_dspy_prompt, evaluators=[evaluate_response], experiment_description="Prompt Optimization Technique #5: DSPy Prompt Tuning with GPT-4o", experiment_name="dspy-optimization-gpt-4o", experiment_metadata={"prompt": "prompt_id=" + dspy_prompt.id},)
And just like that, you’ve run a series of prompt optimization techniques to improve the performance of a jailbreak classification task, and compared the results using Phoenix.You should have a set of experiments that looks like this: