When To Use the Summarization Eval Template
This Eval helps evaluate the results of a summarization task. The template variables are:

- document: the document text to summarize
- summary: the summary of the document
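As a minimal sketch of how these two variables are substituted into a prompt, assuming a Python format-style template (the placeholder names match the variables above, but the template text here is purely illustrative, not the real prompt):

```python
# Illustrative stand-in for the real summarization eval prompt; only the
# variable names (document, summary) come from this page.
TEMPLATE = (
    "You are comparing a summary to its source document.\n"
    "[BEGIN DOCUMENT]\n{document}\n[END DOCUMENT]\n"
    "[BEGIN SUMMARY]\n{summary}\n[END SUMMARY]\n"
    "Answer 'good' or 'bad'."
)

def render_prompt(document: str, summary: str) -> str:
    """Substitute the two template variables into the prompt text."""
    return TEMPLATE.format(document=document, summary=summary)

prompt = render_prompt("Long article text ...", "Short summary ...")
```

The rendered prompt is what gets sent to the judge model for each (document, summary) pair.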
Summarization Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
How To Run the Summarization Eval
Benchmark Results
This benchmark was obtained using the notebook below. It was run using a CNN Daily Mail summarization dataset as a ground truth dataset. Each example in the dataset was evaluated using the SUMMARIZATION_PROMPT_TEMPLATE above, then the resulting labels were compared against the ground truth labels in the summarization dataset to generate the confusion matrices below.
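The comparison step above can be sketched as follows. This is a generic tally of eval labels against ground-truth labels, assuming binary "good"/"bad" labels for illustration (the actual label rails come from the template in the library):

```python
from collections import Counter

def confusion_counts(predicted, truth, positive="good"):
    """Tally TP/FP/FN/TN by comparing eval labels to ground-truth labels."""
    counts = Counter(tp=0, fp=0, fn=0, tn=0)
    for p, t in zip(predicted, truth):
        if p == positive and t == positive:
            counts["tp"] += 1
        elif p == positive:
            counts["fp"] += 1  # eval said good, ground truth said bad
        elif t == positive:
            counts["fn"] += 1  # eval said bad, ground truth said good
        else:
            counts["tn"] += 1
    return counts

counts = confusion_counts(
    predicted=["good", "good", "bad", "bad"],
    truth=["good", "bad", "good", "bad"],
)
```

Precision and recall then follow directly from these four counts.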
Try it out!
GPT-4o and GPT-4 Results

| Eval | GPT-4o | GPT-4 |
|---|---|---|
| Precision | 0.87 | 0.79 |
| Recall | 0.63 | 0.88 |
| F1 | 0.73 | 0.83 |
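The F1 rows in the table are the harmonic mean of the precision and recall rows, which can be checked directly:

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

gpt4o_f1 = round(f1(0.87, 0.63), 2)  # 0.73, matching the table
gpt4_f1 = round(f1(0.79, 0.88), 2)   # 0.83, matching the table
```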


