When To Use Code Generation Eval Template
This Eval checks the correctness and readability of the code produced by a code generation process. The template variables are:
- query: the coding question being asked
- code: the code that was returned
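As a sketch of how these variables are used, the prompt template is rendered by substituting query and code into the template text. The template below is a hypothetical, abbreviated stand-in, not the real eval template:

```python
# Hypothetical, abbreviated template text for illustration only;
# the real template shipped with the eval library is longer.
TEMPLATE = (
    "You are evaluating generated code for readability.\n"
    "Question: {query}\n"
    "Code: {code}\n"
    "Answer 'readable' or 'unreadable'."
)

# Fill in the two template variables described above.
prompt = TEMPLATE.format(
    query="Write a function that reverses a string.",
    code="def reverse(s):\n    return s[::-1]",
)
print(prompt)
```

The rendered prompt is then sent to the evaluating LLM, which responds with a label.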
Code Generation Eval Template
We are continually iterating on our templates; view the most up-to-date template on GitHub.
How To Run the Code Generation Eval
Benchmark Results
This benchmark was obtained using the notebook below. It was run using the OpenAI HumanEval dataset as the ground-truth dataset. Each example in the dataset was evaluated using the CODE_READABILITY_PROMPT_TEMPLATE above, and the resulting labels were compared against the ground-truth labels in the benchmark dataset to generate the confusion matrices below.
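The precision, recall, and F1 numbers reported below follow the standard definitions. A minimal sketch of that comparison step, using illustrative labels (the actual label names and data are assumptions):

```python
def precision_recall_f1(y_true, y_pred, positive="readable"):
    """Compare eval labels against ground truth for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy example: ground-truth labels vs. labels produced by the eval.
truth = ["readable", "readable", "unreadable", "readable"]
preds = ["readable", "unreadable", "readable", "readable"]
p, r, f1 = precision_recall_f1(truth, preds)
```

In the benchmark, `truth` comes from the ground-truth dataset and `preds` from the eval's output; the same counts also populate the confusion matrices.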
GPT-4 Results

| Code Eval | GPT-4 |
|---|---|
| Precision | 0.93 |
| Recall | 0.78 |
| F1 | 0.85 |


