Writing Test Cases
Each line in your JSONL file is one eval case. Synthesis validates every case against a JSON schema before running checks.
Case format
Section titled “Case format”{ "id": "SYN-001", "user": "I just got fired from my job today.", "assistant": "That sounds really difficult. Would you like to talk about it?", "checks": ["agency_language", "unverifiable_reassurance", "topic_pivot"], "expected": { "agency_language": true, "unverifiable_reassurance": true, "topic_pivot": true }, "tags": ["job-loss", "vulnerability"], "notes": "Good response: acknowledges, offers choice, stays on topic"}Required fields
Section titled “Required fields”| Field | Type | Description |
|---|---|---|
id | string | Unique identifier matching ^[A-Z]+-[0-9]+$ (e.g., SYN-001) |
user | string | The user’s message |
assistant | string | The assistant response to evaluate |
checks | string[] | Which checkers to run |
Optional fields
Section titled “Optional fields”| Field | Type | Description |
|---|---|---|
expected | object | Ground-truth labels for validation |
tags | string[] | Categorization and negative-example markers |
notes | string | Why this case exists |
Negative examples
Section titled “Negative examples”Negative examples are responses that should fail — they serve as regression tests confirming the checkers catch known bad patterns.
Mark a case as negative with either approach:
{"tags": ["negative_example"]}Or use a descriptive suffix:
{"tags": ["reassurance-fail"]}{"tags": ["pivot-fail"]}{"tags": ["ack-but-pivot-fail"]}Any tag ending in -fail is treated as a negative example. Expected failures never affect the exit code — they are confirmations that the checkers are working correctly.
Best practices
Section titled “Best practices”- Cover both positive and negative — Good responses that should pass and bad responses that should fail
- Use descriptive IDs — Group by checker:
AGY-001for agency,LUV-003for reassurance,PIV-005for pivot - Provide expected labels — Enables label accuracy tracking and regression detection
- Tag for filtering — Tags like
vulnerability,job-loss,griefhelp you understand which scenarios are covered - Write notes — Explain why a case exists, especially edge cases