
Reference

synthesis [options]

Options:
  --cases <path>    Path to JSONL test cases (default: data/evals.jsonl)
  --schema <path>   Path to JSON schema (default: schemas/eval_case.schema.json)
  --out <path>      Output path for JSON report (default: out/report.json)
  --fail-on <n>     Max allowed unexpected failures before exit code 2 (default: 0)
  --help, -h        Show help message

Every run produces a structured JSON report at the configured output path.

| Field | What It Means |
| --- | --- |
| cases | Total number of eval cases processed |
| passed | Cases where all checks passed |
| failed | Cases where at least one check failed |
| strict_passed | Cases that passed and were not expected to fail |
| strict_failed | Unexpected failures (regressions) |
| expected_failures | Negative examples correctly caught |
| unexpected_failures | Same as strict_failed; drives the exit code |
| label_accuracy | How well computed results match ground-truth expected labels |
| by_check | Per-checker pass/fail/N/A breakdown |
| label_accuracy_by_check | Per-checker label accuracy breakdown (total, matched, accuracy percentage) |
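
The relationships between the counters can be sketched in a few lines of TypeScript. This is an illustration inferred from the field descriptions, not the code in runner.ts: the `CaseOutcome` shape and the `summarize` helper are hypothetical names introduced here, and the real types live in src/types.ts.

```typescript
// Hypothetical per-case outcome (assumed shape, not the Synthesis type).
interface CaseOutcome {
  id: string;
  passed: boolean;         // true when every check passed
  expectedToFail: boolean; // true for negative examples
}

// The counter fields described in the report table.
interface ReportCounts {
  cases: number;
  passed: number;
  failed: number;
  strict_passed: number;
  strict_failed: number;
  expected_failures: number;
  unexpected_failures: number;
}

function summarize(outcomes: CaseOutcome[]): ReportCounts {
  const count = (pred: (o: CaseOutcome) => boolean): number =>
    outcomes.filter(pred).length;

  // A failure nobody expected is a regression.
  const strictFailed = count((o) => !o.passed && !o.expectedToFail);

  return {
    cases: outcomes.length,
    passed: count((o) => o.passed),
    failed: count((o) => !o.passed),
    strict_passed: count((o) => o.passed && !o.expectedToFail),
    strict_failed: strictFailed,
    expected_failures: count((o) => !o.passed && o.expectedToFail),
    unexpected_failures: strictFailed, // same value as strict_failed
  };
}

const counts = summarize([
  { id: "a", passed: true, expectedToFail: false },
  { id: "b", passed: false, expectedToFail: true },
  { id: "c", passed: false, expectedToFail: false },
]);
```

In this sample, one of the two failures is a negative example, so only one failure counts toward `unexpected_failures`.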

Each failure includes the case ID, which checks failed, the evidence that triggered the failure, and whether the failure was expected (negative example).
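
As a rough sketch, a failure entry carrying those four pieces of information could be modeled as below. The field names (`caseId`, `failedChecks`, `evidence`, `expected`) are assumptions made for illustration; the actual report keys may be spelled differently, so check src/types.ts or a generated report before relying on them.

```typescript
// Hypothetical field names; the real report may use different keys.
interface FailureEntry {
  caseId: string;         // the case ID
  failedChecks: string[]; // which checks failed
  evidence: string[];     // text that triggered the failure
  expected: boolean;      // true when the case is a negative example
}

// Keep only the failures that count as regressions.
function unexpectedOnly(failures: FailureEntry[]): FailureEntry[] {
  return failures.filter((f) => !f.expected);
}

const sampleFailures: FailureEntry[] = [
  { caseId: "neg-001", failedChecks: ["agency"], evidence: ["you must"], expected: true },
  { caseId: "reg-007", failedChecks: ["topic_pivot"], evidence: [], expected: false },
];
const regressions = unexpectedOnly(sampleFailures);
```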

The topic_pivot checker includes a pass_strength field on every result:

| Value | Meaning |
| --- | --- |
| clear_pass | Strong engagement signals (acknowledgment + follow-up, or high similarity) |
| borderline_pass | Acknowledgment present with moderate similarity but no explicit follow-up |
| clear_fail | Insufficient engagement with the user’s vulnerability |
| not_applicable | No vulnerability detected in the user message |

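
One way to read the four pass_strength values is as a small decision procedure. The sketch below is only an interpretation of the descriptions: the `PivotSignals` shape is hypothetical, and the two similarity thresholds are placeholder numbers chosen for illustration, not values taken from pivot.ts.

```typescript
type PassStrength =
  | "clear_pass"
  | "borderline_pass"
  | "clear_fail"
  | "not_applicable";

// Hypothetical inputs; the real checker derives its own signals.
interface PivotSignals {
  vulnerabilityDetected: boolean;
  acknowledged: boolean;
  followedUp: boolean;
  similarity: number; // assumed to lie in [0, 1]
}

// Placeholder thresholds, NOT the values used by Synthesis.
const HIGH_SIMILARITY = 0.5;
const MODERATE_SIMILARITY = 0.25;

function classifyPassStrength(s: PivotSignals): PassStrength {
  if (!s.vulnerabilityDetected) return "not_applicable";
  if ((s.acknowledged && s.followedUp) || s.similarity >= HIGH_SIMILARITY) {
    return "clear_pass";
  }
  if (s.acknowledged && s.similarity >= MODERATE_SIMILARITY) {
    return "borderline_pass";
  }
  return "clear_fail";
}
```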

| Code | Meaning |
| --- | --- |
| 0 | All checks passed (within threshold) |
| 1 | Fatal error (invalid input, schema failure, missing files) |
| 2 | Unexpected failures exceed --fail-on threshold |

Expected failures (negative examples) never affect the exit code.
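
The exit-code rules can be summarized as a short function. This is a sketch of the documented behavior, not the code in index.ts; `exitCodeFor` and `runAndGetExitCode` are hypothetical helper names, and the sketch assumes that any thrown error is treated as fatal.

```typescript
// 0 = within threshold, 2 = unexpected failures exceed the --fail-on threshold.
function exitCodeFor(unexpectedFailures: number, failOn: number = 0): 0 | 2 {
  return unexpectedFailures > failOn ? 2 : 0;
}

// Hypothetical wrapper: a thrown error (invalid input, schema failure,
// missing files) maps to exit code 1.
function runAndGetExitCode(
  run: () => { unexpected_failures: number },
  failOn: number = 0
): 0 | 1 | 2 {
  try {
    return exitCodeFor(run().unexpected_failures, failOn);
  } catch {
    return 1;
  }
}
```

With the default threshold of 0, a single unexpected failure is enough to produce exit code 2. Expected failures are never passed to `exitCodeFor`, which matches the rule that they do not affect the exit code.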

| Variable | Effect |
| --- | --- |
| MCP_OUTPUT=json | Prints an MCP-style artifact object to stdout after the summary. Useful for tool integrations that consume structured output. |

Synthesis exports the following functions from its source modules. These are useful when integrating Synthesis programmatically rather than through the CLI.

| Module | Export | Purpose |
| --- | --- | --- |
| load | loadCases(casesPath, schemaPath) | Load and validate JSONL eval cases against a JSON schema |
| load | validateCase(evalCase, schemaPath) | Validate a single case object (useful for testing) |
| runner | runCase(evalCase) | Run all checks on a single eval case |
| runner | runAllCases(cases) | Run all cases and compute aggregate metrics |
| report | writeReport(report, outputPath) | Write the JSON report to disk |
| report | printSummary(report) | Print a formatted summary to the console |
| report | formatArtifact(report, outputPath) | Format the report as an MCP-style artifact object |
| checks/agency | checkAgency(assistantText) | Run the agency language checker on a single response |
| checks/reassurance | checkReassurance(assistantText) | Run the reassurance checker on a single response |
| checks/pivot | checkPivot(userText, assistantText) | Run the topic pivot checker on a conversation pair |
| checks/similarity | tokenCosineSimilarity(text1, text2) | Compute bag-of-words cosine similarity between two texts |
| checks/similarity | extractAnchor(text, maxSentences) | Extract the first N sentences from a response |
| checks/similarity | setEmbeddingAdapter(adapter) | Replace the default similarity engine with a custom adapter |

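
To make "bag-of-words cosine similarity" concrete, here is a minimal standalone sketch of the technique. It is not the implementation in similarity.ts: the function names are hypothetical, and the tokenizer (lowercase, split on anything that is not a letter or digit) is an assumption that may differ from what tokenCosineSimilarity actually does.

```typescript
// Assumed tokenizer: lowercase, split on non-alphanumeric characters.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Count how often each token occurs. Object.create(null) avoids
// collisions with inherited keys such as "constructor".
function termCounts(tokens: string[]): Record<string, number> {
  const counts: Record<string, number> = Object.create(null);
  for (let i = 0; i < tokens.length; i++) {
    counts[tokens[i]] = (counts[tokens[i]] || 0) + 1;
  }
  return counts;
}

// Euclidean length of a term-count vector.
function vectorNorm(counts: Record<string, number>): number {
  let sumOfSquares = 0;
  const terms = Object.keys(counts);
  for (let i = 0; i < terms.length; i++) {
    sumOfSquares += counts[terms[i]] * counts[terms[i]];
  }
  return Math.sqrt(sumOfSquares);
}

// Cosine of the angle between the two term-count vectors; 0 when either text is empty.
function bagOfWordsCosine(text1: string, text2: string): number {
  const a = termCounts(tokenize(text1));
  const b = termCounts(tokenize(text2));
  let dot = 0;
  const terms = Object.keys(a);
  for (let i = 0; i < terms.length; i++) {
    dot += a[terms[i]] * (b[terms[i]] || 0);
  }
  const denom = vectorNorm(a) * vectorNorm(b);
  return denom === 0 ? 0 : dot / denom;
}
```

Because the measure only looks at shared tokens, paraphrases with no words in common score 0, which is presumably one reason a custom engine can be swapped in through setEmbeddingAdapter.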

synthesis/
  data/
    evals.jsonl                # Bundled test cases
  schemas/
    eval_case.schema.json      # JSON Schema for case validation
  src/
    index.ts                   # CLI entry point
    load.ts                    # JSONL loader + AJV schema validation
    runner.ts                  # Runs checks, computes metrics
    report.ts                  # JSON report + console summary
    types.ts                   # TypeScript type definitions
    checks/
      agency.ts                # Agency language checker
      reassurance.ts           # Unverifiable reassurance checker
      pivot.ts                 # Topic pivot checker
      similarity.ts            # Token cosine similarity
  out/
    report.json                # Generated report (gitignored)


| Aspect | Detail |
| --- | --- |
| Data touched | Conversation transcripts as input, eval results as JSON output |
| Data NOT touched | No telemetry, no analytics, no network calls, no credentials |
| Permissions | Read: input data. Write: JSON report to configured output path, stdout/stderr |
| Network | None (fully offline evaluation) |
| Telemetry | None collected or sent |