Review & Verdicts

When you run `taste review run artifact.md`, the engine executes this pipeline:

  1. Canon packet assembly — Selects the most relevant canon statements for this artifact
  2. 4-dimension scoring — LLM scores each dimension independently (0-10)
  3. Deterministic verdict synthesis — Rules combine scores into a final verdict
  4. Model synthesis — LLM generates a human-readable explanation
  5. Persistence — Results stored in SQLite for history and calibration
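The five stages above can be sketched end to end as follows. This is a minimal illustration, not the engine's real API: every function, parameter, and table name here is an assumed stand-in, and the selection and verdict rules are deliberately simplified.

```python
import sqlite3

def review(artifact_text, canon, score_fn, explain_fn, db_path=":memory:"):
    # 1. Canon packet assembly (greatly simplified selection rule)
    packet = [s for s in canon if s["kind"] in ("hard_thesis", "anti_pattern")]
    # 2. 4-dimension scoring via the supplied LLM callable
    dims = ("thesis", "patterns", "anti_patterns", "voice")
    scores = {d: score_fn(d, artifact_text, packet) for d in dims}
    # 3. Deterministic verdict synthesis (placeholder threshold)
    verdict = "aligned" if min(scores.values()) >= 8 else "salvageable_drift"
    # 4. Model synthesis of a human-readable explanation
    explanation = explain_fn(verdict, scores)
    # 5. Persistence in SQLite for history and calibration
    con = sqlite3.connect(db_path)
    con.execute("CREATE TABLE IF NOT EXISTS reviews (artifact TEXT, verdict TEXT)")
    con.execute("INSERT INTO reviews VALUES (?, ?)", (artifact_text[:80], verdict))
    con.commit()
    return verdict, explanation
```

Passing stub callables for the two LLM steps is enough to exercise the deterministic parts of the flow.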

Not every canon statement is relevant to every artifact. The engine uses rule-led retrieval to build a targeted packet:

  1. Hard thesis — Always included (product identity is always relevant)
  2. Anti-patterns — Priority inclusion (violations are high-signal)
  3. Artifact-type match — Statements scoped to this artifact type
  4. Scope match — Statements scoped to this area of the product
  5. Voice/naming — Included when relevant
  6. Tag/lexical match — Keyword overlap between statement and artifact
```sh
taste review packet artifact.md   # Preview the canon packet without running a full review
```
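The six retrieval rules can be sketched as ordered buckets with de-duplication. The statement fields used here (`kind`, `artifact_type`, `scope`, `tags`, `id`) are illustrative assumptions, not the tool's actual schema, and the tag-overlap check is a stand-in for the lexical match.

```python
def assemble_packet(canon, artifact_type, scope, artifact_tags):
    buckets = [
        [s for s in canon if s["kind"] == "hard_thesis"],                   # always included
        [s for s in canon if s["kind"] == "anti_pattern"],                  # priority inclusion
        [s for s in canon if s.get("artifact_type") == artifact_type],      # artifact-type match
        [s for s in canon if s.get("scope") == scope],                      # scope match
        [s for s in canon if s["kind"] in ("voice", "naming")],             # voice/naming
        [s for s in canon if set(s.get("tags", [])) & set(artifact_tags)],  # tag/lexical match
    ]
    seen, packet = set(), []
    for bucket in buckets:
        for s in bucket:
            if s["id"] not in seen:  # de-duplicate, keeping rule-priority order
                seen.add(s["id"])
                packet.append(s)
    return packet
```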

Each dimension is scored independently by the LLM using a dimension-specific prompt:

  • Thesis — Does the artifact serve the product’s core thesis? A feature brief that builds on the product’s stated purpose scores high. One that pulls the product in an unstated direction scores low.
  • Patterns — Does the artifact follow established design patterns? If your canon says “all config goes in a single file,” an artifact proposing per-module configs scores low.
  • Anti-patterns — Does the artifact violate any stated anti-patterns? This dimension is inverted — a high score means no collisions. Any collision with a hard anti-pattern floors this score.
  • Voice/naming — Does the artifact match naming conventions and tone? If your canon says “use imperative verbs for commands” and the artifact uses passive voice, this scores low.
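The inverted anti-pattern dimension can be sketched as a simple scoring function. The per-collision penalty is an assumption for illustration; only the inversion and the hard-collision floor come from the description above.

```python
def score_anti_patterns(collisions, hard_collision=False):
    """Inverted dimension: 10 means no collisions at all."""
    if hard_collision:
        return 0  # any collision with a hard anti-pattern floors the score
    return max(0, 10 - 3 * collisions)  # assumed penalty of 3 per soft collision
```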

The final verdict is deterministic: rules, not the LLM, combine the 4 dimension scores. The model cannot talk itself into a pass.

| Verdict | Meaning |
| --- | --- |
| `aligned` | All dimensions strong. Ship it. |
| `mostly_aligned` | Minor issues, not blocking. Suggestions offered. |
| `salvageable_drift` | Meaningful drift, but the goal is sound. Repairable. |
| `hard_drift` | Significant misalignment. Structural repair needed. |
| `contradiction` | Directly contradicts canon. Cannot ship. |
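One assumed reading of the deterministic synthesis rules, sketched below. The thresholds are illustrative guesses; only the verdict names come from the table above.

```python
def synthesize_verdict(scores, hard_collision=False):
    """scores: dict mapping the four dimensions to 0-10 values."""
    if hard_collision:
        return "contradiction"  # directly contradicts canon: cannot ship
    low = min(scores.values())  # the weakest dimension drives the verdict
    if low >= 8:
        return "aligned"
    if low >= 6:
        return "mostly_aligned"
    if low >= 4:
        return "salvageable_drift"
    return "hard_drift"
```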

Four calibration rules prevent common scoring artifacts:

  • Salvageability gate — If any dimension shows recoverable motion, verdict cannot be worse than salvageable_drift
  • Category-collapse detector — If all dimensions cluster at the same score, the verdict is suspect and gets re-examined
  • Naming-law separation — Voice/naming issues alone cannot push a verdict past mostly_aligned
  • Relaxed aligned threshold — A single dimension at 7/10 with the rest at 9+ still qualifies as aligned
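The four calibration rules can be sketched as post-processing on a provisional verdict. The exact rule logic below is an assumed reading of the bullets above; the flags `recoverable` and `naming_only` are hypothetical inputs.

```python
ORDER = ["aligned", "mostly_aligned", "salvageable_drift", "hard_drift", "contradiction"]

def calibrate(verdict, scores, recoverable=False, naming_only=False):
    idx = ORDER.index(verdict)
    if recoverable:  # salvageability gate: cap severity at salvageable_drift
        idx = min(idx, ORDER.index("salvageable_drift"))
    if naming_only:  # naming-law separation: voice/naming alone caps at mostly_aligned
        idx = min(idx, ORDER.index("mostly_aligned"))
    vals = sorted(scores.values())
    # Relaxed aligned threshold: one dimension at 7 with the rest at 9+ still passes
    if vals[0] >= 7 and all(v >= 9 for v in vals[1:]):
        idx = ORDER.index("aligned")
    # Category-collapse detector: identical scores everywhere flag the verdict as suspect
    suspect = len(set(scores.values())) == 1
    return ORDER[idx], suspect
```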

After reviews, you can provide feedback to tune the system:

```sh
taste calibrate feedback <review-id>   # Record feedback on a review
taste calibrate summary                # View calibration metrics
taste calibrate statements             # Per-statement accuracy
taste calibrate findings               # Identified calibration issues
```
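A per-statement accuracy metric of the kind `taste calibrate statements` reports could be computed as below. The aggregation (hit rate over feedback events per statement) is an assumption for illustration, not the tool's documented formula.

```python
from collections import defaultdict

def statement_accuracy(feedback):
    """feedback: iterable of (statement_id, was_correct) pairs."""
    hits, totals = defaultdict(int), defaultdict(int)
    for sid, ok in feedback:
        totals[sid] += 1
        hits[sid] += int(ok)
    # fraction of reviews where each canon statement's contribution was confirmed
    return {sid: hits[sid] / totals[sid] for sid in totals}
```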