Hedging and Safety
The most dangerous failure mode in a memory system is premature canonization — storing a passing remark as a settled decision. “Maybe we should use Redis?” becomes “Decision: use Redis.” The agent then acts on a choice that was never made.
DeltaMind’s safety model exists to prevent this.
Zero-canonization goal
The canonization rate measures how often speculative language gets promoted to decision or fact status. DeltaMind targets 0% canonization across all fixture types, including pathological transcripts designed to trick the extractor.
Current results: 0% canonization across all models (gemma2:9b, phi4:14b, qwen2.5:14b) and all fixture types (clean, messy, pathological, revision-pack).
How hedging detection works
The rule-based extractor checks for hedging language before promoting anything to decision status:
- “maybe”, “perhaps”, “possibly”, “might”, “could”
- Question marks in decision-adjacent text
- “considering”, “thinking about”, “exploring”
- “not sure”, “unclear”, “depends on”
When hedging is detected, the candidate is downgraded from decision_made to hypothesis_introduced. The hypothesis gets tentative status and is excluded from memory promotion.
LLM safety rails
The LLM extractor prompt includes explicit abstain permission:
If a turn discusses possibilities without settling on a choice, emit hypothesis_introduced, not decision_made. If you are unsure whether something is a decision, abstain.
The LLM pipeline also runs a safety downgrade pass after extraction:
- Candidates with
decision_revisedtargeting a constraint are reclassified or dropped - Candidates with mismatched target kinds are dropped before they reach the reconciler
- Low-confidence revision candidates are filtered
Model policy
Not all models are safe for decision extraction. DeltaMind maintains an explicit model policy:
- Default: gemma2:9b — best precision/recall balance, zero canonization
- Allowed: qwen2.5:7b, qwen2.5:14b, phi4:14b, gemma2:9b
- Blocked: llama3.1:8b — 14.3% canonization on pathological fixture (promoted hedged “Use Redis” to decision_made)
The block is based on empirical evidence from the model sweep. A model that canonizes in controlled testing will canonize in production.
Advisory boundary
Memory promotion suggestions have a hard boundary:
- Included: Decisions (active, high confidence), constraints, facts, goals
- Excluded: Hypotheses (always), branch-tagged items (always), tentative items (always), low-confidence items (by default)
This is the advisory/authoritative split. Session state is authoritative — DeltaMind updates it automatically. Durable memory is advisory — DeltaMind suggests, humans or policy decide.
The reasoning: a session-local false positive gets corrected when the session ends. A durable false positive persists across sessions and corrupts future context. The cost asymmetry demands different authority levels.
Pathological testing
The pathological fixture is specifically designed to trigger canonization failures:
- Hedged language that sounds like decisions (“Maybe Redis would be good for this”)
- Implicit preferences stated as questions (“What about using Fastify?”)
- Conditional plans (“If the timeline allows, we could add caching”)
- Reversed decisions stated ambiguously
Any model or extractor that canonizes on the pathological fixture is disqualified from default use.
The principle
Better to abstain than poison the ledger.
A missed decision can be caught later — the conversation will reference it again if it matters. A false decision, once in state, biases every subsequent export, every suggestion, every query answer. The asymmetry is clear: abstention is recoverable, canonization is persistent damage.