Operations
This page covers the operational surface — health checks, integrity proofs, rebuild from truth, and backup / restore. All operations are read-only by default; destructive operations gate behind --yes plus an interactive TTY confirmation.
Health: doctor
Section titled “Health: doctor”npx db-cluster doctor [--json]doctor performs a full reachability assessment across the four stores:
- Canonical store reachable, schema present.
- Artifact store directory writable, hash file readable.
- Index store reachable, count consistent.
- Ledger store reachable, last event timestamp readable.
- Postgres connection (if
DB_CLUSTER_POSTGRES_URLset) — pool acquirable, migration status. - Policy file loadable (if
DB_CLUSTER_POLICIES_FILEset).
Output is a list of HealthCheck objects (status ∈ healthy | degraded | unverified | missing | stale | unreachable | corrupt), with the cluster-level worst-of severity ordering applied.
doctor never mutates state. It runs in seconds even on large clusters.
Integrity: verify
Section titled “Integrity: verify”npx db-cluster verify [--json] [--sample <N>]verify proves data consistency invariants:
- Index → source — every index record points to a real canonical or artifact ID.
- Provenance → subject — every ledger event references a real subject.
- Receipt → event — every receipt chains to a real provenance event.
Mismatches are reported with the offending IDs and a remediation hint. verify is more expensive than doctor (full walks of the index + ledger). Use --sample <N> for a fast probe on very large clusters.
Rebuild from truth: rebuild index
Section titled “Rebuild from truth: rebuild index”npx db-cluster rebuild index [--dry-run] [--yes]Reconstructs the index store from canonical + artifact owner truth. The index is derivative — losing it loses no cluster state.
This is destructive (clears the existing index) and gates behind --yes + an interactive TTY confirmation. Pre-mutation snapshot is captured automatically; the undo hint points at the snapshot path.
The IndexStore.replaceAll() contract method makes the rebuild atomic — a crash mid-rebuild cannot leave the index empty.
Detect stale records: rebuild check
Section titled “Detect stale records: rebuild check”npx db-cluster rebuild check [--json]Detects:
- Orphan index records — index points to a canonical / artifact ID that no longer exists.
- Missing index entries — canonical / artifact rows that the index doesn’t reflect.
Read-only. Pair with rebuild index to repair.
Backup: backup
Section titled “Backup: backup”npx db-cluster backup [-o <file>] [--force-overwrite] [--yes]Exports cluster state as portable JSON: entities, artifacts (content base64-encoded + SHA-256 checksum), events, receipts. Backup is content-complete — restore reconstructs the cluster including raw artifact bytes.
--force-overwrite (gated by --yes) replaces an existing backup file. Without it, BackupTargetExistsError is raised with a hint pointing to a freshly-timestamped filename.
Restore: restore
Section titled “Restore: restore”npx db-cluster restore <file> [--yes]Imports cluster state from a backup file. Restore is additive — duplicate restores don’t corrupt state (INSERT … ON CONFLICT semantics on the Postgres adapter; idempotent importSnapshot / importEvent / importReceipt on local adapters).
Artifact content is verified against the recorded SHA-256 checksum before write. After import, the index is rebuilt; restore exits non-zero if the index rebuild propagates an error (the RestoreResult.index field surfaces the rebuild outcome).
Compensate a committed mutation: compensate
Section titled “Compensate a committed mutation: compensate”npx db-cluster compensate <command-id> --reason "..." [--yes]Compensation creates a new compensating command that reverses the effect of a committed one. The original receipt is preserved — nothing is erased; the audit trail shows both the original and the compensation.
You cannot compensate non-committed commands.
Postgres status: migration-status / verify-schema
Section titled “Postgres status: migration-status / verify-schema”npx db-cluster migration-status [--json]npx db-cluster verify-schema [--json]migration-status reports whether the Postgres canonical schema is current. verify-schema validates column structure matches the contract. Both work against a live Postgres pool.
Runbooks — one per typed-error class
Section titled “Runbooks — one per typed-error class”The docs/runbooks/ directory in the repo carries one runbook per recurring failure class. Each follows the same Symptom / Cause / Verify / Recover / Escalate shape:
- corrupt-store —
CommandQueueCorruptError,CorruptStoreError. JSON parse failures, truncated writes. - orphan-mutations —
mutation_orphanedledger events. Receipt-failed commands whose state needs reconciliation. - index-stale — index records older than canonical / artifact truth. Use
rebuild checkthenrebuild index. - postgres-unreachable — pool connect failures. Network, credential, and connection-string TLS (
sslmode) checks.
Read these before paging anyone.
Logging
Section titled “Logging”Every CLI invocation supports:
--quiet— suppress STDOUT non-error output entirely (errors still emit to STDERR).--log-level <level>— gate STDERR-bound info / warn / debug output.'error'is quietest,'debug'is noisiest. Default:'info'.--no-color— disable ANSI color codes (theNO_COLORenv var is also honoured per https://no-color.org).
For pipe-friendly output: npx db-cluster doctor --json --quiet | jq produces clean JSON with no stderr noise mixed in.
Exit codes
Section titled “Exit codes”db-cluster uses the sysexits.h convention. See CLI Reference for the full table.