Architecture
db-cluster is a federated database cluster. Not a single database with plugins. Not a vector store with metadata. Not an AI wrapper.
The thesis
An AI system should not query one flattened database. It should operate over a cluster of specialized truth stores, where each store preserves its native truth shape and the cluster exposes one coherent retrieval, provenance, and mutation surface.
Four stores, four owners
| Store | Owns | Shape | Derivative? |
|---|---|---|---|
| Canonical | Entities — stable IDs, structured state | {id, kind, name, attributes, owner: 'canonical'} | No — owner truth |
| Artifact | Raw files — documents, source text, uploads | {id, filename, contentHash, mimeType, owner: 'artifact'} | No — owner truth |
| Index | Discoverability — full-text, metadata search | {id, sourceId, sourceStore, text, metadata, owner: 'index'} | Yes — rebuildable from canonical + artifact |
| Ledger | History — provenance events, mutation receipts | {id, action, actorId, subjectId, timestamp, owner: 'ledger'} | No — owner truth |
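The shapes in the table can be read as TypeScript types. The sketch below is illustrative only: the field names come from the table, but the field types (for example `string` timestamps and `Record<string, unknown>` attributes), the type names, and the `isDerivative` helper are assumptions, not the package's actual exports.

```typescript
// Hypothetical record types; field types are assumed, not taken from the package.
interface CanonicalEntity {
  id: string;
  kind: string;
  name: string;
  attributes: Record<string, unknown>;
  owner: "canonical";
}

interface ArtifactRecord {
  id: string;
  filename: string;
  contentHash: string;
  mimeType: string;
  owner: "artifact";
}

interface IndexRecord {
  id: string;
  sourceId: string;
  sourceStore: "canonical" | "artifact"; // assumed: index rows point at owner stores
  text: string;
  metadata: Record<string, unknown>;
  owner: "index";
}

interface LedgerEvent {
  id: string;
  action: string;
  actorId: string;
  subjectId: string;
  timestamp: string; // assumed to be an ISO 8601 string
  owner: "ledger";
}

type ClusterRecord = CanonicalEntity | ArtifactRecord | IndexRecord | LedgerEvent;

// Only index records are derivative; the `owner` tag is enough to tell them apart.
function isDerivative(record: ClusterRecord): boolean {
  return record.owner === "index";
}
```

Tagging every record with a literal `owner` lets the compiler narrow a `ClusterRecord` to its store-specific shape.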
Key invariant — index is rebuildable
The index store can be destroyed and rebuilt from canonical + artifact truth without losing any cluster state. It is the only derivative store. `db-cluster rebuild index` produces an identical index from owned stores; `db-cluster verify` confirms the rebuild is loss-free.
This is the load-bearing law. Indexes can lie: they may go stale, they may be wrong about a name, they may miss a record. But canonical + artifact + ledger can rebuild any index from scratch.
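A minimal in-memory sketch of what "rebuildable" means in practice: the index is a pure function of owner truth, so a live index can be checked against a fresh rebuild. The types, the `idx:` id scheme, and the functions `rebuildIndex` and `verifyIndex` are hypothetical and are not the implementation behind the CLI commands.

```typescript
interface Entity { id: string; kind: string; name: string }
interface Artifact { id: string; filename: string; mimeType: string }
interface IndexRecord {
  id: string;
  sourceId: string;
  sourceStore: "canonical" | "artifact";
  text: string;
}

const byId = (x: IndexRecord, y: IndexRecord): number =>
  x.id < y.id ? -1 : x.id > y.id ? 1 : 0;

// Derive index records purely from owner truth, in a stable order,
// so two rebuilds over the same stores give the same result.
function rebuildIndex(entities: Entity[], artifacts: Artifact[]): IndexRecord[] {
  const fromEntities = entities.map((e): IndexRecord => ({
    id: `idx:canonical:${e.id}`,
    sourceId: e.id,
    sourceStore: "canonical",
    text: `${e.kind} ${e.name}`,
  }));
  const fromArtifacts = artifacts.map((a): IndexRecord => ({
    id: `idx:artifact:${a.id}`,
    sourceId: a.id,
    sourceStore: "artifact",
    text: a.filename,
  }));
  return [...fromEntities, ...fromArtifacts].sort(byId);
}

// Flatten a record to a comparable string, independent of object key order.
const fingerprint = (r: IndexRecord): string =>
  [r.id, r.sourceId, r.sourceStore, r.text].join("\u0000");

// Compare a live index against a fresh rebuild; any difference is index drift.
function verifyIndex(live: IndexRecord[], entities: Entity[], artifacts: Artifact[]): boolean {
  const expected = rebuildIndex(entities, artifacts).map(fingerprint);
  const actual = [...live].sort(byId).map(fingerprint);
  return expected.length === actual.length && expected.every((f, i) => f === actual[i]);
}
```

When `verifyIndex` fails, the index is simply discarded and rebuilt; nothing in the owner stores has to change.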
The kernel
The kernel routes operations to the correct store. It never holds truth itself.
```
┌──────────────────────────────────────────────────┐
│                  ClusterKernel                   │
│                                                  │
│ find     → index → resolve → canonical/artifact  │
│ retrieve → index + canonical + artifact          │
│ trace    → ledger                                │
│ mutate   → command lifecycle → canonical/artifact│
│ receipt  → ledger                                │
└──────────────────────────────────────────────────┘
```

The kernel enforces:
- Retrieval always resolves to owner truth — index records are never returned as final answers.
- Mutations always cross a command boundary — no direct store writes.
- Provenance is always emitted — every write produces a ledger event.
- Receipts prove operations — every committed mutation gets a receipt.
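The four rules above can be illustrated with a toy in-memory kernel. This is a sketch under stated assumptions: `SketchKernel`, `RenameCommand`, `seed`, and the id formats are all hypothetical, and the real kernel's API is not shown here.

```typescript
interface Entity { id: string; name: string }
interface IndexRecord { sourceId: string; text: string }
interface LedgerEvent { id: string; action: string; subjectId: string; timestamp: string }
interface Receipt { id: string; eventId: string }
interface RenameCommand { type: "rename"; entityId: string; newName: string }

class SketchKernel {
  private canonical = new Map<string, Entity>();
  private index: IndexRecord[] = [];
  readonly ledger: LedgerEvent[] = [];
  private seq = 0;

  // Hypothetical seeding helper so the sketch has data to work with.
  seed(entity: Entity): void {
    this.canonical.set(entity.id, { ...entity });
    this.index.push({ sourceId: entity.id, text: entity.name.toLowerCase() });
  }

  // find → index → resolve → canonical: index hits are only pointers.
  find(term: string): Entity[] {
    const needle = term.toLowerCase();
    const resolved: Entity[] = [];
    for (const hit of this.index.filter((r) => r.text.includes(needle))) {
      const owner = this.canonical.get(hit.sourceId);
      if (owner) resolved.push({ ...owner }); // hits with no owner record are dropped
    }
    return resolved;
  }

  // mutate → typed command → canonical write, then a ledger event and a receipt.
  mutate(cmd: RenameCommand): Receipt {
    const entity = this.canonical.get(cmd.entityId);
    if (!entity) throw new Error(`unknown entity: ${cmd.entityId}`);
    entity.name = cmd.newName;
    const event: LedgerEvent = {
      id: `evt-${++this.seq}`,
      action: cmd.type,
      subjectId: entity.id,
      timestamp: new Date().toISOString(),
    };
    this.ledger.push(event);
    return { id: `rcpt-${this.seq}`, eventId: event.id };
  }
}
```

In this sketch the index is deliberately not updated on rename, so a search on the old name still matches the stale index row, yet the caller receives the current canonical entity rather than the index projection.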
Cluster URIs
Every object in the cluster has a URI:
```
cluster://canonical/<entity-id>
cluster://artifact/<artifact-id>
cluster://index/<record-id>
cluster://ledger/<event-id>
cluster://receipt/<receipt-id>
```

URIs identify the owner store. Resolving a URI always returns owner truth, never an index projection.
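Because the URI carries the store name, routing can start with a plain parse. The following sketch assumes the five URI forms listed above are exhaustive and that ids are non-empty; `parseClusterUri` and `formatClusterUri` are hypothetical helpers, not part of the documented SDK.

```typescript
const STORES = ["canonical", "artifact", "index", "ledger", "receipt"] as const;
type UriStore = (typeof STORES)[number];

interface ClusterUri {
  store: UriStore;
  id: string;
}

// Split "cluster://<store>/<id>" into its parts, rejecting anything else.
function parseClusterUri(uri: string): ClusterUri {
  const match = /^cluster:\/\/([a-z]+)\/(.+)$/.exec(uri);
  if (!match) throw new Error(`not a cluster URI: ${uri}`);
  const [, store, id] = match;
  if (!(STORES as readonly string[]).includes(store)) {
    throw new Error(`unknown store in URI: ${store}`);
  }
  return { store: store as UriStore, id };
}

// Inverse of parseClusterUri.
function formatClusterUri(parts: ClusterUri): string {
  return `cluster://${parts.store}/${parts.id}`;
}
```

A caller would parse first, then dispatch on `store` to decide which owner store answers the request.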
What db-cluster is NOT
| Misreading | Reality |
|---|---|
| RAG pipeline | Retrieval resolves to owner truth, not vector similarity. |
| AI memory layer | Entities have structured state, not conversation history. |
| SQL assistant | Mutations require typed commands, not natural language. |
| Vector database | Index is derivative; the cluster owns structured truth. |
| Governance middleware | Policy and provenance are native, not bolted on. |
Physical backends
Stores have logical contracts and physical backends:
- The canonical store contract is the same whether backed by local JSON or Postgres.
- Physical backends implement store law — they do not become the product center.
- Backend choice is invisible to the kernel, SDK, MCP, and CLI.
Currently supported:
- Local (JSON files) — all four stores.
- Postgres — canonical store only (artifact / index / ledger remain local).
The Postgres adapter attaches a `pool.on('error', …)` handler (so an idle-client RST doesn’t crash the process) and uses `INSERT … ON CONFLICT` to close the TOCTOU window on concurrent imports. It is also append-a-version, for parity with the local store: each entity id owns one immutable row per version. db-cluster does not configure SSL/TLS in v1.0.0, so the connection is plaintext unless your connection string enforces TLS (e.g. `sslmode=require`, which the `pg` driver honours). If it does not, protect the link with a TLS proxy or keep it on a private network. Driver-managed `ssl` config is planned for a future release.
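Since TLS depends entirely on the connection string in this release, an operator may want a preflight check before handing the string to the cluster. The sketch below is one way to do that with the standard `URL` parser; `enforcesTls` is a hypothetical helper, and the set of `sslmode` values it accepts is an assumption based on the usual libpq-style names, not something db-cluster defines.

```typescript
// sslmode values this sketch treats as enforcing TLS (assumed list).
const TLS_MODES = new Set(["require", "verify-ca", "verify-full"]);

// Return true only when a URL-style connection string carries an sslmode
// from TLS_MODES. Strings that do not parse as URLs are treated as not enforcing.
function enforcesTls(connectionString: string): boolean {
  try {
    const mode = new URL(connectionString).searchParams.get("sslmode");
    return mode !== null && TLS_MODES.has(mode);
  } catch {
    return false;
  }
}
```

A deployment script could refuse to start when `enforcesTls` returns false and no TLS proxy or private network is in place.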
Layers
```
        CLI / SDK / MCP       ← surfaces (operator, developer, AI-agent)
               │
     PolicyEnforcedKernel     ← policy + redaction (the root's createSafeCluster handle)
               │
         ClusterKernel        ← routing, retrieval, mutation lifecycle
               │
    ┌──────────┼─────────┬───────┐
    │          │         │       │
Canonical   Artifact   Index   Ledger   ← stores
(Postgres   (local)   (local)  (local)
 or local)
```

db-cluster is policy-enforced by default. The package root factory `createSafeCluster()` returns a handle whose only door to cluster truth is a `PolicyEnforcedKernel`: there is no policy-bypass code path through the default public surface, and the raw `ClusterKernel` class is not exported. Raw, unpoliced stores are reachable only via the explicit `@mcptoolshop/db-cluster/unsafe` escape hatch (operator tooling and tests), which deliberately bypasses policy/receipts/provenance.
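The "only door" idea can be shown with a closure: the factory builds the raw kernel internally and returns a handle that checks policy before every read. Everything named here (`RawKernel`, `SafeHandle`, `createSafeHandle`, the `"read"` capability string) is a hypothetical stand-in and does not describe the real `createSafeCluster()` signature.

```typescript
interface Principal { id: string; capabilities: ReadonlySet<string> }
interface Entity { id: string; name: string; attributes: Record<string, unknown> }

// Stand-in for the raw kernel. In a real module it would simply not be exported.
class RawKernel {
  private entities = new Map<string, Entity>([
    ["e1", { id: "e1", name: "Alpha", attributes: { tier: "gold" } }],
  ]);
  get(id: string): Entity | undefined {
    return this.entities.get(id);
  }
}

interface SafeHandle {
  read(principal: Principal, id: string): Entity;
}

function createSafeHandle(): SafeHandle {
  const raw = new RawKernel(); // captured by the closure, never handed to callers
  return {
    read(principal, id) {
      if (!principal.capabilities.has("read")) {
        throw new Error(`policy denied: ${principal.id} lacks "read"`);
      }
      const entity = raw.get(id);
      if (!entity) throw new Error(`not found: ${id}`);
      // Return a copy so callers cannot mutate kernel state directly.
      return { ...entity, attributes: { ...entity.attributes } };
    },
  };
}
```

Because `raw` lives only inside the closure, a caller holding a `SafeHandle` has no reference through which to skip the policy check.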
Cross-cutting concerns
- Mutation lifecycle — propose → validate → approve → commit → (compensate). See SDK reference.
- Provenance graph — `kernel.traceObject(uri)` returns a `ProvenanceGraph` with nodes (entity, artifact, index_record, provenance_event, receipt, command, evidence_bundle) and edges (11 variants).
- Policy — `Principal` + `Capability` + `Policy` + `TrustZone` + `VisibilityRule`. See Policy & Redaction.
- Redaction — applied at every read path through `PolicyEnforcedKernel`. Marker types are explicit; the redactor uses an allowlist, not a denylist.
- Typed errors — `ClusterError` base + per-class subclasses, each with `code` / `remediationHint` / `retryable`. CLI exit codes map to sysexits.h.
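The mutation lifecycle in the first bullet behaves like a small state machine. The sketch below encodes the listed order as a transition table; the stage names, the `NEXT` table, and `advance` are assumptions for illustration, and the SDK's actual command types may differ.

```typescript
// Stage names are assumed; they mirror propose → validate → approve → commit → (compensate).
type Stage = "proposed" | "validated" | "approved" | "committed" | "compensated";

// Each stage lists the stages it may move to. Compensation is optional
// and only reachable after a commit.
const NEXT: Record<Stage, readonly Stage[]> = {
  proposed: ["validated"],
  validated: ["approved"],
  approved: ["committed"],
  committed: ["compensated"],
  compensated: [],
};

// Move a command to the next stage, refusing any skipped or backward step.
function advance(current: Stage, next: Stage): Stage {
  if (!NEXT[current].includes(next)) {
    throw new Error(`illegal transition: ${current} → ${next}`);
  }
  return next;
}
```

Encoding the order as data makes skipped steps, such as committing an unapproved command, fail loudly instead of silently writing to a store.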
Further reading
- Operations — doctor, verify, rebuild, backup, restore, and the per-error runbooks.
- Policy & Redaction — the full policy model.
- MCP Integration — how AI agents consume the cluster.