Beginners Guide
This guide walks you through implementing prov-spec from scratch. By the end you will have a working provenance record, validated it against the spec, and declared conformance.
1. What problem does prov-spec solve?
Section titled “1. What problem does prov-spec solve?”When tools process data, consumers need to answer three questions:
- What happened? Which operations were applied to produce the output?
- Can I verify it? Is the output’s integrity independently checkable?
- Where did it come from? Can I trace the output back through a chain of inputs?
prov-spec answers these by defining a standard vocabulary of method IDs (what happened), a provenance record format (structured JSON), and conformance levels (how much you implement). Any language, any tool — the spec is protocol-neutral.
2. Key concepts
Section titled “2. Key concepts”Before writing code, understand these five building blocks:
| Concept | What it is | Where it lives |
|---|---|---|
| Method ID | A stable, namespaced string identifying a processing step (e.g., integrity.digest.sha256) | prov.record.methods[] |
| Provenance Record | A JSON object documenting a tool invocation — inputs, outputs, methods applied, and evidence | Schema: prov.record.v0.1 |
| Artifact | A reference to an input or output with optional digest for integrity checking | Schema: artifact.v0.1 |
| Envelope | An optional wrapper that pairs a tool’s result with its provenance record | Schema: mcp.envelope.v0.1 |
| Conformance Level | A testable tier (L1 Integrity, L2 Engine, L3 Lineage) declaring what your implementation supports | Declared in prov-capabilities.json |
Method IDs follow a strict grammar: lowercase, dot-separated, with an optional version suffix. For example, adapter.wrap.envelope_v0_1 belongs to the adapter namespace and is pinned to envelope version 0.1.
3. Your first provenance record
Section titled “3. Your first provenance record”Create a file called my-record.json:
{ "schema_version": "prov.record.v0.1", "run_id": "demo-run-001", "tool": { "name": "my-tool", "version": "1.0.0" }, "inputs": [ { "schema_version": "artifact.v0.1", "artifact_id": "input-doc", "media_type": "application/json" } ], "outputs": [ { "schema_version": "artifact.v0.1", "artifact_id": "output-doc", "media_type": "application/json", "digest": { "alg": "sha256", "value": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855" } } ], "methods": [ "integrity.digest.sha256", "engine.prov.artifact.register_output" ], "evidence": [], "parents": []}Key points:
run_iduniquely identifies this invocation.methods[]lists only the method IDs that were actually applied.digestuses lowercase hexadecimal and explicitly states the algorithm.evidenceandparentsare empty arrays when not applicable.
4. Validating your record
Section titled “4. Validating your record”Clone the repo and use the reference validator:
git clone https://github.com/mcp-tool-shop-org/prov-speccd prov-spec
# Check method ID syntax and catalog membershippython tools/python/prov_validator.py validate-methods my-record.json --strict
# Run the SHA-256 test vector to confirm your canonicalization matchespython tools/python/prov_validator.py check-vector integrity.digest.sha256
# Run all built-in checkspython tools/python/prov_validator.py self-testThe validator checks three things:
- Syntax — each method ID matches the grammar (
^[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)*(_v[0-9]+_[0-9]+)?$). - Namespace — each method belongs to a known namespace (
adapter,engine,integrity, orlineage), not a reserved one. - Catalog membership (strict mode) — each method ID exists in
spec/methods.json.
5. Declaring conformance
Section titled “5. Declaring conformance”Once your implementation satisfies a conformance level, ship a prov-capabilities.json in your project root:
{ "schema": "prov-capabilities@v0.1", "engine": { "name": "my-tool", "version": "1.0.0" }, "implements": [ "integrity.digest.sha256" ], "conformance_level": "L1-integrity"}The three levels build on each other:
| Level | What you need | What it proves |
|---|---|---|
| L1 — Integrity | At least one integrity.digest.* method with correct canonicalization | Outputs have verifiable cryptographic hashes |
| L2 — Engine | L1 + adapter.wrap.envelope_v0_1 + adapter.provenance.attach_record_v0_1 + engine.prov.artifact.register_output | You can construct valid provenance records inside envelopes |
| L3 — Lineage | L2 + lineage.parent.link | You can chain provenance records into a directed acyclic graph |
Start with L1. Move to L2 when you wrap outputs in envelopes. Add L3 when you link records across tool invocations.
Validate your manifest:
python tools/python/prov_validator.py validate-manifest prov-capabilities.json6. Wrapping output in an envelope
Section titled “6. Wrapping output in an envelope”To reach L2 conformance, wrap your tool output in an mcp.envelope.v0.1:
{ "schema_version": "mcp.envelope.v0.1", "result": { "status": "ok", "data": "your tool output here" }, "provenance": { "schema_version": "prov.record.v0.1", "run_id": "demo-run-002", "tool": { "name": "my-tool", "version": "1.0.0", "adapter": "custom" }, "inputs": [ { "schema_version": "artifact.v0.1", "artifact_id": "request-01", "media_type": "application/json" } ], "outputs": [ { "schema_version": "artifact.v0.1", "artifact_id": "response-01", "media_type": "application/json", "digest": { "alg": "sha256", "value": "abc123def456..." } } ], "methods": [ "adapter.wrap.envelope_v0_1", "adapter.provenance.attach_record_v0_1", "engine.prov.artifact.register_output", "integrity.digest.sha256" ], "evidence": [], "parents": [] }}Rules to remember:
- Never double-wrap. If your tool already returns an envelope, pass it through unchanged and claim
adapter.pass_through.envelope_v0_1instead. - Only claim methods you actually applied. Claiming
integrity.digest.sha256without computing a real digest is a conformance violation. - Set
provenancetonullif you cannot reliably compute artifact digests.
7. Common mistakes and how to avoid them
Section titled “7. Common mistakes and how to avoid them”| Mistake | Why it fails | Fix |
|---|---|---|
| Uppercase in method IDs | Grammar requires %x61-7A (lowercase a-z only) | Use integrity.digest.sha256, never Integrity.Digest.SHA256 |
| Hyphens instead of dots | Grammar uses dots as separators | Use adapter.wrap, never adapter-wrap |
| Claiming methods not applied | Semantic contract violation — auditors will flag it | Only list methods in methods[] that were actually executed |
| Non-deterministic digests | Same input must always produce the same hash | Canonicalize JSON before hashing: sorted keys, compact separators, UTF-8 |
| Double-wrapping envelopes | Spec forbids nesting envelopes | Check if the payload already has schema_version: "mcp.envelope.v0.1" before wrapping |
| Uppercase hex in digests | Spec requires lowercase hexadecimal | Always output abcdef, never ABCDEF |
| Using reserved namespaces | policy.*, attestation.*, execution.*, audit.* are reserved | Stick to adapter, engine, integrity, lineage |
Empty methods[] array | A record with no claimed methods has no provenance value | Always claim at least one method that was applied |
For canonicalization, use this pattern in any language:
import json
def canonical_json(obj): return json.dumps(obj, sort_keys=True, separators=(',', ':'), ensure_ascii=False)The canonical form is then hashed as UTF-8 bytes. The test vector at spec/vectors/integrity.digest.sha256/ provides a reference input/output pair you can validate against.