Architecture

Pipeline flow

Style Dataset Lab has one unified pipeline that runs from canon definition through a frozen dataset, then loops through production briefs, critique, batch work, selection, and re-ingest. End-to-End Production Loop walks through the same flow in narrative form.

 canon → generate → curate → bind → snapshot → split → export → eval
   ↑                                                              │
   │                                                              v
   │     brief ← workflow                                    training
   │       │                                                      │
   │       v                                                      v
   │     run → critique → refine                              eval-run
   │       │                                                      │
   │       v                                                      │
   │     batch → select → re-ingest ──────────────────────────────┘
   └────────────────────────────────────────────────────────────────

Each stage produces structured artifacts with IDs that link back to their predecessors. Selected outputs from the production leg return as candidate records through sdlab reingest selected. They go through the same curate + bind review as every other record and are never auto-approved.

Legacy scripts

painterly (img2img style pass) and compare (A-vs-B preference capture) remain in the CLI surface but are not part of the core loop. painterly is a post-processing convenience. compare feeds DPO/ORPO pair data for preference training and is only needed when you are building preference datasets.

Repository layout

style-dataset-lab/
  bin/sdlab.js                CLI entry point (single binary)
  lib/                        Shared modules (paths, errors, logging, schemas)
  scripts/                    Per-command scripts (each exports run(argv))
  schemas/                    JSON schemas for records, snapshots, splits, manifests
  templates/                  Starter content (ships with the npm package)
    canon/                    Starter constitution.md and review-rubric.md
    inputs/prompts/           Example prompt pack (example-wave.json)
    domains/                  Five domain starters (per-domain project + config + workflows)
      game-art/
      character-design/
      creature-design/
      architecture/
      vehicle-mech/
  projects/                   Project data (repo-only, not in npm)
    star-freight/             Star Freight example (canon, records, outputs, exports)
    <your-project>/           Scaffolded with `sdlab init <name>`
  workflows/                  Reserved; see workflows/README.md
  site/                       Documentation site (Starlight)
  package.json

The npm package ships bin/, lib/, scripts/, and templates/ only. Project data stays in the repo. Each project is fully isolated — its own canon, records, and assets. All sdlab commands read from and write to projects/<name>/ based on the --project flag (default: star-freight).
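As a rough illustration of this isolation, the sketch below maps a project name to the per-project directories named on this page. The helper name projectPaths, the name-validation rule, and the plain string joins are assumptions for illustration; the actual path logic lives in lib/ and may differ.

```javascript
// Hypothetical helper: maps a project name to the directories named on this page.
const DEFAULT_PROJECT = "star-freight"; // default of the --project flag

function projectPaths(name = DEFAULT_PROJECT, repoRoot = ".") {
  // Assumed safety rule (not from the source): reject names that could escape projects/.
  if (!/^[a-z0-9][a-z0-9-]*$/.test(name)) {
    throw new Error(`invalid project name: ${name}`);
  }
  const root = `${repoRoot}/projects/${name}`;
  return {
    root,
    canon: `${root}/canon`,
    records: `${root}/records`,
    candidates: `${root}/outputs/candidates`,
    approved: `${root}/outputs/approved`,
    rejected: `${root}/outputs/rejected`,
    comparisons: `${root}/comparisons`,
  };
}

console.log(projectPaths().records);           // ./projects/star-freight/records
console.log(projectPaths("my-project").canon); // ./projects/my-project/canon
```

Because every path is derived from the project root, two projects never share a records or outputs directory.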

Template layout

Each domain template under templates/domains/<domain>/ ships a complete starting point:

File	Purpose
`project.json`	Project metadata + generation defaults (checkpoint, LoRAs, resolution, sampler)
`constitution.json`	Machine-readable rules to which canon-bind maps scores and failure modes
`lanes.json`	Subject lane definitions (portrait, full-body, streetscape, etc.)
`rubric.json`	Scoring dimensions and approval thresholds
`terminology.json`	Group/faction labels, detection patterns, and training profiles
`workflows/profiles/*.json`	ComfyUI workflow profiles copied into the project on `sdlab init`

The shared canon markdown (templates/canon/constitution.md, templates/canon/review-rubric.md) is copied into every new project’s canon/ directory as a writable starting point.

To scaffold a new project:

sdlab init my-project --domain character-design

This creates projects/my-project/ with the full directory structure, config files pulled from the domain template, canon markdown stubs, and an example prompt pack ready to edit.

Record schema

Every asset in the dataset is represented by a JSON file in projects/<name>/records/. Records accumulate data over time as the asset moves through the pipeline.

Standard record (curated + canon-bound)

{
  "id": "wave3_compact_officer_bridge_s42",
  "asset_path": "outputs/approved/wave3_compact_officer_bridge_s42.png",
  "provenance": {
    "checkpoint": "dreamshaperXL_v21TurboDPMSDE.safetensors",
    "loras": [{ "name": "classipeintxl_v21.safetensors", "weight": 1.0 }],
    "prompt": "concept art of a Compact military officer...",
    "negative_prompt": "photorealistic, photograph...",
    "seed": 42,
    "steps": 8,
    "cfg": 2.0,
    "sampler": "dpmpp_sde",
    "scheduler": "karras",
    "width": 1024,
    "height": 1024,
    "generated_at": "2026-03-15T10:30:00.000Z"
  },
  "judgment": {
    "status": "approved",
    "reviewer": "human:mike",
    "reviewed_at": "2026-03-15T11:00:00.000Z",
    "explanation": "Clean silhouette, correct palette, good material read",
    "criteria_scores": {
      "silhouette_clarity": 0.9,
      "palette_adherence": 0.85,
      "material_fidelity": 0.8,
      "faction_read": 0.9,
      "wear_level": 0.75,
      "style_consistency": 0.85,
      "clothing_logic": 0.8,
      "composition": 0.9
    },
    "failure_modes": [],
    "improvement_notes": null,
    "confidence": 0.9
  },
  "canon": {
    "assertions": [
      { "rule": "RND-001", "verdict": "pass", "rationale": "Painterly style, visible texture" },
      { "rule": "MAT-001", "verdict": "pass", "rationale": "Surfaces read as pressed/stamped" },
      { "rule": "COL-001", "verdict": "pass", "rationale": "Steel blue dominant, charcoal secondary" }
    ]
  }
}

Identity record (named subjects)

Identity records extend the standard schema with two additional blocks:

{
  "identity": {
    "subject_name": "renna_vasik",
    "subject_type": "named_character",
    "faction": "reach",
    "role": "crew",
    "view_type": "anchor_portrait",
    "shot_type": "portrait",
    "identity_anchor": true,
    "location_name": null,
    "ship_name": null,
    "scene_function": "establish face and costume truth"
  },
  "lineage": {
    "generation_phase": "discovery",
    "anchor_source_image": null,
    "anchor_subject_version": null,
    "identity_persistence_score": null,
    "derived_from_record_id": null
  }
}

The identity block tracks who or what the image depicts. The lineage block tracks how it was generated — discovery (txt2img from scratch), anchor (promoted discovery), or follow-on (img2img derived from an anchor).

Follow-on records must reference their anchor via anchor_source_image and derived_from_record_id. sdlab generate:identity enforces this with a hard validation failure.
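The follow-on rule can be sketched as a small check over the lineage block. This is a minimal illustration, not the validation that sdlab generate:identity actually runs: the function name validateLineage is hypothetical, and the phase string "follow-on" is an assumption, since the page shows only "discovery" as a literal value.

```javascript
// Sketch of the follow-on lineage rule.
// Assumption: follow-on records carry generation_phase === "follow-on".
function validateLineage(record) {
  const lineage = record.lineage ?? {};
  const errors = [];
  if (lineage.generation_phase === "follow-on") {
    for (const field of ["anchor_source_image", "derived_from_record_id"]) {
      const value = lineage[field];
      if (typeof value !== "string" || value.length === 0) {
        errors.push(`follow-on record ${record.id} is missing lineage.${field}`);
      }
    }
  }
  return errors; // an empty array means the record passes this check
}

// Hypothetical records for illustration only.
const discovery = { id: "demo_discovery", lineage: { generation_phase: "discovery" } };
const brokenFollowOn = {
  id: "demo_follow_on",
  lineage: { generation_phase: "follow-on", anchor_source_image: null, derived_from_record_id: null },
};

console.log(validateLineage(discovery).length);      // 0
console.log(validateLineage(brokenFollowOn).length); // 2
```

Returning a list of errors rather than throwing on the first one lets a caller report every missing field at once before failing the run.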

Canon constitution

The style constitution lives in projects/<name>/canon/constitution.md. It defines every rule that judgments can cite. The machine-readable mirror in projects/<name>/constitution.json is what sdlab bind reads to emit canon.assertions.

Rule categories

Category	Prefix	Scope
Rendering	`RND-*`	Universal — applies to all images
Material	`MAT-*`	Faction-specific material vocabulary
Shape	`SHP-*`	Faction-specific shape language
Color	`COL-*`	Faction-specific palette and saturation
Clothing	`CLO-*`	Faction-specific construction logic
Ship exterior	`SHP-EXT-*`	Ship hull and exterior design
Ship interior	`SHP-INT-*`	Interior spaces and habitation
Equipment	`EQP-*`	Weapons, tools, props
Environment	`ENV-*`	Scene atmosphere and faction footprint
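Because SHP-EXT-* and SHP-INT-* share the SHP stem with the Shape category, any code that maps a rule ID back to its category has to prefer the most specific prefix. The sketch below shows one way to do that. The function name ruleCategory and the example ID SHP-EXT-001 are hypothetical; only the prefixes and category names come from the table.

```javascript
// Prefixes and category names from the rule-category table.
const RULE_CATEGORIES = {
  "RND": "Rendering",
  "MAT": "Material",
  "SHP": "Shape",
  "COL": "Color",
  "CLO": "Clothing",
  "SHP-EXT": "Ship exterior",
  "SHP-INT": "Ship interior",
  "EQP": "Equipment",
  "ENV": "Environment",
};

// Returns the category for a rule ID, or null if no prefix matches.
function ruleCategory(ruleId) {
  const match = Object.keys(RULE_CATEGORIES)
    .filter((prefix) => ruleId.startsWith(prefix + "-"))
    .sort((a, b) => b.length - a.length)[0]; // longest prefix wins
  return match ? RULE_CATEGORIES[match] : null;
}

console.log(ruleCategory("MAT-001"));     // Material
console.log(ruleCategory("SHP-002"));     // Shape
console.log(ruleCategory("SHP-EXT-001")); // Ship exterior
```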

Scoring dimensions

rubric.json defines the scoring dimensions (0.0 to 1.0) that apply to all image types. Identity records add further subject-specific dimensions.

Approval defaults: every dimension >= 0.6 and average >= 0.7. Rejection defaults: any dimension < 0.4 or average < 0.5.

Thresholds are configurable per project in rubric.json.
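To make the default thresholds concrete, the sketch below classifies a set of scores. It is illustrative only: the threshold key names are hypothetical (the actual rubric.json layout is not shown here), the function suggests a status rather than replacing the curator's decision, and treating everything between the two bands as borderline is an assumption.

```javascript
// Default thresholds from this page; key names are hypothetical.
const DEFAULT_THRESHOLDS = {
  approveMinDimension: 0.6,
  approveMinAverage: 0.7,
  rejectBelowDimension: 0.4,
  rejectBelowAverage: 0.5,
};

function suggestStatus(criteriaScores, thresholds = DEFAULT_THRESHOLDS) {
  const scores = Object.values(criteriaScores);
  if (scores.length === 0) throw new Error("no scores to evaluate");
  const average = scores.reduce((sum, s) => sum + s, 0) / scores.length;
  const lowest = Math.min(...scores);

  if (lowest < thresholds.rejectBelowDimension || average < thresholds.rejectBelowAverage) {
    return "rejected";
  }
  if (lowest >= thresholds.approveMinDimension && average >= thresholds.approveMinAverage) {
    return "approved";
  }
  return "borderline"; // assumption: neither approved nor rejected
}

// Scores from the standard record example on this page.
const exampleScores = {
  silhouette_clarity: 0.9, palette_adherence: 0.85, material_fidelity: 0.8,
  faction_read: 0.9, wear_level: 0.75, style_consistency: 0.85,
  clothing_logic: 0.8, composition: 0.9,
};
console.log(suggestStatus(exampleScores)); // approved
```

The example record averages 0.84375 with a lowest dimension of 0.75, so it clears both approval conditions.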

Supporting documents

File	Purpose
`projects/<name>/canon/constitution.md`	Full style rules with faction details
`projects/<name>/canon/review-rubric.md`	Quick review protocol and common failure modes
`projects/<name>/canon/identity-gates.md`	Named-subject acceptance criteria and lineage schema
`projects/<name>/canon/species-canon.md`	Alien species anatomy and design specifications (if applicable)

Curation workflow

candidate (uncurated)
    │
    v
sdlab curate --> judgment block written to record
    │            │            │
    v            v            v
approved    rejected    borderline
    │
    v
sdlab bind --> canon.assertions written to record
    │
    v
sdlab compare --> pairwise comparison records (optional, for preference data)
    │
    v
ready for snapshot → split → export

Key design decisions:

Record before move. sdlab curate writes the judgment to the record file before moving the image. This prevents orphaned images if the process crashes mid-operation.
Scores are human-entered. Per-dimension scores come from the curator, not from automated analysis. This is intentional — the dataset trains models to replicate human aesthetic judgment.
Canon binding is deterministic. Given the same scores and failure modes, sdlab bind always emits the same constitution-rule assertions. A low material_fidelity score maps to MAT-001. A wrong_palette failure mode maps to COL-001.
Comparisons are separate. Pairwise preferences live in projects/<name>/comparisons/, not inside records. This allows comparing images across different waves and categories.
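The deterministic binding described above can be pictured as a pure lookup from judgment data to rule assertions. The sketch below is not the sdlab bind implementation. The function name bindCanon, the 0.6 cutoff for a "low" score, and the "fail" verdict string are assumptions, and only the two mappings named above are included; a real project would derive the full set from constitution.json.

```javascript
// Hypothetical lookup tables holding only the two mappings named on this page.
const SCORE_TO_RULE = new Map([["material_fidelity", "MAT-001"]]);
const FAILURE_TO_RULE = new Map([["wrong_palette", "COL-001"]]);
const LOW_SCORE = 0.6; // assumed cutoff for a "low" score

// Pure function: the same judgment always yields the same assertions.
function bindCanon(judgment) {
  const assertions = [];
  for (const [dimension, rule] of SCORE_TO_RULE) {
    const score = judgment.criteria_scores?.[dimension];
    if (typeof score !== "number") continue; // dimension not scored
    assertions.push({
      rule,
      verdict: score < LOW_SCORE ? "fail" : "pass", // "fail" is an assumed verdict value
      rationale: `${dimension} scored ${score}`,
    });
  }
  for (const mode of judgment.failure_modes ?? []) {
    const rule = FAILURE_TO_RULE.get(mode);
    if (rule) {
      assertions.push({ rule, verdict: "fail", rationale: `failure mode: ${mode}` });
    }
  }
  return assertions;
}

const bound = bindCanon({
  criteria_scores: { material_fidelity: 0.3 },
  failure_modes: ["wrong_palette"],
});
console.log(bound.map((a) => `${a.rule}:${a.verdict}`).join(" ")); // MAT-001:fail COL-001:fail
```

Keeping the mapping free of I/O and randomness is what makes re-running the bind step safe: it can be repeated on any record without changing the result.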

Export pipeline

Export is handled by the external @mcptoolshop/repo-dataset CLI, which scans a project directory and produces multimodal training data.

What it scans

Point repo-dataset at a project directory:

repo-dataset visual generate ./projects/star-freight --format trl

Source	What it reads
`projects/<name>/records/*.json`	Provenance, judgment, canon assertions
`projects/<name>/outputs/approved/*.png`	Approved images (classification + critique)
`projects/<name>/outputs/rejected/*.png`	Rejected images (classification + critique)
`projects/<name>/comparisons/*.json`	Pairwise preferences (DPO/ORPO pairs)

Training unit types

Type	Input	Output	Use case
Classification	Image	approved/rejected label	Train binary quality classifier
Critique	Image	Explanation + scores + failures	Train grounded evaluation
Preference	Image pair + winner	Reasoning for preference	DPO/ORPO alignment training

Supported formats

TRL, LLaVA, Qwen2-VL, Axolotl, LLaMA-Factory, ShareGPT, OpenAI, DPO, ORPO, KTO.

Identity packet system

Identity packets handle named subjects — specific characters, ships, and locations that must be visually recognizable across multiple images.

Generation phases

Discovery — txt2img from composed prompts, 3 seeds per shot. Purpose: find the face, silhouette, and costume truth.
Anchoring — curate discovery outputs, promote the strongest to anchor status. The anchor is the recurring truth source.
Follow-on — img2img from the anchor at low denoise (0.30-0.45). Purpose: prove identity persists across views, lighting, and context.
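The three phases can be sketched as job planning. Everything here is illustrative: planDiscovery and planFollowOn are hypothetical names, consecutive seeds are an arbitrary choice, the "follow-on" phase string is assumed, and treating 0.30-0.45 as a hard limit rather than a recommendation is also an assumption.

```javascript
const SEEDS_PER_SHOT = 3;                            // from the Discovery description
const FOLLOW_ON_DENOISE = { min: 0.30, max: 0.45 };  // from the Follow-on description

// Hypothetical planner: one txt2img job per (shot, seed) pair.
function planDiscovery(shots, baseSeed = 42) {
  return shots.flatMap((shot) =>
    Array.from({ length: SEEDS_PER_SHOT }, (_, i) => ({
      shot,
      seed: baseSeed + i, // consecutive seeds are an illustrative choice
      phase: "discovery",
    }))
  );
}

// Hypothetical job builder: img2img from an approved anchor record.
function planFollowOn(anchorRecordId, anchorImage, denoise) {
  const inRange = denoise >= FOLLOW_ON_DENOISE.min && denoise <= FOLLOW_ON_DENOISE.max;
  if (!inRange) {
    throw new RangeError(`denoise ${denoise} is outside 0.30-0.45`);
  }
  return {
    phase: "follow-on",
    denoise,
    lineage: {
      anchor_source_image: anchorImage,
      derived_from_record_id: anchorRecordId,
    },
  };
}

console.log(planDiscovery(["portrait", "full-body"]).length); // 6
```

Carrying the anchor reference in the planned job means the resulting record can be written with its lineage fields already populated.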

Acceptance gates

Identity images must pass both the standard constitution gates AND identity-specific gates:

Gate	What it checks
IC-1	Silhouette distinction from other characters in the same faction
IC-2	Faction legibility through character-specific expression
IC-3	Identity persistence across 2+ images of the same character
IC-4	Costume integrity — no drift into other factions
IC-5	Spec compliance with identity lock non-negotiable details
IC-6	No generic drift — must read as this person, not an archetype

Locations and ships have their own parallel gates (IL-1 through IL-5) focused on structural repeatability, material language, and lived-in proof.
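The "both gates" requirement for characters can be sketched as a conjunction of two checks. This is a hedged illustration: the gateResults shape (gate ID mapped to true or false), the function name, and the reading of "standard constitution gates" as "every canon assertion has verdict pass" are all assumptions, not the project's documented behavior.

```javascript
const CHARACTER_GATES = ["IC-1", "IC-2", "IC-3", "IC-4", "IC-5", "IC-6"];

// gateResults is a hypothetical shape, e.g. { "IC-1": true, "IC-2": false, ... }.
function passesIdentityAcceptance(record, gateResults, requiredGates = CHARACTER_GATES) {
  const assertions = record.canon?.assertions ?? [];
  // Assumption: an unbound record (no assertions) has not passed the standard gates.
  const canonOk = assertions.length > 0 && assertions.every((a) => a.verdict === "pass");
  const failedGates = requiredGates.filter((gate) => gateResults[gate] !== true);
  return { accepted: canonOk && failedGates.length === 0, canonOk, failedGates };
}

const boundRecord = {
  canon: { assertions: [{ rule: "RND-001", verdict: "pass", rationale: "Painterly style" }] },
};
const allGates = Object.fromEntries(CHARACTER_GATES.map((gate) => [gate, true]));

console.log(passesIdentityAcceptance(boundRecord, allGates).accepted);                     // true
console.log(passesIdentityAcceptance(boundRecord, { ...allGates, "IC-3": false }).failedGates); // [ 'IC-3' ]
```

Passing a different requiredGates list (for example IL-1 through IL-5) would cover locations and ships with the same check.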

Directory structure

projects/<name>/inputs/identity-packets/   Identity packet definitions (JSON)
projects/<name>/records/                   Extended records with identity + lineage blocks
projects/<name>/outputs/candidates/        Discovery outputs (uncurated)
projects/<name>/outputs/approved/          Curated approved (anchors + follow-ons)