Style Dataset Lab Handbook

Style Dataset Lab turns approved visual work into versioned, review-backed datasets, splits, export packages, and eval packs — then puts trained models to work in a full production workflow and feeds the best outputs back into the corpus. Teams define what they are making (visual canon), curate outputs against project-specific rubrics, bind accepted work to constitution rules, produce reproducible dataset packages, and run a closed production loop through brief compilation, generation, selection, and re-ingest.

The pipeline ships as an npm package (@mcptoolshop/style-dataset-lab) with the sdlab CLI, shared library modules, and domain-specific starter templates.

```
canon → generate → curate → bind → snapshot → split → export → eval
  |        |         |       |        |         |        |       |
rules   ComfyUI  judgment  rules   frozen    subject  package  verify
                                  selection isolation

brief → run → critique → batch → select → re-ingest
  ↑                                          |
  └──────────────────────────────────────────┘
```

Each stage writes structured JSON records to projects/<name>/records/. All commands accept --project <name> (defaults to star-freight). Records accumulate provenance, judgment, and canon binding over time. Nothing is lost — a curated record still carries its original generation provenance.

The dataset spine produces four artifacts. These are the product.

| Artifact | What it is | Command |
| --- | --- | --- |
| Snapshot | Frozen, fingerprinted selection of eligible records with explicit reason traces | `sdlab snapshot create` |
| Split | Subject-isolated, lane-balanced train/val/test partition (seeded PRNG, zero leakage) | `sdlab split build` |
| Export package | Self-contained dataset: manifest, metadata.jsonl, images, splits, dataset card, checksums | `sdlab export build` |
| Eval pack | Canon-aware test instruments: lane coverage, forbidden drift, anchor/gold, subject continuity | `sdlab eval-pack build` |

sdlab defines and owns the dataset. Downstream format conversion (TRL, LLaVA, Parquet) is handled by repo-dataset.
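The split artifact's key property — subject isolation under a seeded PRNG — can be illustrated with a minimal Python sketch. This is not sdlab's actual algorithm, and the record fields (`id`, `subject`) are hypothetical; it only demonstrates the guarantee the artifact table describes: every subject lands in exactly one partition, and the same seed reproduces the same split.

```python
import random
from collections import defaultdict

def build_split(records, seed=42, ratios=(0.8, 0.1, 0.1)):
    """Partition records into train/val/test with subject isolation:
    all records for a given subject land in exactly one partition."""
    by_subject = defaultdict(list)
    for rec in records:
        by_subject[rec["subject"]].append(rec)
    subjects = sorted(by_subject)          # stable order before shuffling
    random.Random(seed).shuffle(subjects)  # seeded PRNG -> reproducible
    n = len(subjects)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    split = {"train": [], "val": [], "test": []}
    for i, subj in enumerate(subjects):
        name = "train" if i < n_train else "val" if i < n_train + n_val else "test"
        split[name].extend(by_subject[subj])
    return split
```

Because partitioning happens at the subject level rather than the record level, no subject can leak across train/val/test boundaries.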

Every record in the dataset carries three things:

  1. Provenance — generated by ComfyUI with full history (checkpoint, LoRA, seed, sampler, cfg scale, resolution, timing).
  2. Canon binding — which constitution rules this asset passes, fails, or partially meets, with rationale strings.
  3. Quality judgment — approved, rejected, or borderline, with per-dimension scores and cited failure modes.
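A record carrying all three might look like the following sketch. The field names are illustrative, not the actual sdlab schema; elided values are left as placeholders.

```json
{
  "id": "rec_0413",
  "provenance": {
    "source": "comfyui",
    "checkpoint": "…",
    "lora": "…",
    "seed": 1234567,
    "sampler": "…",
    "cfg_scale": 7.0,
    "resolution": "1024x1024"
  },
  "canon": {
    "passes": ["rule-id"],
    "fails": [],
    "partial": [{ "rule": "rule-id", "rationale": "…" }]
  },
  "judgment": {
    "verdict": "approved",
    "scores": { "dimension": 4 },
    "failure_modes": []
  }
}
```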

This triple — provenance, canon, judgment — is what makes the dataset useful beyond simple image labeling. A model trained on images alone learns to generate. A model trained on images with grounded judgment learns to evaluate.

The pipeline is domain-agnostic. Five starter templates ship with real production worldviews:

| Domain | Focus | Example lanes |
| --- | --- | --- |
| game-art | Game concept art with faction systems | character, environment, prop, ship, interior |
| character-design | Character production for film/animation/games | portrait, full_body, turnaround, expression_sheet |
| creature-design | Creature lineup and species development | concept, orthographic, detail_study, action, habitat |
| architecture | Architectural worldbuilding and pre-viz | exterior, interior, streetscape, ruin, landscape |
| vehicle-mech | Vehicle and mech design | exterior, cockpit, component, schematic, damage_variant |

Each template includes constitution rules, lane definitions, scoring rubrics, and group vocabulary designed for that production context.

Each project lives in projects/<name>/ with 5 JSON config files alongside its data:

  • project.json — identity, domain, generation defaults
  • constitution.json — rules array with rationale templates
  • lanes.json — subject lanes with regex detection patterns
  • rubric.json — scoring dimensions, thresholds, failure-to-rule mappings
  • terminology.json — group vocabulary with detection order
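As one illustration, lanes.json pairs each subject lane with regex detection patterns. The sketch below is hypothetical, not the actual sdlab schema; lane ids are drawn from the game-art template above.

```json
{
  "lanes": [
    { "id": "character",   "patterns": ["\\bportrait\\b", "\\bfigure\\b"] },
    { "id": "environment", "patterns": ["\\binterior\\b", "\\blandscape\\b"] }
  ]
}
```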

Use sdlab project doctor to validate any project’s config completeness.

The repo includes projects/star-freight/ as a complete working example — a gritty sci-fi RPG with 5 factions, 1,182 curated records, 28 prompt waves, 7 lanes, and 24 constitution rules. Clone the repo to explore it.