Reference

All commands are available via the VS Code Command Palette (Ctrl+Shift+P).

| Command | Description |
| --- | --- |
| RunForge: Train (Standard) | Run training with the std-train preset |
| RunForge: Train (High Quality) | Run training with the hq-train preset |
| RunForge: Open Runs | View completed training runs |
| RunForge: Inspect Dataset | Validate a dataset before training |
| RunForge: Open Latest Run Metadata | View metadata for the most recent run |
| RunForge: Inspect Model Artifact | View pipeline structure of a trained model.pkl |
| RunForge: Browse Runs | Browse all runs with quick actions (summary, diagnostics, artifact inspection) |
| RunForge: View Latest Metrics | View detailed metrics from metrics.v1.json |
| RunForge: View Latest Feature Importance | View feature importance for RandomForest models |
| RunForge: View Latest Linear Coefficients | View coefficients for linear models |
| RunForge: View Latest Interpretability Index | View unified index of all interpretability artifacts |
| RunForge: Export Latest Run as Markdown | Save a formatted markdown summary of the latest run |
| RunForge: Cancel Active Training | Cancel the currently in-progress training run (5s SIGTERM grace, then SIGKILL) |
| RunForge: Recover Index | Walk .ml/runs/ and re-append missing entries to index.json; returns a structured RecoveryReport |

For the full lifecycle behaviour — state machine, source-of-truth detector, partial artifacts, and the recovery report shape — see Cancel and Recovery.
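
The cancel escalation described above (SIGTERM, a 5-second grace period, then SIGKILL) can be illustrated with a minimal Python sketch. This is not RunForge's implementation; the function name `cancel_process` and its return values are hypothetical, and only the 5-second grace period comes from the command description.

```python
import subprocess

GRACE_SECONDS = 5.0  # grace period taken from the command description


def cancel_process(proc, grace: float = GRACE_SECONDS) -> str:
    """Ask a child process to stop, escalating to a hard kill after `grace` seconds.

    `proc` is expected to behave like subprocess.Popen.
    Returns "terminated" if the process exited within the grace period,
    or "killed" if it had to be force-killed.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

Passing a `subprocess.Popen` handle for the training process is the intended use. A run that handles SIGTERM and exits in time yields "terminated"; one that ignores it yields "killed".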

Configure RunForge through VS Code settings (Ctrl+,).

| Setting | Default | Description |
| --- | --- | --- |
| runforge.pythonPath | python | Path to the Python executable |
| runforge.mlRunnerModule | ml_runner | Python module to run for training |
| runforge.modelFamily | logistic_regression | Model family: logistic_regression, random_forest, or linear_svc |
| runforge.profile | (empty) | Training profile: default, fast, or thorough |

| Model | CLI Value | Description | Interpretability |
| --- | --- | --- | --- |
| Logistic Regression | logistic_regression | Default, fast, interpretable | Linear coefficients |
| Random Forest | random_forest | Ensemble, handles non-linear patterns | Feature importance (Gini) |
| Linear SVC | linear_svc | Support vector classifier, margin-based | Linear coefficients |

All models use StandardScaler as the first pipeline step. Preprocessing is embedded in the trained artifact.

Profiles provide pre-configured hyperparameter overrides.

| Profile | Description | Model Family |
| --- | --- | --- |
| default | No hyperparameter overrides | (uses setting) |
| fast | Reduced iterations for quick runs | logistic_regression |
| thorough | More trees/iterations for better quality | random_forest |

When a hyperparameter is set in more than one place, the highest-priority source wins:

  1. CLI --param (highest priority)
  2. Profile-expanded parameters
  3. Model defaults (lowest priority)
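
The precedence order above amounts to a layered dictionary merge. A minimal sketch follows, assuming each source can be represented as a plain mapping; the function name `resolve_params` and the example values are illustrative, not RunForge's actual defaults or profile contents.

```python
from typing import Any, Dict, Mapping


def resolve_params(
    model_defaults: Mapping[str, Any],
    profile_params: Mapping[str, Any],
    cli_params: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge hyperparameter sources; later updates win."""
    resolved = dict(model_defaults)  # 3. model defaults (lowest priority)
    resolved.update(profile_params)  # 2. profile-expanded parameters
    resolved.update(cli_params)      # 1. CLI --param (highest priority)
    return resolved


# Illustrative values only (hypothetical defaults and profile contents).
defaults = {"C": 1.0, "max_iter": 100}
fast_profile = {"max_iter": 50}
cli = {"C": 0.5}
print(resolve_params(defaults, fast_profile, cli))  # {'C': 0.5, 'max_iter': 50}
```

Copying the defaults before updating keeps the input mappings unmodified, so the same defaults can be reused across runs.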

Logistic Regression: C (float), max_iter (int), solver (string), warm_start (bool)

Random Forest: n_estimators (int), max_depth (int or None), min_samples_split (int, >= 2), min_samples_leaf (int)

Linear SVC: C (float), max_iter (int)

dataset.csv
|
v
Validate (label column, numeric values)
|
v
Fingerprint (SHA-256 of dataset)
|
v
Split (80/20, deterministic, stratified)
|
v
Fit pipeline (StandardScaler + model)
|
v
Compute metrics
|
v
Extract interpretability
|
v
.ml/runs/<run-id>/

Every step is deterministic given the same seed, dataset, and configuration.
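
Two of the steps above, fingerprinting and the deterministic stratified split, can be sketched with the standard library alone. This is an illustration of the idea, not RunForge's code: it assumes the fingerprint is taken over the raw file bytes, and the hand-rolled `stratified_split` helper is a stand-in for whatever splitter the runner actually uses.

```python
import hashlib
import random
from collections import defaultdict


def fingerprint_bytes(data: bytes) -> str:
    """SHA-256 hex digest of the dataset bytes (assumes raw-byte hashing)."""
    return hashlib.sha256(data).hexdigest()


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_indices, test_indices), preserving class proportions.

    The same labels, fraction, and seed always produce the same split.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # local RNG so global random state is untouched
    train, test = [], []
    for label in sorted(by_class, key=repr):  # fixed class order for determinism
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```

Because the split depends only on the labels, the fraction, and the seed, re-running with the same inputs yields the same train and test rows, which is the property the pipeline relies on.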

All run artifacts are saved under .ml/runs/<run-id>/:

| File | Contents |
| --- | --- |
| run.json | Metadata: run ID, timestamp, dataset fingerprint, label column, model family, profile, git SHA, Python path, extension version |
| metrics.json | Phase 2 metrics: accuracy, num_samples, num_features |
| metrics.v1.json | Detailed metrics by profile (accuracy, precision, recall, F1, confusion matrix, ROC-AUC, log loss) |
| artifacts/model.pkl | Trained scikit-learn pipeline (StandardScaler + classifier) |
| artifacts/feature_importance.v1.json | Gini importance scores (RandomForest only) |
| artifacts/linear_coefficients.v1.json | Model coefficients in standardized feature space (LogisticRegression, LinearSVC) |
| artifacts/interpretability.index.v1.json | Unified index linking all interpretability outputs |

Metrics profile is auto-selected based on model capabilities:

| Profile | Trigger | Metrics |
| --- | --- | --- |
| classification.base.v1 | All classifiers | accuracy, precision, recall, F1, confusion matrix |
| classification.proba.v1 | Binary + predict_proba | base + ROC-AUC, log loss |
| classification.multiclass.v1 | 3+ classes | base + per-class precision/recall/F1 |
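
The selection rule in the table can be read as a small decision function. The sketch below is one plausible reading, not the runner's actual code; the function name `select_metrics_profile` and its two inputs are assumptions.

```python
def select_metrics_profile(n_classes: int, has_predict_proba: bool) -> str:
    """Pick a metrics profile name from model and dataset capabilities."""
    if n_classes >= 3:
        # Three or more classes: base metrics plus per-class breakdowns.
        return "classification.multiclass.v1"
    if n_classes == 2 and has_predict_proba:
        # Binary problem with probability outputs: adds ROC-AUC and log loss.
        return "classification.proba.v1"
    # Fallback for every other classifier.
    return "classification.base.v1"
```

Under this reading, a binary run whose model exposes no predict_proba method (scikit-learn's LinearSVC, for example) falls back to classification.base.v1.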

RandomForest runs extract Gini importance scores and save them as feature_importance.v1.json. Features are ranked by importance and listed in both importance order and original column order.

No approximations — if the model doesn’t support native importance, no artifact is emitted.
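
The two orderings mentioned above (importance order and original column order) can be produced from a single pass over the scores. This is a sketch only: the key names `by_importance` and `by_original_order`, the per-feature record shape, and the tie-break by name are hypothetical and are not taken from the feature_importance.v1.json schema.

```python
def rank_features(names, importances):
    """Return feature importances in both ranked and original column order."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")

    original = [
        {"name": name, "importance": float(value)}
        for name, value in zip(names, importances)
    ]
    # Highest importance first; ties broken by name so the order is deterministic.
    ranked = sorted(original, key=lambda item: (-item["importance"], item["name"]))
    return {"by_importance": ranked, "by_original_order": original}
```

With a fitted scikit-learn RandomForestClassifier, the `importances` argument would typically come from its `feature_importances_` attribute.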

LogisticRegression and LinearSVC runs extract model coefficients into linear_coefficients.v1.json. All coefficients are in standardized feature space (after StandardScaler), so comparing coefficients across features is meaningful.

For multiclass classification (3+ classes), coefficients are grouped per class with deterministic ordering.

Every run produces interpretability.index.v1.json, which lists all available interpretability artifacts, their schema versions, and paths. Absent artifacts are omitted (not set to null).
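
The "omit rather than null" rule can be shown with a short sketch. The artifact paths come from the artifacts table; the index layout (an `artifacts` mapping with `schema_version` and `path` fields) and the name `build_index` are assumptions made for illustration, not the published schema.

```python
# Relative paths as listed in the run artifacts table.
KNOWN_ARTIFACTS = {
    "feature_importance": "artifacts/feature_importance.v1.json",
    "linear_coefficients": "artifacts/linear_coefficients.v1.json",
}


def build_index(present_files):
    """Build an index that lists only the interpretability artifacts that exist.

    `present_files` is an iterable of run-relative paths found on disk.
    Absent artifacts get no key at all, rather than a key set to None.
    """
    present = set(present_files)
    artifacts = {}
    for key, rel_path in KNOWN_ARTIFACTS.items():
        if rel_path in present:
            # "v1" mirrors the version in the file name (hypothetical field).
            artifacts[key] = {"schema_version": "v1", "path": rel_path}
    return {"artifacts": artifacts}
```

A consumer can then test for an artifact with a plain key lookup and never needs to distinguish between a missing key and a null value.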

Structured diagnostics explain run behavior as machine-readable codes:

| Code | Description |
| --- | --- |
| MISSING_VALUES_DROPPED | Rows dropped due to missing values |
| LABEL_NOT_FOUND | Label column not present in dataset |
| LABEL_TYPE_INVALID | Label column has invalid type |
| ZERO_ROWS | Dataset has zero rows after processing |
| ZERO_FEATURES | Dataset has no feature columns |
| LABEL_ONLY_DATASET | Dataset contains only the label column |
| FEATURE_IMPORTANCE_UNSUPPORTED_MODEL | Model doesn’t support native feature importance |
| LINEAR_COEFFICIENTS_UNSUPPORTED_MODEL | Model doesn’t support coefficient extraction |

Phase 4 ships a structured event stream (events.schema.v1.json) that Python emits as JSONL on stderr, one event per line. The schema is frozen at v1.0.0 and covers nine event types:

| Event | When emitted |
| --- | --- |
| run_start | Run begins |
| dataset_loaded | CSV parsed and validated |
| train_started | Pipeline fit begins |
| train_progress | Per epoch (Phase 4 cardinality) |
| train_finished | Pipeline fit complete |
| metrics_computed | Validation metrics computed |
| artifacts_written | All artifacts flushed to disk |
| cancelling | SIGTERM received; emits per-second seconds_remaining countdown |
| run_cancelled | Graceful cancel cleanup complete |

Event emission order is deterministic across re-runs of the same configuration; timestamps naturally vary. Non-JSONL stderr is treated as free-form log lines. The TS Bridge validates each event against the schema and drops malformed events without throwing.
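
The stderr handling described above (events, free-form log lines, and malformed events that are dropped) can be sketched as a line classifier. The real validation is done by the TS Bridge against events.schema.v1.json; this Python sketch only checks a type field, and the field name `event` is an assumption rather than part of the documented schema.

```python
import json

# The nine event types listed in the table.
EVENT_TYPES = frozenset({
    "run_start", "dataset_loaded", "train_started", "train_progress",
    "train_finished", "metrics_computed", "artifacts_written",
    "cancelling", "run_cancelled",
})


def classify_stderr_line(line: str):
    """Classify one stderr line as ("event", dict), ("log", str), or ("dropped", str)."""
    text = line.strip()
    try:
        payload = json.loads(text)
    except json.JSONDecodeError:
        return ("log", text)  # not JSON: treat as a free-form log line
    if not isinstance(payload, dict):
        return ("log", text)  # JSON scalar or array: not an event object
    name = payload.get("event")  # hypothetical field name
    if not isinstance(name, str) or name not in EVENT_TYPES:
        return ("dropped", text)  # malformed event: drop without raising
    return ("event", payload)
```

Returning a tag instead of raising matches the documented behavior that malformed events are dropped without throwing, so one bad line cannot interrupt the stream.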

Spawning a Python subprocess (training, version check, GPU probe, dataset inspect, artifact inspect) is gated by VS Code workspace trust. Because RunForge runs user-controlled Python from a workspace-settable path, the trust guard prevents an untrusted workspace from triggering arbitrary Python execution. Untrusted workspaces receive a structured error pointing to the Manage Workspace Trust UI.

Data touched: workspace CSV files (read-only for training), the .ml/ directory (run metadata, model artifacts, metrics), and Python subprocess stdout/stderr.

Data NOT touched: no files outside the open workspace, no browser data, no OS credentials.

Permissions: filesystem read/write within workspace only, Python subprocess execution (gated by workspace trust).

No network egress. All operations are local. No telemetry is collected or sent.

For the full trust model, see TRUST_MODEL.md.