Reference
Commands
All commands are available via the VS Code Command Palette (Ctrl+Shift+P).
| Command | Description |
|---|---|
| RunForge: Train (Standard) | Run training with the std-train preset |
| RunForge: Train (High Quality) | Run training with the hq-train preset |
| RunForge: Open Runs | View completed training runs |
| RunForge: Inspect Dataset | Validate a dataset before training |
| RunForge: Open Latest Run Metadata | View metadata for the most recent run |
| RunForge: Inspect Model Artifact | View pipeline structure of a trained model.pkl |
| RunForge: Browse Runs | Browse all runs with quick actions (summary, diagnostics, artifact inspection) |
| RunForge: View Latest Metrics | View detailed metrics from metrics.v1.json |
| RunForge: View Latest Feature Importance | View feature importance for RandomForest models |
| RunForge: View Latest Linear Coefficients | View coefficients for linear models |
| RunForge: View Latest Interpretability Index | View unified index of all interpretability artifacts |
| RunForge: Export Latest Run as Markdown | Save a formatted markdown summary of the latest run |
| RunForge: Cancel Active Training | Cancel the currently in-progress training run (5s SIGTERM grace, then SIGKILL) |
| RunForge: Recover Index | Walk .ml/runs/ and re-append missing entries to index.json; returns a structured RecoveryReport |
For the full lifecycle behaviour — state machine, source-of-truth detector, partial artifacts, and the recovery report shape — see Cancel and Recovery.
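
The cancel behaviour listed for RunForge: Cancel Active Training (SIGTERM, a grace period, then SIGKILL) follows a common pattern that can be sketched with Python's standard `subprocess` module. This is only an illustration of the pattern, not RunForge's implementation; the helper name `cancel_with_grace` and the stand-in child process are hypothetical.

```python
import subprocess
import sys


def cancel_with_grace(proc: subprocess.Popen, grace_seconds: float = 5.0) -> int:
    """Ask a child process to stop, then force-kill it if it outlives the grace period.

    Hypothetical helper mirroring the documented "5s SIGTERM grace, then SIGKILL"
    behaviour; it is not taken from the RunForge source.
    """
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX
        return proc.wait()


# Stand-in for a long-running training process.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = cancel_with_grace(child)
print("child exited with", exit_code)
```

A process that handles SIGTERM can use the grace period to flush partial artifacts before exiting; one that ignores it is killed once the timeout expires.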
Settings
Configure RunForge through VS Code settings (Ctrl+,).
| Setting | Default | Description |
|---|---|---|
| runforge.pythonPath | python | Path to the Python executable |
| runforge.mlRunnerModule | ml_runner | Python module to run for training |
| runforge.modelFamily | logistic_regression | Model family: logistic_regression, random_forest, or linear_svc |
| runforge.profile | (empty) | Training profile: default, fast, or thorough |
Supported Models
| Model | CLI Value | Description | Interpretability |
|---|---|---|---|
| Logistic Regression | logistic_regression | Default, fast, interpretable | Linear coefficients |
| Random Forest | random_forest | Ensemble, handles non-linear patterns | Feature importance (Gini) |
| Linear SVC | linear_svc | Support vector classifier, margin-based | Linear coefficients |
All models use StandardScaler as the first pipeline step. Preprocessing is embedded in the trained artifact.
Training Profiles
Profiles provide pre-configured hyperparameter overrides.
| Profile | Description | Model Family |
|---|---|---|
| default | No hyperparameter overrides | (uses setting) |
| fast | Reduced iterations for quick runs | logistic_regression |
| thorough | More trees/iterations for better quality | random_forest |
Hyperparameter Precedence
- CLI --param (highest priority)
- Profile-expanded parameters
- Model defaults (lowest priority)
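
The precedence order above amounts to a layered merge in which each higher-priority layer overwrites the one below it. The sketch below shows that idea with plain dictionaries; the function name and the sample values are illustrative assumptions, not RunForge's real defaults or profile contents.

```python
from typing import Any, Dict, Mapping


def resolve_hyperparameters(
    model_defaults: Mapping[str, Any],
    profile_params: Mapping[str, Any],
    cli_params: Mapping[str, Any],
) -> Dict[str, Any]:
    """Merge three layers; later layers override earlier ones."""
    resolved = dict(model_defaults)  # lowest priority
    resolved.update(profile_params)  # profile overrides defaults
    resolved.update(cli_params)      # CLI --param overrides everything
    return resolved


# Illustrative values only (hypothetical, not RunForge's actual settings).
defaults = {"n_estimators": 100, "max_depth": None}
profile = {"n_estimators": 300}
cli = {"max_depth": 8}
print(resolve_hyperparameters(defaults, profile, cli))
# → {'n_estimators': 300, 'max_depth': 8}
```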
Supported Hyperparameters
Logistic Regression: C (float), max_iter (int), solver (string), warm_start (bool)
Random Forest: n_estimators (int), max_depth (int or None), min_samples_split (int, >= 2), min_samples_leaf (int)
Linear SVC: C (float), max_iter (int)
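
One way to enforce the types listed above is a per-family table of parsers. The sketch below assumes hyperparameter values arrive as strings (for example from a command line); that input format, the accepted boolean spellings, and all function names are assumptions made for illustration rather than documented RunForge behaviour.

```python
from typing import Any, Callable, Dict, Optional


def parse_bool(text: str) -> bool:
    """Accept 'true' or 'false' (case-insensitive); an assumption of this sketch."""
    lowered = text.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f"not a boolean: {text!r}")


def parse_int_or_none(text: str) -> Optional[int]:
    return None if text.strip().lower() == "none" else int(text)


def parse_min_samples_split(text: str) -> int:
    value = int(text)
    if value < 2:
        raise ValueError("min_samples_split must be >= 2")
    return value


# Parameter names and types come from the list above; the parsers are hypothetical.
SUPPORTED: Dict[str, Dict[str, Callable[[str], Any]]] = {
    "logistic_regression": {
        "C": float, "max_iter": int, "solver": str, "warm_start": parse_bool,
    },
    "random_forest": {
        "n_estimators": int,
        "max_depth": parse_int_or_none,
        "min_samples_split": parse_min_samples_split,
        "min_samples_leaf": int,
    },
    "linear_svc": {"C": float, "max_iter": int},
}


def parse_param(model_family: str, name: str, raw: str) -> Any:
    """Convert one raw string value to the type expected for that model family."""
    parsers = SUPPORTED[model_family]
    if name not in parsers:
        raise ValueError(f"{name!r} is not supported for {model_family}")
    return parsers[name](raw)


print(parse_param("random_forest", "max_depth", "None"))
print(parse_param("linear_svc", "C", "0.5"))
```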
Run Lifecycle
    dataset.csv
        |
        v
    Validate (label column, numeric values)
        |
        v
    Fingerprint (SHA-256 of dataset)
        |
        v
    Split (80/20, deterministic, stratified)
        |
        v
    Fit pipeline (StandardScaler + model)
        |
        v
    Compute metrics
        |
        v
    Extract interpretability
        |
        v
    .ml/runs/<run-id>/

Every step is deterministic given the same seed, dataset, and configuration.
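
The fingerprint and split steps can be illustrated with the standard library alone. This is a minimal sketch under stated assumptions: it hashes raw bytes (exactly what RunForge feeds into SHA-256 is not specified here), and it uses a hand-written stratified split as a stand-in for whatever splitter the runner actually uses. The function names are hypothetical.

```python
import hashlib
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fingerprint_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def stratified_split(
    labels: Sequence[str], seed: int, test_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Split row indices into train/test, keeping class proportions roughly equal."""
    by_class: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    rng = random.Random(seed)  # seeded generator makes the shuffle repeatable
    train: List[int] = []
    test: List[int] = []
    for label in sorted(by_class):  # fixed class order keeps results stable
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


labels = ["a"] * 10 + ["b"] * 5
train_idx, test_idx = stratified_split(labels, seed=42)
print(len(train_idx), len(test_idx))
print(fingerprint_bytes(b"feature,label\n1.0,a\n"))
```

Calling `stratified_split` twice with the same labels and seed returns identical index lists, which is the property the lifecycle relies on.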
Artifacts
All run artifacts are saved under .ml/runs/<run-id>/:
| File | Contents |
|---|---|
| run.json | Metadata: run ID, timestamp, dataset fingerprint, label column, model family, profile, git SHA, Python path, extension version |
| metrics.json | Phase 2 metrics: accuracy, num_samples, num_features |
| metrics.v1.json | Detailed metrics by profile (accuracy, precision, recall, F1, confusion matrix, ROC-AUC, log loss) |
| artifacts/model.pkl | Trained scikit-learn pipeline (StandardScaler + classifier) |
| artifacts/feature_importance.v1.json | Gini importance scores (RandomForest only) |
| artifacts/linear_coefficients.v1.json | Model coefficients in standardized feature space (LogisticRegression, LinearSVC) |
| artifacts/interpretability.index.v1.json | Unified index linking all interpretability outputs |
Metrics Profiles
Metrics profile is auto-selected based on model capabilities:
| Profile | Trigger | Metrics |
|---|---|---|
| classification.base.v1 | All classifiers | accuracy, precision, recall, F1, confusion matrix |
| classification.proba.v1 | Binary + predict_proba | base + ROC-AUC, log loss |
| classification.multiclass.v1 | 3+ classes | base + per-class precision/recall/F1 |
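
The triggers in the table can be read as a small decision function. The sketch below is one plausible reading, not the runner's actual logic: the function name is hypothetical, and the order in which the checks are applied (multiclass first, then binary with probabilities, then the base profile) is an assumption.

```python
def select_metrics_profile(n_classes: int, has_predict_proba: bool) -> str:
    """Pick a metrics profile name from the number of classes and model capabilities."""
    if n_classes >= 3:
        return "classification.multiclass.v1"
    if n_classes == 2 and has_predict_proba:
        return "classification.proba.v1"
    # Fallback for every other classifier, e.g. a binary model without predict_proba.
    return "classification.base.v1"


print(select_metrics_profile(2, True))
print(select_metrics_profile(2, False))
print(select_metrics_profile(4, True))
```

Under this reading, a binary LinearSVC run would fall back to the base profile, because scikit-learn's LinearSVC does not expose `predict_proba`.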
Interpretability
Feature Importance (Tree Models)
RandomForest runs extract Gini importance scores and save them as feature_importance.v1.json. Features are ranked by importance and listed in both importance order and original column order.
No approximations — if the model doesn’t support native importance, no artifact is emitted.
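
Listing features in both orders is straightforward once the scores are paired with column names. The following sketch shows the idea; the key names `by_importance` and `by_original_order` are hypothetical and do not describe the actual feature_importance.v1.json schema.

```python
from typing import Dict, List, Sequence


def rank_features(
    names: Sequence[str], importances: Sequence[float]
) -> Dict[str, List[dict]]:
    """Return the same features twice: ranked by score and in column order."""
    if len(names) != len(importances):
        raise ValueError("names and importances must have the same length")
    original = [
        {"name": name, "importance": float(score)}
        for name, score in zip(names, importances)
    ]
    # sorted() is stable, so tied scores keep their original column order.
    ranked = sorted(original, key=lambda item: item["importance"], reverse=True)
    return {"by_importance": ranked, "by_original_order": original}


result = rank_features(["age", "income", "tenure"], [0.2, 0.5, 0.3])
print([item["name"] for item in result["by_importance"]])
# → ['income', 'tenure', 'age']
```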
Linear Coefficients
LogisticRegression and LinearSVC runs extract model coefficients into linear_coefficients.v1.json. All coefficients are in standardized feature space (after StandardScaler), so comparing coefficients across features is meaningful.
For multiclass classification (3+ classes), coefficients are grouped per class with deterministic ordering.
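
Because the coefficients apply to standardized inputs z = (x - mean) / scale, they can be mapped back to original units when a per-unit interpretation is needed. The sketch below shows that arithmetic for a single class; the function and argument names are hypothetical, and the means and scales are assumed to come from the fitted scaler.

```python
from typing import List, Sequence, Tuple


def to_original_units(
    coefs: Sequence[float],
    means: Sequence[float],
    scales: Sequence[float],
    intercept: float,
) -> Tuple[List[float], float]:
    """Convert standardized-space coefficients to original feature units.

    From w * (x - mean) / scale + b it follows that the per-unit weight is
    w / scale and the intercept shifts by -w * mean / scale for each feature.
    """
    original = [w / s for w, s in zip(coefs, scales)]
    shift = sum(w * m / s for w, m, s in zip(coefs, means, scales))
    return original, intercept - shift


weights, bias = to_original_units([2.0], means=[8.0], scales=[4.0], intercept=1.0)
print(weights, bias)
# → [0.5] -3.0
```

In the example, x = 12 standardizes to z = 1, giving 2.0 * 1 + 1.0 = 3.0; the converted form gives 0.5 * 12 - 3.0 = 3.0, so both forms agree.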
Unified Index
Every run produces interpretability.index.v1.json, which lists all available interpretability artifacts, their schema versions, and paths. Absent artifacts are omitted (not set to null).
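
The "omitted, not null" rule matters to anything that reads the index. A minimal sketch of the writing side follows; the key names (`artifacts`, `schema_version`, `path`) and the function name are illustrative assumptions, not the published schema.

```python
import json
from typing import Dict, Mapping, Optional


def build_index(artifact_paths: Mapping[str, Optional[str]]) -> Dict[str, dict]:
    """Keep only artifacts that exist; absent ones are left out entirely."""
    present = {
        name: {"schema_version": "v1", "path": path}
        for name, path in artifact_paths.items()
        if path is not None
    }
    return {"artifacts": present}


# A logistic-regression run has coefficients but no feature importance.
index = build_index({
    "feature_importance": None,
    "linear_coefficients": "artifacts/linear_coefficients.v1.json",
})
print(json.dumps(index, indent=2))
```

A consumer of such an index would test for key presence (for example `"feature_importance" in index["artifacts"]`) rather than comparing a value against null.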
Diagnostics
Structured diagnostics explain run behavior as machine-readable codes:
| Code | Description |
|---|---|
| MISSING_VALUES_DROPPED | Rows dropped due to missing values |
| LABEL_NOT_FOUND | Label column not present in dataset |
| LABEL_TYPE_INVALID | Label column has invalid type |
| ZERO_ROWS | Dataset has zero rows after processing |
| ZERO_FEATURES | Dataset has no feature columns |
| LABEL_ONLY_DATASET | Dataset contains only the label column |
| FEATURE_IMPORTANCE_UNSUPPORTED_MODEL | Model doesn’t support native feature importance |
| LINEAR_COEFFICIENTS_UNSUPPORTED_MODEL | Model doesn’t support coefficient extraction |
Event Stream
Phase 4 ships a structured event stream (events.schema.v1.json) that Python emits as JSONL on stderr, one event per line. The schema is frozen at v1.0.0 and covers nine event types:
| Event | When emitted |
|---|---|
| run_start | Run begins |
| dataset_loaded | CSV parsed and validated |
| train_started | Pipeline fit begins |
| train_progress | Per epoch (Phase 4 cardinality) |
| train_finished | Pipeline fit complete |
| metrics_computed | Validation metrics computed |
| artifacts_written | All artifacts flushed to disk |
| cancelling | SIGTERM received; emits per-second seconds_remaining countdown |
| run_cancelled | Graceful cancel cleanup complete |
Event emission order is deterministic across re-runs of the same configuration; timestamps naturally vary. Non-JSONL stderr is treated as free-form log lines. The TS Bridge validates each event against the schema and drops malformed events without throwing.
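
The handling described above (JSONL events, free-form log lines, malformed events dropped without throwing) can be sketched as a line classifier. The real bridge is written in TypeScript and validates against events.schema.v1.json; the Python sketch below only checks the event name, and the field name `event` used to carry that name is an assumption, as is the function name.

```python
import json
from typing import Iterable, List, Tuple

# The nine event types listed in the table above.
EVENT_TYPES = {
    "run_start", "dataset_loaded", "train_started", "train_progress",
    "train_finished", "metrics_computed", "artifacts_written",
    "cancelling", "run_cancelled",
}


def split_stderr(
    lines: Iterable[str], type_key: str = "event"
) -> Tuple[List[dict], List[str]]:
    """Separate recognised events from free-form log lines; never raises on bad input."""
    events: List[dict] = []
    logs: List[str] = []
    for line in lines:
        text = line.rstrip("\n")
        try:
            payload = json.loads(text)
        except json.JSONDecodeError:
            logs.append(text)  # not JSON: treat as a log line
            continue
        if not isinstance(payload, dict):
            logs.append(text)  # bare JSON scalars or arrays are not events
            continue
        name = payload.get(type_key)
        if isinstance(name, str) and name in EVENT_TYPES:
            events.append(payload)
        # JSON objects with an unknown or missing event name are dropped.
    return events, logs


sample = [
    '{"event": "run_start"}\n',
    "Loading dataset from disk\n",
    '{"event": "not_a_real_event"}\n',
]
parsed, free_form = split_stderr(sample)
print(parsed, free_form)
```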
Workspace Trust
Python subprocess spawn (training, version check, GPU probe, dataset inspect, artifact inspect) is gated by VS Code workspace trust. RunForge runs user-controlled Python from a workspace-settable path; the trust guard prevents an untrusted workspace from inducing arbitrary Python execution. Untrusted workspaces receive a structured error pointing to the Manage Workspace Trust UI.
Security and Data Scope
Data touched: workspace CSV files (read-only for training), the .ml/ directory (run metadata, model artifacts, metrics), and Python subprocess stdout/stderr.
Data NOT touched: no files outside the open workspace, no browser data, no OS credentials.
Permissions: filesystem read/write within workspace only, Python subprocess execution (gated by workspace trust).
No network egress. All operations are local. No telemetry is collected or sent.
For the full trust model, see TRUST_MODEL.md.