Beginners
This page is for anyone new to ML experiment tracking or RunForge Desktop. It walks through the key concepts, common workflows, and answers the questions most people have when they open the app for the first time.
What is RunForge Desktop?
RunForge Desktop is a Windows application that helps you run, monitor, and inspect machine learning training experiments on your own computer. It replaces the cycle of running Python scripts in a terminal, manually checking log files, and losing track of which settings you used last time.
The app does three things:
- Launches training runs with the settings you choose (epochs, batch size, learning rate, device)
- Monitors them in real time with live loss charts and streaming logs
- Stores every run’s configuration, metrics, and output so you can inspect and compare them later
Everything stays on your machine. There are no accounts, no cloud services, and no telemetry.
Key concepts
Workspace — a folder on your computer where RunForge stores all experiment data. When you select a workspace, the app creates a .ml/runs/ directory inside it. Each training run gets its own timestamped subfolder.
Run — a single training execution. A run has a configuration (epoch count, batch size, learning rate, optimizer, scheduler, device) and produces artifacts: a manifest (run.json), a metrics stream (metrics.jsonl), and log files (stdout.log, stderr.log).
Preset — a shortcut for common epoch counts. “Quick” runs a few epochs to verify your setup; “Standard” runs a normal session; “Extended” runs longer for thorough training; “Custom” lets you set the exact count.
Sweep — running multiple experiments with different hyperparameter combinations automatically. You provide lists of values for learning rate, batch size, or optimizer, and RunForge generates every combination as a grid search.
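The grid-search expansion behind a sweep can be sketched with the standard library (a minimal illustration of the idea, not RunForge's internal code):

```python
from itertools import product

def expand_grid(**param_lists):
    """Yield one config dict per combination of the given value lists,
    mirroring the grid-search expansion described above."""
    names = list(param_lists)
    for values in product(*(param_lists[n] for n in names)):
        yield dict(zip(names, values))

# Two learning rates x three batch sizes = six configurations
configs = list(expand_grid(learning_rate=[0.001, 0.01],
                           batch_size=[16, 32, 64]))
```

Because every list multiplies the total, sweeps grow quickly: adding a third parameter with four values would turn these 6 runs into 24.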
Manifest — the run.json file inside each run folder. It records exactly what settings were used, when the run started and finished, and what the outcome was. This is how RunForge makes experiments reproducible.
Metrics stream — the metrics.jsonl file. Each line is a JSON object recording the step number, epoch, loss value, learning rate, and timestamp. Because it uses append-only JSONL format, RunForge can read it while training is still running.
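The append-only property is what makes live reading safe: each complete line parses on its own, and at worst the final line is half-written. A sketch of a tolerant reader (field names follow the description above; the sample lines are synthetic):

```python
import json

def read_metrics(lines):
    """Parse a metrics.jsonl stream one line at a time, stopping
    cleanly if the last line is still being written."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            break  # a half-written final line; training is mid-write

sample = [
    '{"step": 1, "epoch": 0, "loss": 2.31, "lr": 0.001}',
    '{"step": 2, "epoch": 0, "loss": 2.05, "lr": 0.001}',
    '{"step": 3, "epo',  # truncated: simulates a write in progress
]
records = list(read_metrics(sample))
```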
Installation walkthrough
Option A: Install the MSIX package
This is the recommended approach. The MSIX package bundles the .NET runtime so you do not need to install it separately.
- Go to the Releases page
- Download the .msix file from the latest release
- Double-click the downloaded file — Windows will handle the installation
- Find “RunForge Desktop” in your Start Menu and launch it
To uninstall, use “Add or remove programs” in Windows Settings. The uninstall is clean and removes all app files.
Option B: Build from source
If you want to modify the app or just prefer building yourself:
- Install the .NET 10 SDK
- Install Visual Studio 2022 (17.12+) with the .NET MAUI workload
- Clone the repository and run:
git clone https://github.com/mcp-tool-shop-org/runforge-desktop
cd runforge-desktop
dotnet run --project src/RunForgeDesktop/RunForgeDesktop.csproj

Python setup
RunForge spawns Python processes to execute training. You need:
- Python 3.10+ — install from python.org or the Microsoft Store
- PyTorch — install with pip install torch
RunForge auto-discovers Python on your system. If auto-discovery picks the wrong version (for example, if you have multiple Python installations), go to Settings and set a manual path override.
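If you are unsure which interpreter was picked up, the version requirement is easy to check from the interpreter itself. A standalone check (not part of RunForge's own discovery code):

```python
import sys

def meets_minimum(version, minimum=(3, 10)):
    """Return True if a (major, minor, ...) version tuple satisfies
    the Python 3.10+ requirement mentioned above."""
    return tuple(version[:2]) >= minimum

# Run this with the interpreter you expect RunForge to use
print(meets_minimum(sys.version_info))
```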
Your first training run
Follow these steps to launch your first experiment:
- Launch RunForge Desktop from the Start Menu
- Select a workspace — click “Select Workspace” on the dashboard and choose an empty folder. This is where all your experiment data will live.
- Click “Train” to open the training configuration dialog
- Choose a preset — start with “Quick” for a fast sanity check
- Select a device — choose GPU if you have an NVIDIA GPU with CUDA, otherwise use CPU
- Click Start — RunForge creates the run folder, writes the manifest, and launches the Python training process
Once training starts, you are taken to the live monitoring view where you can watch the loss chart update and see log output streaming in real time.
When the run finishes, go to the runs list to see your completed experiment. Click on it to inspect its metrics, logs, and artifacts.
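If you prefer to inspect a finished run's metrics programmatically, the metrics stream boils down to a few lines of JSON parsing. A sketch using synthetic data (field names follow the metrics-stream description; this is not a RunForge API):

```python
import json

def summarize_loss(jsonl_text):
    """Compute final and best (lowest) loss from a metrics.jsonl dump."""
    losses = [json.loads(line)["loss"]
              for line in jsonl_text.splitlines() if line.strip()]
    return {"final": losses[-1], "best": min(losses)}

sample = '{"step": 1, "loss": 1.9}\n{"step": 2, "loss": 1.2}\n{"step": 3, "loss": 1.4}\n'
summary = summarize_loss(sample)
```

Note that the final loss is not always the best one, which is why both are worth tracking.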
Common workflows
Comparing hyperparameter settings
Use hyperparameter sweeps to find the best configuration:
- Go to the MultiRun page
- Enter comma-separated values for the parameters you want to explore (for example, learning rates 0.001, 0.01 and batch sizes 16, 32, 64)
- RunForge generates every combination (in this case, 6 runs) and executes them
- When the sweep completes, the best-performing configuration is highlighted by final loss
- Use run comparison to see exactly what changed between any two runs
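The "what changed between two runs" comparison is, at its core, a diff over the two configurations recorded in each manifest. A minimal sketch of that idea (not RunForge's actual comparison code; the config keys are illustrative):

```python
def config_diff(a, b):
    """Return {key: (value_in_a, value_in_b)} for settings that differ."""
    keys = set(a) | set(b)
    return {k: (a.get(k), b.get(k)) for k in sorted(keys) if a.get(k) != b.get(k)}

run_a = {"learning_rate": 0.001, "batch_size": 32, "optimizer": "Adam"}
run_b = {"learning_rate": 0.01, "batch_size": 32, "optimizer": "Adam"}
diff = config_diff(run_a, run_b)
# only learning_rate differs between these two runs
```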
Recovering from a crash
If RunForge Desktop closes unexpectedly (power outage, system crash, forced termination):
- Reopen the app — it will detect the unclean shutdown
- A dialog asks whether to restore your previous session
- Click “Restore” to return to your previous workspace and page
- Any training runs that were in progress when the crash happened are automatically marked as Failed so you know exactly what to re-run
Exporting data for external analysis
If you want to analyze your results in a notebook, spreadsheet, or reporting tool:
- Open the run you want to export
- Use the export options to save metrics, feature importance, or the full run summary
- CSV exports work directly in Excel, Google Sheets, or pandas; JSON exports work in any programming language
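Loading a CSV export takes only the standard library. A sketch with a synthetic two-column export (real RunForge exports may use different column names):

```python
import csv
import io

# Synthetic stand-in for an exported metrics CSV
exported = "step,loss\n1,1.9\n2,1.2\n"

rows = list(csv.DictReader(io.StringIO(exported)))
# DictReader yields strings, so convert numeric columns explicitly
losses = [float(row["loss"]) for row in rows]
```

In practice you would pass the exported file path to `open()` instead of wrapping a string in `io.StringIO`.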
Troubleshooting
“Python not found” error — RunForge requires Python 3.10 or later. Install it from python.org or the Microsoft Store. If Python is installed but not detected, go to Settings and set the path manually. Make sure the Python executable is the one with PyTorch installed (check with python -c "import torch; print(torch.__version__)" in a terminal).
Training fails immediately — Check the run’s stderr log for details. Common causes: PyTorch not installed (pip install torch), insufficient disk space, or a dataset path that does not exist.
GPU not detected — RunForge checks for CUDA availability through PyTorch. If you have an NVIDIA GPU but it shows as unavailable: verify your GPU drivers are up to date, verify CUDA is installed, and verify PyTorch was installed with CUDA support (pip install torch --index-url https://download.pytorch.org/whl/cu124). If GPU is requested but not available, RunForge falls back to CPU and records the reason in the run manifest.
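The fallback behavior described above can be sketched as a small device-resolution function. This is an illustration of the pattern, not RunForge's actual code, and the reason strings are assumptions:

```python
def resolve_device(requested="cuda"):
    """Pick a training device, falling back to CPU with a recorded
    reason when CUDA is requested but unavailable."""
    if requested != "cuda":
        return "cpu", None
    try:
        import torch
    except ImportError:
        return "cpu", "PyTorch not installed"
    if torch.cuda.is_available():
        return "cuda", None
    return "cpu", "CUDA unavailable (check drivers and CUDA-enabled torch build)"
```

Recording the fallback reason alongside the run is what lets you later tell a deliberate CPU run apart from a GPU run that silently degraded.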
App crashes on startup — Check %LOCALAPPDATA%\RunForge\CrashLogs\ for crash log files. These contain the exception details and stack trace. If the crash is caused by a corrupted settings file, delete %LOCALAPPDATA%\RunForge\settings.json to reset to defaults.
Runs stuck in “Running” status — This happens when the app or training process terminated without updating the manifest. Restart the app — it automatically detects orphaned runs and marks them as Failed.
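Orphan detection amounts to scanning each run's manifest for a status that should not survive a restart. A sketch of the idea (the "status" field name and values are assumptions about the manifest schema, not RunForge's documented format):

```python
import json
from pathlib import Path

def find_orphaned_runs(runs_dir):
    """Return names of run folders whose manifest still says the run
    is in progress; on startup these are the candidates to mark Failed."""
    orphans = []
    for manifest in Path(runs_dir).glob("*/run.json"):
        if json.loads(manifest.read_text()).get("status") == "running":
            orphans.append(manifest.parent.name)
    return orphans
```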
Sweep runs are slow — The execution queue respects parallelism limits. By default, at most 2 runs execute concurrently. If your machine can handle more (especially on CPU), increase the max_parallel setting when starting the daemon.
Glossary
| Term | Definition |
|---|---|
| Artifact | Any file produced by a training run: manifests, metrics, logs, model weights |
| Batch size | Number of training samples processed in one optimization step |
| CUDA | NVIDIA’s GPU computing platform, required for GPU-accelerated training |
| Daemon | A background process (runforge_cli) that manages the execution queue and job scheduling |
| Epoch | One complete pass through the training dataset |
| Grid search | Trying every combination of hyperparameter values in a sweep |
| Hyperparameter | A training setting you choose before training starts (learning rate, batch size, optimizer, etc.) |
| JSONL | JSON Lines format — one JSON object per line, used for the metrics stream |
| Learning rate | Controls how much model weights change per optimization step |
| Loss | A number measuring how wrong the model’s predictions are — lower is better |
| Manifest | The run.json file recording a run’s configuration, status, and timing |
| MSIX | A Windows packaging format that handles install, update, and clean uninstall |
| Optimizer | The algorithm that updates model weights (Adam, AdamW, SGD, RMSprop) |
| Preset | A shortcut for epoch count: Quick, Standard, Extended, or Custom |
| Run | A single training execution with a specific configuration |
| Scheduler | Controls how learning rate changes during training (StepLR, CosineAnnealing, OneCycleLR) |
| Sweep | Multiple training runs exploring different hyperparameter combinations |
| Workspace | A folder where RunForge stores all experiment data |