Recipes

A library of short, paste-and-run snippets keyed by what you actually want to do. Each recipe assumes you have already installed Backpropagate (see Getting Started) and have a CUDA GPU available.

If you are looking for symptoms-first triage, head to troubleshooting instead.

Fine-tune a Llama 3 model on a custom JSONL dataset

Llama 3 chat models (meta-llama/Llama-3.2-3B-Instruct, meta-llama/Llama-3.2-1B-Instruct) are gated on Hugging Face — accept the license on the model page and run huggingface-cli login first.

from backpropagate import Trainer

trainer = Trainer("meta-llama/Llama-3.2-3B-Instruct")
trainer.train("my_data.jsonl", steps=200)
trainer.save("./output/llama3-finetuned")

CLI equivalent:

backprop train \
  --model meta-llama/Llama-3.2-3B-Instruct \
  --data my_data.jsonl \
  --steps 200 \
  --output ./output/llama3-finetuned

Your JSONL can be ShareGPT, Alpaca, OpenAI-chat, or ChatML; the auto-detector picks the right template (see Training → Dataset formats). v1.2.0 fixed the tokenizer-aware train_on_responses_only masker so Llama 3 chat templates are masked correctly (under v1.1.x the bug silently trained on the user prompts as well). If you got poor fine-tunes on Llama 3, re-run on v1.2.0+; see migrations → behavioural fixes.
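If you want to check which shape a file is in before training, the heuristic below is a rough sketch. It is not Backpropagate's auto-detector. The key names it looks for follow the widely used community conventions for each format: messages, conversations, instruction/output, and ChatML's <|im_start|> marker. The function names are hypothetical.

```python
import json


def detect_format(row: dict) -> str:
    """Guess which common chat-dataset convention one parsed JSONL row follows."""
    if isinstance(row.get("messages"), list):
        return "openai-chat"  # [{"role": ..., "content": ...}, ...]
    if isinstance(row.get("conversations"), list):
        return "sharegpt"  # [{"from": "human" | "gpt", "value": ...}, ...]
    if "instruction" in row and "output" in row:
        return "alpaca"  # instruction, optional input, output
    text = row.get("text")
    if isinstance(text, str) and "<|im_start|>" in text:
        return "chatml"  # pre-rendered ChatML string
    return "unknown"


def detect_file_format(path: str, sample: int = 20) -> str:
    """Return the one format shared by the first `sample` lines, else 'mixed' or 'unknown'."""
    seen = set()
    with open(path, encoding="utf-8") as fp:
        for i, line in enumerate(fp):
            if i >= sample:
                break
            if line.strip():
                seen.add(detect_format(json.loads(line)))
    if not seen:
        return "unknown"
    return seen.pop() if len(seen) == 1 else "mixed"
```

For example, detect_file_format("my_data.jsonl") returns "mixed" if the first rows disagree. That is usually a sign that two exports were concatenated.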

Reasoning-trace SFT (R1 distillation)

New in v1.5 (T3.2). Distill a reasoning model the easy way: pure SFT on traces that interleave a <think>...</think> chain-of-thought with the final answer (the half of DeepSeek-R1 distillation that needs no RL). Each dataset row carries the thinking block inside the assistant turn. The row below is wrapped for readability; in the .jsonl file every row must sit on a single line:

{"messages": [
  {"role": "user", "content": "What is 17 * 24?"},
  {"role": "assistant", "content": "<think>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.</think>408"}
]}

Turn on the recipe with one flag (Python):

from backpropagate import Trainer

trainer = Trainer("Qwen/Qwen2.5-7B-Instruct", reasoning_trace=True)
trainer.train("reasoning_traces.jsonl", steps=200)
trainer.save("./output/qwen-reasoner")

CLI equivalent:

backprop train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --data reasoning_traces.jsonl \
  --steps 200 \
  --reasoning-trace \
  --output ./output/qwen-reasoner

What --reasoning-trace does:

Keeps <think> in the training target. The chat-template converters already preserve <think> blocks verbatim — nothing is stripped. Crucially, <think> stays plain text: Backpropagate does not add special tokens or resize the embedding matrix for it. That keeps the merge → GGUF → Ollama export path intact — a reasoning fine-tune ships to ollama run exactly like any other (see Export to Ollama).
Trace-length filtering. Rows whose summed <think> token count falls outside [8, 8192] tokens are dropped — empty / degenerate traces and runaway ones both hurt distillation. Tune the band with BACKPROPAGATE_DATA__MIN_TRACE_TOKENS / BACKPROPAGATE_DATA__MAX_TRACE_TOKENS (the tokenizer’s own encode does the counting, so the cutoffs are exact for your model). Rows with no <think> span at all are dropped too.
Raises the default max_seq_length to 8192. Reasoning traces routinely exceed the shipped 2048-token window; the bump only fires when you left max_seq_length at the default. An explicit value — kwarg max_seq_length=... or BACKPROPAGATE_MODEL__MAX_SEQ_LENGTH — always wins.
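The trace-length filter can be pictured with the sketch below. It is an illustration under stated assumptions, not Backpropagate's code. count_tokens stands in for your tokenizer: the test uses a whitespace split, and with Hugging Face transformers you could pass lambda s: len(tokenizer.encode(s, add_special_tokens=False)). The band is treated as inclusive, matching the [8, 8192] notation above. The function names are hypothetical.

```python
import re
from typing import Callable, Optional

THINK_SPAN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def trace_token_count(text: str, count_tokens: Callable[[str], int]) -> Optional[int]:
    """Sum token counts over every <think>...</think> span; None when there is none."""
    spans = THINK_SPAN.findall(text)
    if not spans:
        return None
    return sum(count_tokens(span) for span in spans)


def keep_row(row: dict, count_tokens: Callable[[str], int],
             min_tokens: int = 8, max_tokens: int = 8192) -> bool:
    """Keep a messages-style row only if its summed trace length is inside the band."""
    assistant_text = "\n".join(
        m["content"] for m in row["messages"] if m["role"] == "assistant"
    )
    n = trace_token_count(assistant_text, count_tokens)
    return n is not None and min_tokens <= n <= max_tokens
```

Rows with no <think> span return None and are dropped. An empty <think></think> counts as 0 tokens and falls below the floor, so it is dropped as well.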

The recipe is SFT-only: it is ignored under any preference method (--method orpo / simpo / kto), and combining it with one of those logs a one-line advisory. If your model’s chat template injects its own empty <think> opener and your data also opens with <think>, you’ll get a one-line advisory warning about the doubled tag. Strip the leading <think> from your data, or use a template that doesn’t inject one.

Preference-tune on paired data with SimPO

New in v1.6. SimPO is the tightest-VRAM paired-preference method — reference-free, length-normalized reward, no second model. Your data is paired {prompt, chosen, rejected} (see Preference tuning → data shapes):

backprop train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --data prefs.jsonl \
  --method simpo \
  --steps 200 \
  --output ./output/qwen-simpo

Python:

from backpropagate import Trainer

trainer = Trainer("Qwen/Qwen2.5-7B-Instruct", method="simpo")
trainer.train("prefs.jsonl", steps=200)
trainer.save("./output/qwen-simpo")

You do not need to set a learning rate: SimPO auto-anchors to 1e-6. A high LR is SimPO’s documented repetitive-output failure mode, so any value ≥ 1e-5 is clamped with a warning. The defaults --simpo-beta 2.0 and --simpo-gamma 1.0 are the paper’s safe floor. Keep the gamma/beta ratio at or below 1.0; a higher ratio triggers a degeneration warning. Under the hood, SimPO runs TRL’s CPOTrainer with loss_type="simpo" and cpo_alpha=0 forced (pure SimPO, never CPO-SimPO).
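To make the beta and gamma knobs concrete, here is the SimPO objective from the paper written out for a single preference pair in plain Python. It is a sketch for intuition only; TRL computes the same quantity batched on tensors. The inputs are assumed to be per-token log-probabilities of the chosen and rejected completions under the policy being trained, and the function names are hypothetical.

```python
import math
from typing import Sequence


def simpo_reward(token_logprobs: Sequence[float], beta: float) -> float:
    """Length-normalized implicit reward: beta times the mean per-token log-prob."""
    return beta * sum(token_logprobs) / len(token_logprobs)


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))


def simpo_loss(chosen: Sequence[float], rejected: Sequence[float],
               beta: float = 2.0, gamma: float = 1.0) -> float:
    """-log sigmoid(r_chosen - r_rejected - gamma); no reference model involved."""
    if gamma / beta > 1.0:
        print(f"warning: gamma/beta = {gamma / beta:.2f} > 1.0 risks degeneration")
    margin = simpo_reward(chosen, beta) - simpo_reward(rejected, beta) - gamma
    return softplus(-margin)  # -log(sigmoid(m)) == log(1 + exp(-m))
```

Because the reward averages over tokens, a completion is not rewarded or penalized merely for being long. Gamma sets the margin the chosen reward must clear before the loss flattens out.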

Preference-tune on unpaired binary feedback with KTO

New in v1.6. KTO is the unpaired / binary-feedback method — use it when you have thumbs-up/thumbs-down telemetry rather than matched pairs. Each row is {prompt, completion, label} with a boolean label (no requirement that good and bad rows share a prompt):

{"prompt": "Write a commit message for a one-line typo fix.", "completion": "fix typo in README", "label": true}
{"prompt": "Write a commit message for a one-line typo fix.", "completion": "Various changes and improvements.", "label": false}
{"prompt": "Summarize the meeting in one sentence.", "completion": "We agreed to ship Friday; Jia owns rollback.", "label": true}

backprop train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --data feedback.jsonl \
  --method kto \
  --steps 200 \
  --output ./output/qwen-kto

KTO is LoRA-only in v1.6 (--mode full is rejected). It uses the frozen base weights underneath the LoRA adapter as its own reference, so no second model is loaded and the 16 GB envelope is preserved. The LR auto-anchors to 1e-6. Treat --kto-desirable-weight / --kto-undesirable-weight as a starting point: the trainer auto-rebalances the effective weights from your label counts toward the [1:1, 4:3] band (logged at preflight), so a class-imbalanced dataset still trains both polarities. See Preference tuning for the full method comparison.
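The KTO paper recommends keeping the weighted class ratio (desirable_weight × n_desirable) / (undesirable_weight × n_undesirable) between 1 and 4/3. The sketch below computes that ratio from a feedback file's labels. It also shows one simple way to pull the ratio into the band by rescaling the desirable weight. This rescaling rule is an assumption for illustration, not the rule Backpropagate's preflight actually applies, and the function names are hypothetical.

```python
import json
from typing import Iterable, Tuple


def count_labels(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (n_desirable, n_undesirable) from KTO JSONL lines."""
    n_true = n_false = 0
    for line in lines:
        if line.strip():
            if json.loads(line)["label"]:
                n_true += 1
            else:
                n_false += 1
    return n_true, n_false


def weighted_ratio(n_d: int, n_u: int, w_d: float = 1.0, w_u: float = 1.0) -> float:
    return (w_d * n_d) / (w_u * n_u)


def rebalance_desirable_weight(n_d: int, n_u: int, w_d: float = 1.0, w_u: float = 1.0,
                               low: float = 1.0, high: float = 4 / 3) -> float:
    """Rescale w_d so the weighted ratio lands inside [low, high]; unchanged if already inside."""
    ratio = weighted_ratio(n_d, n_u, w_d, w_u)
    if ratio < low:
        return w_d * low / ratio
    if ratio > high:
        return w_d * high / ratio
    return w_d
```

With the three example rows above (two true, one false), the unweighted ratio is 2.0. The sketch would therefore scale the desirable weight down to 2/3.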

Score a run against a held-out set with task metrics (and gate on it)

New in v1.6. Evaluate a recorded run with deterministic, judge-free task metrics — no LLM judge. First carve a held-out reference set out of your data, then score the run against it:

# 1. Split off a reproducible 10% held-out reference set
backprop data split my_data.jsonl --heldout-ratio 0.1 --seed 0
#    -> writes my_data.train.jsonl + my_data.heldout.jsonl next to the input

# 2. Score the run on exact-match + token-F1 against the held-out references
backprop eval <run_id> \
  --references my_data.heldout.jsonl \
  --metric normalized_exact_match \
  --metric token_f1
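A reproducible split is essentially a seeded shuffle: the same seed always holds out the same rows. A minimal Python sketch of that idea follows. It uses Python's random module, so it will not pick the same rows as backprop data split, and split_heldout is a hypothetical name.

```python
import random
from typing import List, Sequence, Tuple


def split_heldout(rows: Sequence, heldout_ratio: float = 0.1,
                  seed: int = 0) -> Tuple[List, List]:
    """Seeded shuffle of row indices; the first ratio*N indices form the held-out set."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    held = set(order[: round(len(rows) * heldout_ratio)])
    # Both halves keep the original row order; only membership is randomized.
    train = [row for i, row in enumerate(rows) if i not in held]
    heldout = [row for i, row in enumerate(rows) if i in held]
    return train, heldout
```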

Each held-out reference line is {"prompt": "...", "reference": "..."} (or "references": ["...", "..."] for multiple acceptable answers). Available metrics: normalized_exact_match, token_f1, contains, regex, pass_rate. --metric is repeatable; when you pass --references with no --metric, it defaults to normalized_exact_match + token_f1. (ROUGE-L / BLEU are intentionally not gateable metrics — they reward surface n-gram overlap and are easily gamed.)
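For intuition about what the two default metrics reward, here is a sketch of SQuAD-style normalized exact match and token F1. The text is lowercased, punctuation and English articles are stripped, whitespace is collapsed, and the prediction is scored against the best-matching reference. Backpropagate's exact normalization rules may differ, so treat this as an approximation rather than its implementation.

```python
import re
import string
from collections import Counter
from typing import Sequence

_PUNCTUATION = set(string.punctuation)
_ARTICLES = re.compile(r"\b(a|an|the)\b")


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = "".join(ch for ch in text.lower() if ch not in _PUNCTUATION)
    text = _ARTICLES.sub(" ", text)
    return " ".join(text.split())


def normalized_exact_match(prediction: str, references: Sequence[str]) -> float:
    return float(any(normalize(prediction) == normalize(ref) for ref in references))


def _f1(prediction: str, reference: str) -> float:
    pred_tokens, ref_tokens = normalize(prediction).split(), normalize(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(pred_tokens), overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def token_f1(prediction: str, references: Sequence[str]) -> float:
    """Best token-level F1 over all acceptable references."""
    return max(_f1(prediction, ref) for ref in references)
```

Token F1 gives partial credit: "fix typo in README" against the reference "fix the typo" shares two of four predicted tokens, which scores 2/3.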

To gate a continual-merge / SLAO campaign on non-regression, add --gate-against and name the metrics that must not regress with --gate-metric:

backprop eval <candidate_run_id> \
  --gate-against <baseline_run_id> \
  --references my_data.heldout.jsonl \
  --metric normalized_exact_match --metric token_f1 \
  --gate-metric normalized_exact_match \
  --max-regression 0.0

The gate is a conjunction: it accepts only if held-out loss did not regress beyond --max-regression (a non-regression floor) and every --gate-metric did not drop beyond its noise band (the metric’s bootstrap CI half-width, or a default 5-point band). A real metric regression rejects even when loss improved; a metric drop smaller than the band is treated as sampling noise. A tripped gate exits 65 (EX_DATAERR) and stamps RUNTIME_EVAL_GATE_REGRESSED in the structured log. If fewer than ~100 reference items are scored, the gate logs a loud underpowered warning (it still returns a verdict — it does not block on statistical power alone).
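The gate's decision rule can be sketched as below. This is an illustration, not Backpropagate's code, and the function names are hypothetical. It assumes metric scores lie in [0, 1], so the default 5-point band becomes 0.05, and that lower loss is better. The noise band uses a percentile bootstrap over per-item scores, which is one common choice.

```python
import random
import statistics
from typing import Dict, Iterable, Optional, Sequence


def bootstrap_half_width(scores: Sequence[float], n_resamples: int = 1000,
                         confidence: float = 0.95, seed: int = 0) -> float:
    """Half-width of a percentile-bootstrap CI for the mean per-item score."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(scores, k=len(scores))) for _ in range(n_resamples)
    )
    lower = means[int((1 - confidence) / 2 * n_resamples)]
    upper = means[int((1 + confidence) / 2 * n_resamples) - 1]
    return (upper - lower) / 2


def gate_accepts(baseline_loss: float, candidate_loss: float,
                 baseline_metrics: Dict[str, float], candidate_metrics: Dict[str, float],
                 gate_metrics: Iterable[str], max_regression: float = 0.0,
                 bands: Optional[Dict[str, float]] = None, default_band: float = 0.05) -> bool:
    """Accept only if loss did not regress AND no gate metric dropped beyond its band."""
    if candidate_loss - baseline_loss > max_regression:
        return False  # held-out loss got worse by more than the allowed floor
    bands = bands or {}
    for name in gate_metrics:
        drop = baseline_metrics[name] - candidate_metrics[name]
        if drop > bands.get(name, default_band):
            return False  # a real metric regression rejects even if loss improved
    return True
```

A CLI wrapper could then exit with os.EX_DATAERR (65 on Unix) when gate_accepts returns False, mirroring the exit code described above.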

Quick “did my finetune work?” generation

New in v1.6. backprop generate runs ad-hoc inference against an adapter directory on disk (not a recorded run_id) — the fastest sanity check after a run, with no run history or held-out set required:

# Base model inferred from the adapter's adapter_config.json
backprop generate ./output "Explain LoRA in one sentence."

# Explicit base + 3 samples at a higher temperature
backprop generate ./output "Write a haiku about GPUs." \
  --base Qwen/Qwen2.5-7B-Instruct -n 3 --temperature 0.9

The base model is read from the adapter’s adapter_config.json (base_model_name_or_path) when present; pass --base <model> if it cannot be inferred. --temperature 0 (or any value ≤ 0) gives greedy / deterministic decoding; --max-new-tokens caps generation length (default 128); --seed fixes sampling. It reuses the eval harness’s model loader + generator, so the load/decode path matches backprop eval exactly.
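The base-model lookup amounts to reading one key out of the adapter's PEFT config. Below is a minimal sketch with a hypothetical function name. It assumes an explicit --base takes precedence, whereas the text above only describes --base as the fallback.

```python
import json
from pathlib import Path
from typing import Optional


def infer_base_model(adapter_dir: str, explicit_base: Optional[str] = None) -> str:
    """Prefer an explicit --base; otherwise read base_model_name_or_path from the adapter."""
    if explicit_base:
        return explicit_base
    config_path = Path(adapter_dir) / "adapter_config.json"
    if config_path.is_file():
        config = json.loads(config_path.read_text(encoding="utf-8"))
        base = config.get("base_model_name_or_path")
        if base:
            return base
    raise ValueError(f"cannot infer the base model for {adapter_dir}; pass --base <model>")
```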

Export a trained adapter to Ollama (one command)

The fastest path from a trained LoRA to ollama run:

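from backpropagate import Trainer, register_with_ollama

trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")  # or reuse an already-trained Trainer
trainer.train("my_data.jsonl", steps=200)
result = trainer.export("gguf", quantization="q4_k_m")  # merge LoRA + convert to GGUF
register_with_ollama(result.path, "my-finetuned-model")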

Then:

ollama run my-finetuned-model

CLI equivalent as a single command:

backprop export ./output/lora --format gguf --quantization q4_k_m \
  --ollama --ollama-name my-finetuned-model

This merges the LoRA back into the base model, converts to GGUF at the chosen quantization, writes an Ollama Modelfile next to the .gguf, and registers the model with the local Ollama daemon. If Ollama is not running you’ll see DEP_OLLAMA_REGISTRATION_FAILED — start it with ollama serve and retry (see troubleshooting → Ollama not running).
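If automatic registration keeps failing, you can register an exported .gguf by hand with Ollama's own CLI (ollama create <name> -f <Modelfile>). The sketch below writes the smallest possible Modelfile, a single FROM line, and runs that command. The helper names are hypothetical. The Modelfile Backpropagate generates may carry extra lines (for example a chat TEMPLATE) that this bare version omits. Note that register_by_hand overwrites any Modelfile already sitting next to the .gguf.

```python
import subprocess
from pathlib import Path


def modelfile_text(gguf_path: str) -> str:
    """Smallest useful Modelfile: point FROM at the absolute .gguf path."""
    return f"FROM {Path(gguf_path).resolve()}\n"


def ollama_create_command(name: str, modelfile: Path) -> list:
    return ["ollama", "create", name, "-f", str(modelfile)]


def register_by_hand(name: str, gguf_path: str) -> None:
    """Write a Modelfile next to the .gguf and register it (needs `ollama serve` running)."""
    modelfile = Path(gguf_path).with_name("Modelfile")
    modelfile.write_text(modelfile_text(gguf_path), encoding="utf-8")
    subprocess.run(ollama_create_command(name, modelfile), check=True)
```

Call it as register_by_hand("my-finetuned-model", "<path to your .gguf>"), then ollama run my-finetuned-model as usual.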

Resume an interrupted multi-run

Multi-run training writes per-run state to output_dir/run_history.json. If your training crashed at run 3 of 5, resume from where it stopped:

from backpropagate import MultiRunTrainer

trainer = MultiRunTrainer("Qwen/Qwen2.5-7B-Instruct")
trainer.resume(run_id="<run_id_from_the_crashed_log_line>")

CLI equivalent:

backprop resume --run-id <run_id>

The resume path restores optimizer state, scheduler state, and step counter from the most recent checkpoint inside the run’s output_dir/checkpoint-<N>/. v1.2.0 fixed the single-run resume path that silently restarted from step 0 in v1.1.x (BACKEND-F-017 — see migrations).
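Picking "the most recent checkpoint" means sorting checkpoint-<N> directories by the integer N, not by name: sorted lexically, checkpoint-200 comes after checkpoint-1000. Below is a small sketch of that lookup with a hypothetical function name. If you already depend on transformers, its transformers.trainer_utils.get_last_checkpoint helper does the same job.

```python
import re
from pathlib import Path
from typing import Optional, Union

CHECKPOINT_DIR = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: Union[str, Path]) -> Optional[Path]:
    """Return the checkpoint-<N> subdirectory with the largest N, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    steps = {}
    for child in root.iterdir():
        match = CHECKPOINT_DIR.match(child.name)
        if match and child.is_dir():
            steps[int(match.group(1))] = child  # numeric step, not string order
    return steps[max(steps)] if steps else None
```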

Strict-miss contract (v1.3): if resume_from=<run_id> refers to a run that no longer exists on disk (history record deleted, checkpoint directory wiped), the trainer raises INPUT_RESUME_NOT_FOUND rather than silently falling back to a fresh start. To resume by run_id you need the on-disk state to still be there; if you want a fresh start, omit resume_from or pass resume_from=None.

Diff two runs with different learning rates

If you ran the same training with two different --lr values and want to compare:

backprop list-runs
backprop show-run <run_id_a>
backprop show-run <run_id_b>

For programmatic consumption (v1.2.0+):

backprop runs --limit 10        # JSON enumerator
backprop runs --status completed --limit 5

A dedicated backprop diff-runs <run_id_a> <run_id_b> subcommand that prints a side-by-side comparison of hyperparameters + final-loss + loss-curves is on the v1.3 roadmap (FRONTEND/BACKEND Wave 6b). Until it ships, the JSON output of backprop runs plus jq or a short Python script will get you the comparison you need:

backprop runs --limit 50 \
  | jq '.runs | map(select(.run_id == "<run_id_a>" or .run_id == "<run_id_b>"))'
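The same comparison can be done as a short Python script. It shells out to backprop runs, indexes the JSON by run_id, and keeps only the fields whose values differ. Beyond the runs list and the run_id key used in the jq filter above, the field layout of each run record is an assumption. Nested values are compared as whole objects, and the function names are hypothetical.

```python
import json
import subprocess


def load_runs(limit: int = 50) -> list:
    out = subprocess.run(
        ["backprop", "runs", "--limit", str(limit)],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)["runs"]


def diff_runs(runs: list, run_id_a: str, run_id_b: str) -> dict:
    """Map each differing field name to its (run_a, run_b) pair of values."""
    by_id = {run["run_id"]: run for run in runs}
    a, b = by_id[run_id_a], by_id[run_id_b]
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if key != "run_id" and a.get(key) != b.get(key)
    }
```

Typical use: for key, (a, b) in diff_runs(load_runs(), "<run_id_a>", "<run_id_b>").items(): print(key, a, b).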

Add a custom callback for logging

TrainingCallback exposes five hooks: on_step, on_epoch, on_save, on_complete, on_error. v1.2.0 fixed the bug that left on_step / on_epoch / on_save as silent no-ops in v1.1.x — if you wrote a callback against v1.1.x and never saw the hooks fire, expect to see them now (see migrations).

from backpropagate import Trainer, TrainingCallback

def log_step(step: int, loss: float) -> None:
    if step % 10 == 0:
        print(f"step={step:5d} loss={loss:.4f}")

callback = TrainingCallback(
    on_step=log_step,
    on_complete=lambda run: print(f"done — final loss {run.final_loss:.4f}, run_id={run.run_id}"),
    on_error=lambda err: print(f"failed: {err}"),
)

trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
trainer.train("my_data.jsonl", steps=100, callback=callback)

Each hook is isolated — an exception in your on_step does not kill the training loop. The exception is caught, logged with the run_id, and training continues.
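To keep a loss curve on disk rather than on stdout, you can pass as on_step any callable with the same (step, loss) signature as log_step above. Below is a sketch with a hypothetical class name that appends every Nth step to a CSV file.

```python
import csv
from pathlib import Path
from typing import Union


class CsvStepLogger:
    """Callable usable as TrainingCallback(on_step=...): appends (step, loss) to a CSV."""

    def __init__(self, path: Union[str, Path], every: int = 10) -> None:
        self.path = Path(path)
        self.every = every

    def __call__(self, step: int, loss: float) -> None:
        if step % self.every != 0:
            return
        new_file = not self.path.exists()
        with self.path.open("a", newline="", encoding="utf-8") as fp:
            writer = csv.writer(fp)
            if new_file:
                writer.writerow(["step", "loss"])  # header only on first write
            writer.writerow([step, f"{loss:.6f}"])
```

Usage: callback = TrainingCallback(on_step=CsvStepLogger("./output/loss.csv")).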

Push to a private Hugging Face Hub repo

backprop push ./output/lora --repo your-org/qwen-finetune --private

The --private flag makes the repo private at creation time. The token resolution order is --token flag → HF_TOKEN env var → HUGGING_FACE_HUB_TOKEN env var → ~/.cache/huggingface/token (from huggingface-cli login). Use huggingface-cli login to cache a token from https://huggingface.co/settings/tokens — make sure the token has write scope.
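Below is the documented lookup order written out as a sketch with a hypothetical function name. It only mirrors the order listed above. The real resolver may also honour settings such as a custom HF_HOME, which this version ignores.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_hf_token(cli_token: Optional[str] = None) -> Optional[str]:
    """--token flag, then HF_TOKEN, then HUGGING_FACE_HUB_TOKEN, then the cached token file."""
    if cli_token:
        return cli_token
    for var in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = os.environ.get(var)
        if value:
            return value
    token_file = Path.home() / ".cache" / "huggingface" / "token"
    if token_file.is_file():
        return token_file.read_text(encoding="utf-8").strip() or None
    return None
```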

One-shot export + push:

backprop export ./output/lora --format lora --push-to-hub your-org/qwen-finetune

The model_card.md written next to the local export is mirrored as README.md inside the upload, so the HF UI renders it as the repo’s model card. See export → Hub push for the full Hub-push surface.

Run the Reflex UI with `--share` over a real cloudflared tunnel

Prerequisite: install cloudflared (Cloudflare’s tunnel client) — see https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/. The v1.3 --share implementation shells out to cloudflared tunnel --url http://127.0.0.1:<port> and parses the announced https://*.trycloudflare.com URL out of the daemon’s stderr. No account, no zone, no DNS setup required — cloudflared provisions an ephemeral quick-tunnel that lives for the duration of the backprop ui process.
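Extracting the tunnel URL comes down to scanning cloudflared's stderr for the first *.trycloudflare.com address. Below is a sketch of that scan. The regex, the function name, and the sample log lines in the test are illustrative assumptions. They are not the exact format cloudflared prints or the parser Backpropagate ships.

```python
import re
from typing import Iterable, Optional

TRYCLOUDFLARE_URL = re.compile(r"https://[a-z0-9-]+\.trycloudflare\.com")


def find_tunnel_url(stderr_lines: Iterable[str]) -> Optional[str]:
    """Return the first quick-tunnel URL announced in cloudflared's stderr, if any."""
    for line in stderr_lines:
        match = TRYCLOUDFLARE_URL.search(line)
        if match:
            return match.group(0)
    return None
```

If you feed it a live subprocess.Popen(..., stderr=subprocess.PIPE, text=True).stderr, keep draining that pipe after the URL is found. Otherwise the buffer can fill and stall cloudflared.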

backprop ui --share --auth alice:super-secret-password

You’ll see the announced URL in the startup banner (and the same URL is added to the Host / Origin allowlist). The v1.2.0 FastAPI middleware enforces HTTP Basic auth on every request and the /_event WebSocket upgrade, so anyone who hits the URL is challenged for the credentials.
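For reference, here is what an HTTP Basic check amounts to (RFC 7617). Decode the base64 user:pass that follows the "Basic " scheme, then compare both halves in constant time. This is a sketch with a hypothetical function name, not Backpropagate's middleware.

```python
import base64
import binascii
import hmac
from typing import Optional


def basic_auth_ok(authorization: Optional[str], user: str, password: str) -> bool:
    """Validate an Authorization header value against the expected credential."""
    if not authorization:
        return False
    scheme, _, encoded = authorization.partition(" ")
    if scheme.lower() != "basic":
        return False
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return False
    got_user, sep, got_password = decoded.partition(":")  # password may contain ':'
    if not sep:
        return False
    # compare_digest avoids leaking how many leading characters matched via timing
    user_ok = hmac.compare_digest(got_user.encode(), user.encode())
    password_ok = hmac.compare_digest(got_password.encode(), password.encode())
    return user_ok and password_ok
```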

Required: --share without --auth exits 1 with [RUNTIME_UI_AUTH_NOT_ENFORCED] (closes the v1.1.x foot-gun published as GHSA-f65r-h4g3-3h9h). For the full contract see security → four-layer defense in depth.

If you don’t want a public URL: SSH port-forwarding stays the lower-friction option for “I just want to reach my remote training box from my laptop” — see security → SSH port-forwarding recipe.

Fine-tune on multi-GPU

Multi-GPU training is not officially supported in v1.3. The library targets the single-GPU operator, with a 16 GB VRAM workstation as the canonical target. If you want to try it anyway, the recommended setup is Hugging Face’s accelerate library:

pip install accelerate
accelerate config       # answer the prompts; pick "multi-GPU"
accelerate launch -m backpropagate.cli train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --data my_data.jsonl \
  --steps 200

accelerate launch wraps the training entry point with multi-process initialisation (NCCL, distributed sampler, gradient sync). The Unsloth backend may not work cleanly under accelerate; start with --no-unsloth if you hit Unsloth import errors. Expect rough edges: the GPU monitoring in gpu_safety.py is per-process and may report misleading temperatures across multiple GPUs. Multi-GPU NCCL failures emit RUNTIME_* errors and are not in the v1.3 retryable-error matrix.

For deeper multi-GPU support (FSDP, DeepSpeed), consider running training under transformers.Trainer directly and reusing only backpropagate.export for the GGUF + Ollama step.

Custom dataset format / data collator

If your dataset doesn’t fit ShareGPT / Alpaca / OpenAI-chat / ChatML / raw-text, the cleanest path is to pre-process to one of those formats with a 10-line script. Example: convert a CSV of (prompt, completion) pairs to OpenAI-chat JSONL:

import csv, json

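# newline="" is what the csv module expects; explicit UTF-8 avoids platform-default encodings
with open("pairs.csv", newline="", encoding="utf-8") as fp_in, open("converted.jsonl", "w", encoding="utf-8") as fp_out: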
    for row in csv.DictReader(fp_in):
        record = {"messages": [
            {"role": "user", "content": row["prompt"]},
            {"role": "assistant", "content": row["completion"]},
        ]}
        fp_out.write(json.dumps(record) + "\n")

Then point Backpropagate at converted.jsonl — it auto-detects the OpenAI-chat shape.

For a truly custom collator (e.g. structured multi-turn with extra fields), load the dataset yourself as a Hugging Face Dataset and pass it directly:

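from datasets import load_dataset
from backpropagate import Trainer

def my_custom_preprocessing(batch: dict) -> dict:
    # Hypothetical example: with batched=True, `batch` maps column names to lists.
    # Builds a "messages" column (OpenAI-chat shape) from assumed question/answer columns.
    batch["messages"] = [
        [{"role": "user", "content": q}, {"role": "assistant", "content": a}]
        for q, a in zip(batch["question"], batch["answer"])
    ]
    return batch

ds = load_dataset("json", data_files="my_weird_format.jsonl", split="train")
ds = ds.map(my_custom_preprocessing, batched=True)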

trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
trainer.train(ds, steps=100)

The auto-format detector is skipped when you pass a pre-built Dataset; you’re responsible for shaping it into the columns Backpropagate’s collator expects (the default expects either a text column with full ChatML or a messages column in OpenAI-chat shape). See training → dataset formats.

Use `--auth-file` for shell-history-safe auth

Passing --auth user:pass on the command line works, but the credential lands in your shell history file and is visible in ps aux. The v1.3 --auth-file <path> flag reads the credential from a file instead: the same user:pass shape on a single line, with no trailing newline. The echo below is itself a shell command, so if history matters, create the file with an editor instead:

echo -n "alice:super-secret-password" > ~/.config/backpropagate/auth
chmod 600 ~/.config/backpropagate/auth
backprop ui --share --auth-file ~/.config/backpropagate/auth

The CLI reads the file, validates the shape with the same validate_auth_shape used for --auth, and threads the credential into the Reflex subprocess via BACKPROPAGATE_UI_AUTH. The file is never logged; the credential is redacted from any error output. --auth and --auth-file are mutually exclusive — passing both exits 1 with INPUT_AUTH_INVALID_SHAPE.

--auth-file satisfies the same --share / --host <non-loopback> requirement that --auth does — passing it means the four-layer defense is satisfied. See security → auth middleware for the full mode matrix.
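Reading such a file safely takes only a few lines. The sketch below is illustrative, and its function name is hypothetical. It tolerates a trailing newline and splits on the first colon, so passwords may contain ':'. As an extra check that the CLI is not documented to perform, it also refuses a file that group or others can read on POSIX systems.

```python
import os
import stat
from pathlib import Path
from typing import Tuple, Union


def read_auth_file(path: Union[str, Path]) -> Tuple[str, str]:
    """Return (user, password) from a one-line user:pass file."""
    p = Path(path).expanduser()
    if os.name == "posix" and stat.S_IMODE(p.stat().st_mode) & 0o077:
        raise PermissionError(f"{p} is readable by group/others; run chmod 600")
    credential = p.read_text(encoding="utf-8").rstrip("\r\n")
    user, sep, password = credential.partition(":")
    if not sep or not user or not password or "\n" in credential:
        raise ValueError("auth file must hold exactly one line of the form user:pass")
    return user, password
```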
