
Beginner's Guide

This guide walks you through your first fine-tuning run with Backpropagate, from zero to a working Ollama model. No prior experience with LoRA, GGUF, or training pipelines is required.

Large language models (LLMs) like Qwen and Llama are trained on broad internet text. Fine-tuning teaches an existing model new behavior using your own data: customer support logs, code examples, domain-specific Q&A, or any conversational dataset. Instead of retraining billions of parameters from scratch, Backpropagate uses LoRA (Low-Rank Adaptation) to train a small set of adapter weights that modify the base model’s behavior. Compared with full retraining, this is faster, uses far less GPU memory, and produces results you can export and run locally with Ollama.
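
To make "a small set of adapter weights" concrete, here is a rough sketch of the arithmetic behind LoRA. For one frozen weight matrix of shape d_out × d_in, LoRA trains two small matrices of rank r, which is r × (d_in + d_out) parameters instead of d_in × d_out. The 4096 × 4096 layer size is an illustrative assumption, not a figure taken from Backpropagate or from any particular model.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Parameters in one dense weight matrix (frozen during LoRA training)."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for that matrix.

    LoRA learns A (rank x d_in) and B (d_out x rank); the update is B @ A.
    """
    return rank * (d_in + d_out)


# Illustrative layer size (an assumption, not read from a real model).
d_in = d_out = 4096

for rank in (16, 256):
    full = full_params(d_in, d_out)
    lora = lora_params(d_in, d_out, rank)
    print(f"rank={rank}: {lora:,} trainable vs {full:,} frozen ({lora / full:.2%})")
```

Even at a high rank the adapter stays a fraction of the size of the matrix it modifies, which is why the saved output is an adapter file rather than a full copy of the model.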

Before you start, make sure you have:

  • Python 3.10 or newer — check with python --version
  • A CUDA GPU with 8GB+ VRAM — NVIDIA RTX 3060 or better. Check with nvidia-smi
  • PyTorch 2.0+ with CUDA support — install from pytorch.org
  • Ollama (optional) — for running your exported model locally. Install from ollama.com

If you are on Windows, Backpropagate handles the common PyTorch/CUDA pitfalls automatically (multiprocessing crashes, xformers incompatibilities, dataloader issues).
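
If you would rather check the prerequisites from a script, the following is a minimal sketch using only the standard library. It confirms the Python version and looks for the nvidia-smi and ollama executables on PATH. It does not measure VRAM or verify that PyTorch was built with CUDA, so treat it as a first sanity check rather than a substitute for backprop info. The function names are illustrative and not part of Backpropagate.

```python
import shutil
import sys

MIN_PYTHON = (3, 10)  # minimum version listed in the prerequisites


def python_ok(version=sys.version_info) -> bool:
    """True if the (major, minor) version meets the minimum."""
    return tuple(version[:2]) >= MIN_PYTHON


def tool_on_path(name: str) -> bool:
    """True if an executable with this name can be found on PATH."""
    return shutil.which(name) is not None


def report() -> dict:
    """Collect the checks into a label -> passed mapping."""
    return {
        "Python 3.10 or newer": python_ok(),
        "nvidia-smi on PATH": tool_on_path("nvidia-smi"),
        "ollama on PATH (optional)": tool_on_path("ollama"),
    }


for label, passed in report().items():
    print(f"{'ok' if passed else 'missing':8} {label}")
```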

Install Backpropagate with the standard extras. pipx is the recommended install path: it puts Backpropagate in its own isolated environment with PATH integration, so you don’t have to manage a virtualenv:

Terminal window
pipx install "backpropagate[standard]"

This gives you the core library plus Unsloth (2× faster training) and the Reflex web interface. Other install paths:

Terminal window
uv tool install "backpropagate[standard]" # uv's equivalent, faster install
pip install "backpropagate[standard]" # if you already manage a venv

If you only want the Python API with no extras:

Terminal window
pipx install backpropagate

Verify the install:

Terminal window
backprop info

This prints your Python version, GPU details, VRAM, and which optional features are available.

Backpropagate accepts JSONL files with conversation data. The simplest format is OpenAI-style messages:

{"messages": [{"role": "user", "content": "What is LoRA?"}, {"role": "assistant", "content": "LoRA stands for Low-Rank Adaptation..."}]}
{"messages": [{"role": "user", "content": "How do I export to GGUF?"}, {"role": "assistant", "content": "Use trainer.export('gguf')..."}]}

Save this as my_data.jsonl. Each line is one conversation. Aim for at least 100 examples for a meaningful fine-tune, though 500+ is better.

Backpropagate also auto-detects ShareGPT, Alpaca, and ChatML formats, so use whatever you have. The repo ships an examples/quickstart.jsonl (5 ShareGPT examples) you can use to verify your install before bringing your own data.
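
Before training, it can help to check that every line of a messages-format file parses and has the expected shape. The sketch below is a stand-alone validator, not part of Backpropagate. The set of allowed roles is an assumption based on the common OpenAI-style convention, and the library's own loader may accept more or less than this.

```python
import json

# Assumed role names (OpenAI-style convention); adjust to match your data.
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_line(line: str) -> list[str]:
    """Return a list of problems for one JSONL line (empty list means OK)."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return ["line is not a JSON object"]
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["missing or empty 'messages' list"]

    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        role = msg.get("role")
        if not isinstance(role, str) or role not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {role!r}")
        content = msg.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {i} has empty or non-string content")
    return problems


def validate_file(path: str) -> dict[int, list[str]]:
    """Map 1-based line numbers to their problems, skipping blank lines."""
    bad = {}
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if line.strip():
                problems = validate_line(line)
                if problems:
                    bad[number] = problems
    return bad
```

Calling validate_file("my_data.jsonl") returns an empty dict when every line passes, which also makes it easy to count how many usable examples you have.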

Four lines of Python:

from backpropagate import Trainer
trainer = Trainer("Qwen/Qwen2.5-7B-Instruct")
trainer.train("my_data.jsonl", steps=100)
trainer.save("./my-model")

What happens behind the scenes:

  1. The model downloads from HuggingFace (first run only, cached afterward)
  2. Backpropagate detects your GPU VRAM and picks a safe batch size
  3. LoRA adapters are attached to the model’s linear layers (the v1.3 default is all-linear, not attention only)
  4. Training runs for 100 steps with cosine learning rate scheduling
  5. The trained adapter is saved to ./my-model
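
Step 4 mentions cosine learning rate scheduling. As a minimal sketch of what that means, the function below implements the standard cosine decay formula from a peak rate down to a floor. It assumes no warmup phase and a floor of zero, and the 0.0002 peak is borrowed from the sample log on this page; Backpropagate's actual scheduler may differ in these details.

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float, min_lr: float = 0.0) -> float:
    """Learning rate at `step` under plain cosine decay (no warmup assumed)."""
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(step / total_steps, 0.0), 1.0)
    # Cosine goes from 1 to -1 over training, so the factor goes from 1 to 0.
    factor = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * factor


# Print the schedule at a few points of a 100-step run.
for step in (0, 25, 50, 75, 100):
    print(f"step {step:3d}: lr = {cosine_lr(step, 100, 2e-4):.6f}")
```

The rate starts at the peak, falls slowly at first, drops fastest around the midpoint, and flattens out again as it approaches the floor.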

You can also train from the command line:

Terminal window
backprop train --data my_data.jsonl --steps 100

Or use the web UI:

Terminal window
backprop ui

If you plan to share the UI on a public URL (backprop ui --share), you also need --auth user:password — see the troubleshooting page for the reasoning. Local-only backprop ui (no --share) needs no auth.

A successful first run prints something like:

run_started run_id=8f3a2c1d-9e4b-4c5a-...
Trainer initialized: Qwen/Qwen2.5-7B-Instruct
LoRA: r=256, alpha=512
Batch: 2, LR: 0.0002
Degradation knobs: oom_recovery=True, unsloth_fallback=True
Training: [####################] 100% loss=0.42 steps=100
Saved to ./output/lora
run_ended run_id=8f3a2c1d-... duration_seconds=412.3

After the run, the directory you passed to trainer.save() (./my-model in the Python example above) contains:

my-model/
├── adapter_config.json <- adapter metadata
├── adapter_model.safetensors <- the trained LoRA weights
└── tokenizer.json <- copied from the base model

To confirm it worked, check two things. First, adapter_model.safetensors should be a few hundred MB to ~1.5 GB on a 7B base: the v1.3 default is rank 256 with all-linear targets, and passing --lora-preset=fast restores the v1.2.x rank-16 footprint of roughly 50–200 MB. Second, backprop info should show no errors. If the loss decreased over the run (logging lines appear every 10 steps), the model learned something.
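
The file check can also be scripted. The sketch below looks for the three files shown in the directory listing above and reports each size in MB. It is a stand-alone helper, not a Backpropagate function, and it deliberately reports sizes rather than enforcing thresholds, since the expected size depends on the rank and preset you trained with.

```python
from pathlib import Path
from typing import Optional

# File names taken from the directory listing on this page.
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.safetensors",
    "tokenizer.json",
)


def check_adapter_dir(path: str) -> dict[str, Optional[float]]:
    """Map each expected file to its size in MB, or None if it is missing."""
    root = Path(path)
    sizes = {}
    for name in EXPECTED_FILES:
        candidate = root / name
        sizes[name] = candidate.stat().st_size / 1e6 if candidate.is_file() else None
    return sizes


for name, size_mb in check_adapter_dir("./my-model").items():
    status = "missing" if size_mb is None else f"{size_mb:,.1f} MB"
    print(f"{name:28} {status}")
```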

If something went wrong, see the troubleshooting page — it’s keyed by what you actually saw in stderr.

Once training is done, export to GGUF and register with Ollama:

# Export to GGUF (quantized for fast local inference)
result = trainer.export("gguf", quantization="q4_k_m")
# Register with Ollama
from backpropagate import register_with_ollama
register_with_ollama(result.path, "my-finetuned-model")

Now run it:

Terminal window
ollama run my-finetuned-model
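
Once the model is registered, you can also query it from Python through Ollama's local HTTP API instead of the interactive prompt. This is a minimal sketch using only the standard library. It assumes Ollama is running on its default address (localhost:11434) and that the model name matches the one registered above. The helper names are illustrative and not part of Backpropagate.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


# Usage (requires a running Ollama server with the model registered):
# print(generate("my-finetuned-model", "What is LoRA?"))
```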

The q4_k_m quantization gives a good balance between file size and quality. For higher quality at larger file size, use q8_0. For the smallest file, use q2_k.

CLI equivalent for export:

Terminal window
backprop export ./my-model --format gguf --quantization q4_k_m --ollama --ollama-name my-finetuned-model

Once you have a working fine-tune, here are ways to improve:

  • More data — Fine-tuning quality scales with dataset size and diversity. 1,000+ high-quality examples produce noticeably better results than 100.
  • Multi-run SLAO training — Prevents catastrophic forgetting during longer training by merging LoRA adapters between runs. Use trainer.multi_run() instead of trainer.train() for extended fine-tuning.
  • Training presets — Use get_preset("balanced") or get_preset("quality") from backpropagate.config for research-backed hyperparameter combinations.
  • Dataset quality tools — The backpropagate.datasets module offers deduplication, perplexity filtering, and curriculum learning to improve your training data before training.
  • GPU monitoring — For long training runs, GPUMonitor watches temperature and VRAM, pausing training before your hardware hits dangerous levels.
  • Experiment tracking — Install the [monitoring] extra to log training runs to Weights & Biases.
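
To illustrate the kind of cleanup the dataset quality item refers to, here is a minimal sketch of exact-match deduplication for messages-format records. It is a stand-in written with the standard library, not the backpropagate.datasets API, whose functions and behavior are not shown on this page. Records count as duplicates when their roles and contents match after lowercasing and collapsing whitespace.

```python
import hashlib
import json


def conversation_key(record: dict) -> str:
    """Hash the normalized messages so trivially different copies collide."""
    normalized = [
        (msg.get("role", ""), " ".join(str(msg.get("content", "")).lower().split()))
        for msg in record.get("messages", [])
    ]
    return hashlib.sha256(json.dumps(normalized).encode("utf-8")).hexdigest()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first occurrence of each distinct conversation, in order."""
    seen = set()
    unique = []
    for record in records:
        key = conversation_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


records = [
    {"messages": [{"role": "user", "content": "What is LoRA?"}]},
    {"messages": [{"role": "user", "content": "what  is lora?"}]},  # same after normalizing
    {"messages": [{"role": "user", "content": "How do I export to GGUF?"}]},
]
print(f"{len(records)} records -> {len(deduplicate(records))} unique")
```

Exact matching only catches literal repeats; near-duplicate detection needs a similarity measure and is a separate technique.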

For detailed coverage of each topic, see the Training, Export, and Reference pages.