# Beginner's Guide
This guide walks you through your first fine-tuning run with Backpropagate, from zero to a working Ollama model. No prior experience with LoRA, GGUF, or training pipelines is required.
## 1. What is fine-tuning?

Large language models (LLMs) like Qwen and Llama are trained on broad internet text. Fine-tuning teaches an existing model new behavior using your own data — customer support logs, code examples, domain-specific Q&A, or any conversational dataset. Instead of retraining billions of parameters from scratch, Backpropagate uses LoRA (Low-Rank Adaptation) to train a small set of adapter weights that modify the base model's behavior. This is fast, uses far less GPU memory, and produces results you can export and run locally with Ollama.
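To see why LoRA is so much cheaper than full fine-tuning, compare the trainable parameter counts for a single weight matrix. LoRA replaces the full update with a low-rank product, `W' = W + B @ A`. The sketch below uses an illustrative layer shape and rank — not Backpropagate's actual defaults:

```python
# Rough parameter-count comparison for one d x k weight matrix:
# full fine-tuning vs. a rank-r LoRA adapter (W' = W + B @ A).
# The shape and rank here are illustrative, not Backpropagate defaults.

def full_params(d: int, k: int) -> int:
    """Trainable parameters when updating the full d x k matrix."""
    return d * k

def lora_params(d: int, k: int, r: int) -> int:
    """Trainable parameters for B (d x r) plus A (r x k)."""
    return d * r + r * k

d, k, r = 4096, 4096, 16  # a typical attention projection, LoRA rank 16
print(full_params(d, k))      # 16777216 weights to update
print(lora_params(d, k, r))   # 131072 weights -- under 1% of the full matrix
```

With these numbers the adapter trains roughly 0.8% as many weights as the full matrix, which is where the speed and VRAM savings come from.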
## 2. Prerequisites

Before you start, make sure you have:
- Python 3.10 or newer — check with `python --version`
- A CUDA GPU with 8GB+ VRAM — NVIDIA RTX 3060 or better. Check with `nvidia-smi`
- PyTorch 2.0+ with CUDA support — install from pytorch.org
- Ollama (optional) — for running your exported model locally. Install from ollama.com
If you are on Windows, Backpropagate handles the common PyTorch/CUDA pitfalls automatically (multiprocessing crashes, xformers incompatibilities, dataloader issues).
## 3. Installation

Install Backpropagate with the recommended extras:

```shell
pip install backpropagate[standard]
```

This gives you the core library plus Unsloth (2x faster training) and the Gradio web UI. If you only want the Python API with no extras:

```shell
pip install backpropagate
```

Verify the install:

```shell
backprop info
```

This prints your Python version, GPU details, VRAM, and which optional features are available.
## 4. Prepare your dataset

Backpropagate accepts JSONL files with conversation data. The simplest format is OpenAI-style messages:

```json
{"messages": [{"role": "user", "content": "What is LoRA?"}, {"role": "assistant", "content": "LoRA stands for Low-Rank Adaptation..."}]}
{"messages": [{"role": "user", "content": "How do I export to GGUF?"}, {"role": "assistant", "content": "Use trainer.export('gguf')..."}]}
```

Save this as `my_data.jsonl`. Each line is one conversation. Aim for at least 100 examples for a meaningful fine-tune, though 500+ is better.
Backpropagate also auto-detects ShareGPT, Alpaca, and ChatML formats, so use whatever you have.
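Backpropagate detects those formats for you, but if you are assembling a dataset by hand, a few lines of stdlib Python can reshape records into the messages layout. This sketch assumes the conventional Alpaca field names (`instruction`, `input`, `output`):

```python
import json

def alpaca_to_messages(record: dict) -> dict:
    """Convert one Alpaca-style record (instruction/input/output)
    into the OpenAI-style messages format shown above."""
    prompt = record["instruction"]
    if record.get("input"):
        prompt += "\n\n" + record["input"]
    return {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": record["output"]},
        ]
    }

alpaca = [
    {"instruction": "What is LoRA?", "input": "",
     "output": "LoRA stands for Low-Rank Adaptation..."},
]

# One JSON object per line -- the JSONL layout my_data.jsonl uses.
with open("my_data.jsonl", "w", encoding="utf-8") as f:
    for record in alpaca:
        f.write(json.dumps(alpaca_to_messages(record)) + "\n")
```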
## 5. Train your first model

Three lines of Python:

```python
from backpropagate import Trainer

trainer = Trainer("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
trainer.train("my_data.jsonl", steps=100)
trainer.save("./my-model")
```

What happens behind the scenes:
- The model downloads from HuggingFace (first run only, cached afterward)
- Backpropagate detects your GPU VRAM and picks a safe batch size
- LoRA adapters are applied to the model’s attention layers
- Training runs for 100 steps with cosine learning rate scheduling
- The trained adapter is saved to `./my-model`
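The cosine schedule mentioned above decays the learning rate smoothly from its peak at the start of training to near zero at the end, which tends to stabilize the final steps. A minimal sketch of the formula (the peak rate and step count are illustrative, not Backpropagate's defaults):

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float,
              min_lr: float = 0.0) -> float:
    """Cosine-decayed learning rate: peak_lr at step 0, min_lr at the end."""
    progress = step / total_steps
    return min_lr + 0.5 * (peak_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Over a 100-step run the rate falls from the peak toward zero.
print(cosine_lr(0, 100, 2e-4))    # peak at the start
print(cosine_lr(50, 100, 2e-4))   # half the peak at the midpoint
print(cosine_lr(100, 100, 2e-4))  # fully decayed at the end
```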
You can also train from the command line:
```shell
backprop train --data my_data.jsonl --steps 100
```

Or use the web UI:

```shell
backprop ui
```

## 6. Export and run with Ollama
Once training is done, export to GGUF and register with Ollama:

```python
# Export to GGUF (quantized for fast local inference)
result = trainer.export("gguf", quantization="q4_k_m")

# Register with Ollama
from backpropagate import register_with_ollama
register_with_ollama(result.path, "my-finetuned-model")
```

Now run it:

```shell
ollama run my-finetuned-model
```

The q4_k_m quantization gives a good balance between file size and quality. For higher quality at a larger file size, use q8_0. For the smallest file, use q2_k.
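To get a feel for the trade-off, you can estimate the GGUF file size from the parameter count and the effective bits per weight. The bits-per-weight figures below are rough approximations for these quantization schemes, not exact values, and real files carry some extra overhead:

```python
# Rough GGUF size estimate: parameters x effective bits per weight.
# Bits-per-weight values are approximate, not exact on-disk figures.
BITS_PER_WEIGHT = {
    "q2_k": 2.6,    # smallest file, most quality loss
    "q4_k_m": 4.8,  # the recommended balance
    "q8_0": 8.5,    # near-lossless, largest file
}

def estimated_size_gb(params_billions: float, quant: str) -> float:
    """Approximate model file size in GB for a given quantization."""
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1e9

for quant in ("q2_k", "q4_k_m", "q8_0"):
    print(f"7B model at {quant}: ~{estimated_size_gb(7, quant):.1f} GB")
```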
CLI equivalent for export:
```shell
backprop export ./my-model --format gguf --quantization q4_k_m --ollama --ollama-name my-finetuned-model
```

## 7. Next steps
Once you have a working fine-tune, here are ways to improve:
- More data — Fine-tuning quality scales with dataset size and diversity. 1,000+ high-quality examples produce noticeably better results than 100.
- Multi-run SLAO training — Prevents catastrophic forgetting during longer training by merging LoRA adapters between runs. Use `trainer.multi_run()` instead of `trainer.train()` for extended fine-tuning.
- Training presets — Use `get_preset("balanced")` or `get_preset("quality")` from `backpropagate.config` for research-backed hyperparameter combinations.
- Dataset quality tools — The `backpropagate.datasets` module offers deduplication, perplexity filtering, and curriculum learning to improve your training data before you train.
- GPU monitoring — For long training runs, `GPUMonitor` watches temperature and VRAM, pausing training before your hardware hits dangerous levels.
- Experiment tracking — Install the `[monitoring]` extra to log training runs to Weights & Biases.
For detailed coverage of each topic, see the Training, Export, and Reference pages.