Python · PyPI

Fine-tune LLMs in 3 lines.

Headless LLM fine-tuning with smart defaults. Automatic hyperparameter tuning, VRAM-aware batch sizing, multi-run SLAO training to prevent catastrophic forgetting, and one-click GGUF export for Ollama. First-class Windows and CUDA support.


Quickstart

pip install backpropagate[standard]

# Train in 3 lines
from backpropagate import Trainer

trainer = Trainer("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
trainer.train("my_data.jsonl", steps=100)
trainer.export("gguf", quantization="q4_k_m")  # Ready for Ollama

Multi-run SLAO

from backpropagate.multi_run import MultiRunTrainer

runner = MultiRunTrainer(
    model="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    num_runs=5, steps_per_run=100,
    merge_mode="slao",
)
result = runner.run("my_data.jsonl")

Export to Ollama

from backpropagate.export import export_gguf, register_with_ollama

result = export_gguf(model, tokenizer, "./output", quantization="q4_k_m")
register_with_ollama(result.path, model_name="my-model")

Fine-tuning without the friction

Built for developers who want results, not configuration.

Smart defaults

Automatically configures learning rate, batch size, gradient accumulation, and LoRA rank based on your hardware and dataset size. No hyperparameter guesswork.
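To make the idea concrete, here is a minimal sketch of how hardware and dataset size could map to training settings. The function name `suggest_defaults`, the `TrainingDefaults` container, and every threshold below are assumptions made for illustration; they are not backpropagate's actual rules or API.

```python
from dataclasses import dataclass


@dataclass
class TrainingDefaults:
    learning_rate: float
    batch_size: int
    gradient_accumulation_steps: int
    lora_rank: int


def suggest_defaults(vram_gb: float, num_examples: int) -> TrainingDefaults:
    """Hypothetical heuristic; all thresholds are illustrative assumptions."""
    # More VRAM allows a larger per-device batch.
    if vram_gb >= 24:
        batch_size = 8
    elif vram_gb >= 16:
        batch_size = 4
    elif vram_gb >= 12:
        batch_size = 2
    else:
        batch_size = 1

    # Accumulate gradients so the effective batch stays at 16 on any GPU.
    target_effective_batch = 16
    grad_accum = max(1, target_effective_batch // batch_size)

    # Smaller datasets get a lower rank and learning rate to limit overfitting.
    if num_examples < 1_000:
        lora_rank, learning_rate = 8, 1e-4
    elif num_examples < 50_000:
        lora_rank, learning_rate = 16, 2e-4
    else:
        lora_rank, learning_rate = 32, 2e-4

    return TrainingDefaults(learning_rate, batch_size, grad_accum, lora_rank)


if __name__ == "__main__":
    print(suggest_defaults(vram_gb=8, num_examples=500))
```

The point of the sketch is the shape of the decision: batch size follows memory, gradient accumulation compensates for it, and adapter capacity follows the amount of data.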

VRAM-aware training

Auto batch sizing and gradient checkpointing keep training stable on GPUs with 8GB of VRAM or more, up to multi-GPU setups. Built-in VRAM monitoring warns you before an out-of-memory (OOM) error occurs.
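One common way to size a batch automatically is to probe: try a step, halve the batch on an out-of-memory error, and make up the difference with gradient accumulation. The sketch below shows that pattern with a simulated memory model so it runs without a GPU. `find_batch_size`, `SimulatedOOM`, and `fake_step` are hypothetical names, and this is not necessarily how backpropagate does it.

```python
from typing import Callable


class SimulatedOOM(RuntimeError):
    """Stand-in for a CUDA out-of-memory error in this sketch."""


def find_batch_size(
    try_step: Callable[[int], None],
    start: int = 16,
    target_effective: int = 16,
) -> tuple[int, int]:
    """Halve the batch size until one step fits, then derive accumulation steps."""
    batch_size = start
    while batch_size >= 1:
        try:
            try_step(batch_size)
        except SimulatedOOM:
            batch_size //= 2  # back off and retry with half the batch
            continue
        accum = max(1, target_effective // batch_size)
        return batch_size, accum
    raise RuntimeError("even batch size 1 does not fit in memory")


def fake_step(batch_size: int, budget_gb: float = 8.0,
              fixed_gb: float = 5.0, per_sample_gb: float = 0.9) -> None:
    """Toy memory model: fixed model cost plus a per-sample activation cost."""
    if fixed_gb + per_sample_gb * batch_size > budget_gb:
        raise SimulatedOOM(f"batch size {batch_size} exceeds {budget_gb} GB")


if __name__ == "__main__":
    print(find_batch_size(fake_step))
```

In a real training loop the probe would run an actual forward and backward pass and catch the framework's own out-of-memory exception instead of `SimulatedOOM`.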

First-class Windows

Tested and optimized for Windows + CUDA. Avoids the common PyTorch/Unsloth pitfalls on Windows. If it runs on Linux, it runs on Windows too.

Modular installation

Install only the dependencies you need.

| Extra | What you get | Key dependencies |
| --- | --- | --- |
| backpropagate | Core API only, minimal footprint | none |
| [unsloth] | 2× faster training, 50% less VRAM | unsloth |
| [ui] | Reflex (Radix UI) web interface | reflex |
| [validation] | Pydantic config validation | pydantic, pydantic-settings |
| [export] | GGUF export for Ollama | llama-cpp-python |
| [monitoring] | WandB + system monitoring | wandb, psutil |
| [logging] | Structured logging (2026 best practices) | structlog |
| [security] | JWT auth + secure token generation | PyJWT, cryptography |
| [standard] | unsloth + ui (recommended) | all of the above |
| [production] | unsloth + ui + validation + logging + security | production deployment |
| [full] | Everything | all extras |

Get started

Install

# Recommended
pip install backpropagate[standard]

# Minimal core only
pip install backpropagate

# All extras
pip install backpropagate[full]

# Requires: Python 3.10+ · CUDA GPU (8GB+ VRAM)
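Before installing, it can help to confirm the stated requirements on the target machine. The following is a small preflight sketch using only the standard library. The helper names are hypothetical, and finding `nvidia-smi` on the PATH is only a hint that an NVIDIA driver is installed; it does not verify the 8GB VRAM requirement.

```python
import shutil
import sys


def python_ok(version_info=sys.version_info) -> bool:
    """True if the interpreter is Python 3.10 or newer."""
    return tuple(version_info[:2]) >= (3, 10)


def nvidia_driver_visible() -> bool:
    """True if the nvidia-smi tool that ships with the NVIDIA driver is on PATH."""
    return shutil.which("nvidia-smi") is not None


if __name__ == "__main__":
    print("Python 3.10+:", python_ok())
    print("nvidia-smi found:", nvidia_driver_visible())
```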

Basic training

from backpropagate import Trainer

# Smart defaults — no config needed
trainer = Trainer("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
trainer.train("my_data.jsonl", steps=100)
trainer.save("./my-model")
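Since training reads a JSONL file, a quick structural check before a long run can save time. This sketch only verifies that each non-blank line is a JSON object; it makes no assumption about which field names backpropagate expects, because those are not stated here. `check_jsonl` is a hypothetical helper, not part of the library.

```python
import json
from pathlib import Path


def check_jsonl(path):
    """Return (line_number, message) for every line that is not a JSON object."""
    problems = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                problems.append((line_number, f"invalid JSON: {exc.msg}"))
                continue
            if not isinstance(record, dict):
                problems.append((line_number, "expected a JSON object"))
    return problems


if __name__ == "__main__":
    import sys

    for line_number, message in check_jsonl(sys.argv[1]):
        print(f"line {line_number}: {message}")
```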

Multi-run SLAO

from backpropagate.multi_run import MultiRunTrainer

runner = MultiRunTrainer(
    model="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    num_runs=5, steps_per_run=100,
    merge_mode="slao",
)
result = runner.run("my_data.jsonl")

Export to Ollama

from backpropagate.export import export_gguf, register_with_ollama

# `model` and `tokenizer` are the fine-tuned model and its tokenizer, loaded earlier
result = export_gguf(model, tokenizer, "./output", quantization="q4_k_m")
register_with_ollama(result.path, model_name="my-model")
# Then, from a shell: ollama run my-model

Production-ready by design

Built for CI/CD pipelines, automated workflows, and long training runs.

Headless by design

No UI required. Runs in CI/CD pipelines, SSH sessions, and automated workflows. Full Python API with structured logging. Callbacks for progress tracking and early stopping.
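As an illustration of what an early-stopping callback typically does, here is a minimal, library-independent sketch. The class `EarlyStopping`, its parameters, and the driver function are assumptions for illustration; backpropagate's real callback interface may look different.

```python
class EarlyStopping:
    """Stop when the loss has not improved by min_delta for `patience` checks."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.stale_checks = 0

    def should_stop(self, loss: float) -> bool:
        if loss < self.best_loss - self.min_delta:
            # Meaningful improvement: remember it and reset the counter.
            self.best_loss = loss
            self.stale_checks = 0
        else:
            self.stale_checks += 1
        return self.stale_checks >= self.patience


def first_stop_index(losses, stopper):
    """Feed losses in order; return the index where training would stop, or None."""
    for index, loss in enumerate(losses):
        if stopper.should_stop(loss):
            return index
    return None


if __name__ == "__main__":
    history = [1.0, 0.8, 0.79, 0.81, 0.80, 0.82]
    print(first_stop_index(history, EarlyStopping(patience=3, min_delta=0.05)))
```

In a headless run, the same object could be consulted after each evaluation step, with progress written to the log rather than a UI.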

Multi-run SLAO

SLAO, short for Single LoRA Continual Learning via Asymmetric Merging (arXiv:2512.23017), prevents catastrophic forgetting during extended fine-tuning campaigns by combining orthogonal initialization, asymmetric handling of the LoRA A and B matrices, and time-aware scaling. Checkpoint-and-resume keeps long runs recoverable after crashes.
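The toy sketch below illustrates two of these ideas, treating the A and B matrices differently and shrinking the merge weight as runs accumulate. The specific rules used here (take A from the newest run, blend B with weight 1/sqrt(t)) are assumptions chosen for illustration, and orthogonal initialization is omitted. This is not the paper's algorithm or backpropagate's implementation; consult both for the actual method.

```python
import math


def merge_run(merged_b, new_a, new_b, run_index):
    """Merge one finished run into the running adapter (toy rule, see lead-in).

    run_index starts at 1. Matrices are plain lists of rows.
    """
    # Time-aware scaling (assumed schedule): later runs move B less.
    weight = 1.0 / math.sqrt(run_index)
    blended_b = [
        [old + weight * (new - old) for old, new in zip(old_row, new_row)]
        for old_row, new_row in zip(merged_b, new_b)
    ]
    # Asymmetric handling (assumed rule): A is replaced, only B is blended.
    replaced_a = [row[:] for row in new_a]
    return replaced_a, blended_b


if __name__ == "__main__":
    a, b = merge_run(
        merged_b=[[0.0, 2.0]],
        new_a=[[1.0], [0.0]],
        new_b=[[2.0, 4.0]],
        run_index=4,
    )
    print(a, b)
```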

LoRA + QLoRA + full FT + Unsloth

Supports LoRA, QLoRA (4-bit), and (v1.4) full fine-tuning for ≤3B models on consumer 16GB GPUs. Unsloth-accelerated training. Mix quantization levels per layer. Export to GGUF at any quantization: q2_k, q4_k_m, q8_0, or f16.

Quality scorecard

Ship Gate audit — 24/37 checked, 13 skipped (each with justification), 100% pass on every applicable item.