BP Backpropagate
Python · PyPI

Fine-tune LLMs in 3 lines.

Headless LLM fine-tuning with smart defaults. Automatic hyperparameter tuning, VRAM-aware batch sizing, multi-run SLAO training to prevent catastrophic forgetting, and one-click GGUF export for Ollama. First-class Windows and CUDA support.

Quickstart

pip install backpropagate[standard] # Train in 3 lines from backpropagate import Trainer trainer = Trainer("unsloth/Qwen2.5-7B-Instruct-bnb-4bit") trainer.train("my_data.jsonl", steps=100) trainer.export("gguf", quantization="q4_k_m") # Ready for Ollama

Multi-run SLAO

from backpropagate.multi_run import MultiRunTrainer runner = MultiRunTrainer( model="unsloth/Llama-3.2-3B-Instruct-bnb-4bit", num_runs=5, steps_per_run=100, merge_mode="slao", ) result = runner.run("my_data.jsonl")

Export to Ollama

from backpropagate.export import export_gguf, register_with_ollama result = export_gguf(model, tokenizer, "./output", quantization="q4_k_m") register_with_ollama(result.path, model_name="my-model")

Fine-tuning without the friction

Built for developers who want results, not configuration.

Smart defaults

Automatically configures learning rate, batch size, gradient accumulation, and LoRA rank based on your hardware and dataset size. No hyperparameter guesswork.

VRAM-aware training

Auto batch sizing and gradient checkpointing keep training stable on any GPU. Built-in VRAM monitoring with warnings before OOM. Works from 8GB up to multi-GPU setups.

First-class Windows

Tested and optimized for Windows + CUDA. Avoids the common PyTorch/Unsloth pitfalls on Windows. If it runs on Linux, it runs on Windows too.

Modular installation

Install only the dependencies you need.

Extra
What you get
Key dependencies
backpropagate
Core API only — minimal footprint
[unsloth]
2× faster training, 50% less VRAM
unsloth
[ui]
Reflex (Radix UI) web interface
reflex
[validation]
Pydantic config validation
pydantic, pydantic-settings
[export]
GGUF export for Ollama
llama-cpp-python
[monitoring]
WandB + system monitoring
wandb, psutil
[logging]
Structured logging (2026 best practices)
structlog
[security]
JWT auth + secure token generation
PyJWT, cryptography
[standard]
unsloth + ui (recommended)
all of the above
[production]
unsloth + ui + validation + logging + security
production deployment
[full]
Everything
all extras

Get started

Install

# Recommended
pip install backpropagate[standard]

# Minimal core only
pip install backpropagate

# All extras
pip install backpropagate[full]

# Requires: Python 3.10+ · CUDA GPU (8GB+ VRAM)

Basic training

from backpropagate import Trainer

# Smart defaults — no config needed
trainer = Trainer("unsloth/Qwen2.5-7B-Instruct-bnb-4bit")
trainer.train("my_data.jsonl", steps=100)
trainer.save("./my-model")

Multi-run SLAO

from backpropagate.multi_run import MultiRunTrainer

runner = MultiRunTrainer(
    model="unsloth/Llama-3.2-3B-Instruct-bnb-4bit",
    num_runs=5, steps_per_run=100,
    merge_mode="slao",
)
result = runner.run("my_data.jsonl")

Export to Ollama

from backpropagate.export import export_gguf, register_with_ollama

result = export_gguf(model, tokenizer, "./output", quantization="q4_k_m")
register_with_ollama(result.path, model_name="my-model")
# ollama run my-model

Production-ready by design

Built for CI/CD pipelines, automated workflows, and long training runs.

Headless by design

No UI required. Runs in CI/CD pipelines, SSH sessions, and automated workflows. Full Python API with structured logging. Callbacks for progress tracking and early stopping.

Multi-run SLAO

Single LoRA Continual Learning via Asymmetric Merging (arXiv:2512.23017) prevents catastrophic forgetting during extended fine-tuning campaigns via orthogonal init, asymmetric A/B handling, and time-aware scaling. Checkpoint-and-resume keeps long runs recoverable after crashes.

LoRA + QLoRA + full FT + Unsloth

Supports LoRA, QLoRA (4-bit), and (v1.4) full fine-tuning for ≤3B models on consumer 16GB GPUs. Unsloth-accelerated training. Mix quantization levels per layer. Export to GGUF at any quantization: q2_k, q4_k_m, q8_0, or f16.

Quality scorecard

Ship Gate audit — 24/37 checked, 13 skipped (each with justification), 100% pass on every applicable item.

Category
Score
Notes
A. Security
5/8
SECURITY.md, trust model, no secrets/telemetry, safe_path(), output-directory denylist; the 3 SKIPs cover destructive-action / MCP rows that do not apply.
B. Error Handling
3/7
Structured exception shape (code/message/hint/cause/retryable) via ERROR_CODES registry; CLI exit codes 0/1/2/3; no raw stack traces without --verbose; run_id correlation; redacted stderr; --share+--auth gating; the 4 SKIPs cover MCP / desktop / VS Code rows that do not apply.
C. Operator Docs
4/7
README, CHANGELOG, LICENSE, --help; the 3 SKIPs cover formal log-tier / MCP / operational-complexity rows.
D. Shipping Hygiene
7/9
verify.sh, version=tag, 5 scanners in CI, dependabot, npm publish with Sigstore provenance; the 2 SKIPs cover VS Code extension / desktop app rows.
E. Identity
5/6
Logo, translations, landing page, metadata; soft gate, does not block ship.