Export

Once your training run finishes, the trained adapter sits on disk as a LoRA — small, fast to load, but useless without a runtime that knows how to apply it on top of the base model. This page covers the three things you can do next: keep the LoRA as a LoRA (smallest output, requires the base at inference), merge the LoRA back into the base model (standalone, larger), or convert to GGUF for Ollama / llama.cpp / LM Studio. The recommended path is GGUF + Ollama — one CLI invocation goes from “training done” to “I can chat with my finetune.”

result = trainer.export("gguf", quantization="q4_k_m")

To register the exported model with Ollama:

from backpropagate import register_with_ollama
register_with_ollama(result.path, "my-finetuned-model")

Then use it locally:

ollama run my-finetuned-model

The CLI does the export and the Ollama registration in one command:

backprop export ./output/lora --format gguf --quantization q4_k_m --ollama --ollama-name my-model

| Quantization | Size | Quality | Use case |
| --- | --- | --- | --- |
| q2_k | Smallest | Lower | Embedded, constrained environments |
| q4_0 | Small | Fair | Fast inference, lower quality |
| q4_k_m | Small | Good | General use (recommended) |
| q5_k_m | Medium | Better | Balance of size and quality |
| q8_0 | Large | High | When quality matters more than size |
| f16 | Largest | Highest | Maximum quality, no compression |
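
To make the size column concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of Backpropagate: `estimate_gguf_bytes` is a hypothetical helper and the 7B parameter count is an illustrative assumption. It covers only the three formats whose storage rate follows directly from the llama.cpp block layout. The K-quants (q2_k, q4_k_m, q5_k_m) mix block types across tensors, so their effective rate depends on the model and is left out.

```python
# Bits per weight for formats whose block layout fixes the rate exactly.
# q4_0: 32 weights in 18 bytes (16 bytes of 4-bit values + a 2-byte scale).
# q8_0: 32 weights in 34 bytes (32 bytes of 8-bit values + a 2-byte scale).
BITS_PER_WEIGHT = {
    "q4_0": 18 * 8 / 32,  # 4.5 bits per weight
    "q8_0": 34 * 8 / 32,  # 8.5 bits per weight
    "f16": 16.0,
}


def estimate_gguf_bytes(n_params: int, quantization: str) -> float:
    """Weight-only estimate; ignores file metadata and mixed-precision tensors."""
    return n_params * BITS_PER_WEIGHT[quantization] / 8


if __name__ == "__main__":
    n_params = 7_000_000_000  # hypothetical 7B-parameter base model
    for name in ("q4_0", "q8_0", "f16"):
        size_gb = estimate_gguf_bytes(n_params, name) / 1e9
        print(f"{name}: about {size_gb:.2f} GB")
```

Real files will differ somewhat from these figures, since some tensors may be stored at a different precision and the file carries metadata and tokenizer data.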

Backpropagate supports three export formats via trainer.export(format=...):

| Format | Description | Use case |
| --- | --- | --- |
| lora | LoRA adapter only (default) | Smallest output, requires base model at inference |
| merged | Base model + adapter merged | Standalone model, larger but self-contained |
| gguf | Quantized GGUF file | For Ollama, llama.cpp, and LM Studio |

A GGUF export runs four steps:

  1. Merges LoRA weights back into the base model
  2. Converts to GGUF format (via Unsloth if available, otherwise llama.cpp)
  3. Applies the chosen quantization level
  4. Optionally creates an Ollama Modelfile and registers the model

Use create_modelfile() to build an Ollama Modelfile with a custom system prompt, temperature, or context length before registering:

from backpropagate import create_modelfile, register_with_ollama
modelfile_path = create_modelfile(
    "output/gguf/model-q4_k_m.gguf",
    system_prompt="You are a helpful coding assistant.",
    temperature=0.5,
    context_length=8192,
)

If you only need the default Modelfile, register_with_ollama() creates one automatically.
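
For orientation, the sketch below shows roughly what a Modelfile with those three settings could contain. It is an illustration, not Backpropagate's implementation: `render_modelfile` and `ollama_create_command` are hypothetical helpers, and the text that `create_modelfile()` actually writes may differ. The directives themselves (`FROM`, `SYSTEM`, `PARAMETER temperature`, `PARAMETER num_ctx`) and the `ollama create <name> -f <Modelfile>` command come from Ollama.

```python
from typing import List, Optional


def render_modelfile(
    gguf_path: str,
    system_prompt: Optional[str] = None,
    temperature: Optional[float] = None,
    context_length: Optional[int] = None,
) -> str:
    """Build Ollama Modelfile text (hypothetical helper, for illustration only)."""
    lines = [f"FROM {gguf_path}"]
    if system_prompt is not None:
        # Triple quotes let the system prompt span several lines.
        lines.append(f'SYSTEM """{system_prompt}"""')
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    if context_length is not None:
        # Ollama calls the context window size num_ctx.
        lines.append(f"PARAMETER num_ctx {context_length}")
    return "\n".join(lines) + "\n"


def ollama_create_command(name: str, modelfile_path: str) -> List[str]:
    """Argument list for registering a Modelfile under a model name."""
    return ["ollama", "create", name, "-f", modelfile_path]


if __name__ == "__main__":
    print(render_modelfile(
        "output/gguf/model-q4_k_m.gguf",
        system_prompt="You are a helpful coding assistant.",
        temperature=0.5,
        context_length=8192,
    ))
```

Keeping the rendering separate from the `ollama create` call makes it easy to inspect or version the Modelfile before anything is registered.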

After registering one or more fine-tuned models, list them from Python:

from backpropagate import list_ollama_models
for model in list_ollama_models():
    print(model)

This calls ollama list under the hood and returns the model names.
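
As a rough picture of what that involves, here is a minimal sketch that shells out to `ollama list` and keeps the first column of each row. The function names are hypothetical and this is not Backpropagate's code. It assumes the usual tabular output with a `NAME ID SIZE MODIFIED` header line, and whether `list_ollama_models()` strips tags such as `:latest` is not stated here, so the sketch leaves them in place.

```python
import subprocess
from typing import List


def parse_ollama_list(output: str) -> List[str]:
    """Extract model names from the text printed by `ollama list`."""
    rows = [line for line in output.splitlines() if line.strip()]
    # The first non-empty row is assumed to be the column header.
    return [row.split()[0] for row in rows[1:]]


def list_models_via_cli() -> List[str]:
    """Run `ollama list` and return the names (requires Ollama on PATH)."""
    completed = subprocess.run(
        ["ollama", "list"], capture_output=True, text=True, check=True
    )
    return parse_ollama_list(completed.stdout)


if __name__ == "__main__":
    for name in list_models_via_cli():
        print(name)
```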

Every export now writes a model_card.md alongside the artifact. The card follows the Hugging Face model card schema, so when you push to the Hub it’s picked up as the repo’s landing page automatically.

The card includes:

  • Frontmatter (base_model, library_name: backpropagate, tags)
  • Property table (run_id, dataset, sha256, steps, final loss, LoRA rank/alpha, seed, duration, GPU, library version)
  • Loss curve (unicode sparkline)
  • Trust signals (Stage B/C/D + Ship Gate)
  • Reproduce-this-run command

Disable card emission with backprop export ... --no-model-card.
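
The loss-curve entry in the list above is a unicode sparkline. One common way to draw one is to scale each value onto eight block characters, as in the sketch below. This is an assumption about the general technique, not the card generator's actual code, and `sparkline` is a hypothetical name.

```python
from typing import Sequence

BARS = "▁▂▃▄▅▆▇█"  # eight levels, lowest to highest


def sparkline(values: Sequence[float]) -> str:
    """Map each value to a block character scaled between min and max."""
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A flat series has no range to scale, so draw the lowest bar.
        return BARS[0] * len(values)
    top = len(BARS) - 1
    return "".join(BARS[round((v - lo) / span * top)] for v in values)


if __name__ == "__main__":
    # Illustrative loss values, not taken from a real run.
    print(sparkline([2.31, 1.87, 1.42, 1.10, 0.95, 0.91]))
```

A falling training loss shows up as bars that step down from left to right.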

Backpropagate can push exports to the Hugging Face Hub directly from the CLI:

# adapter-only push (default — smaller, faster, more useful for LoRA finetunes)
backprop push ./output/lora --repo alice/qwen-finetune
# private repo
backprop push ./output/lora --repo alice/qwen-finetune --private
# include the base model
backprop push ./output/merged --repo alice/qwen-finetune --include-base
# one-shot export + push
backprop export ./output/lora --format lora --push-to-hub alice/qwen-finetune

Token resolution order: --token flag → HF_TOKEN env var → HUGGING_FACE_HUB_TOKEN env var → ~/.cache/huggingface/token (from huggingface-cli login).
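
The resolution order can be sketched as a simple first-match lookup. The function below is a hypothetical illustration of that order, not Backpropagate's implementation. The `env` and `token_file` parameters exist only so the lookup can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_hf_token(
    cli_token: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    token_file: Optional[Path] = None,
) -> Optional[str]:
    """Return the first token found, following the documented order."""
    env = os.environ if env is None else env
    if token_file is None:
        token_file = Path.home() / ".cache" / "huggingface" / "token"

    # 1. An explicit --token flag wins.
    if cli_token:
        return cli_token
    # 2. and 3. Environment variables, HF_TOKEN first.
    for var in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        value = env.get(var)
        if value:
            return value
    # 4. The token stored by `huggingface-cli login`.
    if Path(token_file).is_file():
        stored = Path(token_file).read_text(encoding="utf-8").strip()
        if stored:
            return stored
    return None
```

Returning `None` when nothing is found leaves it to the caller to decide how to report the missing token.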

The model_card.md next to the local export is mirrored as README.md inside the upload so the HF UI renders it as the repo’s model card. Errors carry structured codes (HUB_PUSH_AUTH / HUB_PUSH_NOT_FOUND / HUB_PUSH_NETWORK / HUB_PUSH_VERSION / HUB_PUSH_UNKNOWN) for programmatic triage.
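
A script that wraps the push could branch on those codes along the following lines. This is only a sketch: how the code reaches the caller (exit status, exception attribute, or log line) is not specified here, the hint texts are interpretations of the code names, and treating only `HUB_PUSH_NETWORK` as retryable is an assumption. `triage` is a hypothetical helper.

```python
from typing import Tuple

# Hints are interpretations of the code names, not official descriptions.
HINTS = {
    "HUB_PUSH_AUTH": "Check the token and its write permission.",
    "HUB_PUSH_NOT_FOUND": "Check the repo id and the local export path.",
    "HUB_PUSH_NETWORK": "Transient connectivity problem; retry later.",
    "HUB_PUSH_VERSION": "Check the installed library versions.",
    "HUB_PUSH_UNKNOWN": "Inspect the full error output.",
}

# Assumption: only network failures are worth retrying automatically.
RETRYABLE = {"HUB_PUSH_NETWORK"}


def triage(code: str) -> Tuple[bool, str]:
    """Return (retryable, hint) for a structured push error code."""
    hint = HINTS.get(code, HINTS["HUB_PUSH_UNKNOWN"])
    return code in RETRYABLE, hint


if __name__ == "__main__":
    for code in HINTS:
        retry, hint = triage(code)
        print(f"{code}: retry={retry} hint={hint}")
```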