
Export

Export a fine-tuned model to a quantized GGUF file straight from the trainer:

result = trainer.export("gguf", quantization="q4_k_m")

To register the exported model with Ollama:

from backpropagate import register_with_ollama
register_with_ollama(result.path, "my-finetuned-model")

Then use it locally:

ollama run my-finetuned-model
The full export and Ollama registration can also be run from the CLI:

backprop export ./output/lora --format gguf --quantization q4_k_m --ollama --ollama-name my-model
The quantization option controls the size/quality trade-off:

| Quantization | Size     | Quality | Use case                             |
|--------------|----------|---------|--------------------------------------|
| q2_k         | Smallest | Lower   | Embedded, constrained environments   |
| q4_0         | Small    | Fair    | Fast inference, lower quality        |
| q4_k_m       | Small    | Good    | General use (recommended)            |
| q5_k_m       | Medium   | Better  | Balance of size and quality          |
| q8_0         | Large    | High    | When quality matters more than size  |
| f16          | Largest  | Highest | Maximum quality, no compression      |
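
Any of these levels can be passed as the quantization argument shown earlier; for example, a sketch that favors quality over size (q8_0 is taken from the table above):

result = trainer.export("gguf", quantization="q8_0")
print(result.path)  # larger file, higher fidelity than q4_k_m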

Backpropagate supports three export formats via trainer.export(format=...):

| Format | Description                  | Use case                                          |
|--------|------------------------------|---------------------------------------------------|
| lora   | LoRA adapter only (default)  | Smallest output, requires base model at inference |
| merged | Base model + adapter merged  | Standalone model, larger but self-contained       |
| gguf   | Quantized GGUF file          | For Ollama, llama.cpp, and LM Studio              |
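
The format is selected by the first argument to trainer.export; a minimal sketch, assuming the three format strings above map directly onto that argument (only the gguf call appears verbatim elsewhere on this page):

adapter = trainer.export("lora")                      # adapter only (default); needs the base model at inference
merged = trainer.export("merged")                     # standalone merged model
gguf = trainer.export("gguf", quantization="q4_k_m")  # quantized file for Ollama, llama.cpp, LM Studio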
A GGUF export runs through the following steps:

  1. Merges LoRA weights back into the base model
  2. Converts to GGUF format (via Unsloth if available, otherwise llama.cpp)
  3. Applies the chosen quantization level
  4. Optionally creates an Ollama Modelfile and registers the model

Use create_modelfile() to build an Ollama Modelfile with a custom system prompt, temperature, or context length before registering:

from backpropagate import create_modelfile, register_with_ollama

modelfile_path = create_modelfile(
    "output/gguf/model-q4_k_m.gguf",
    system_prompt="You are a helpful coding assistant.",
    temperature=0.5,
    context_length=8192,
)
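
To register a model from that custom Modelfile, one option is to hand it to the Ollama CLI directly; a hedged sketch (the subprocess call below uses the standard ollama create command and is not part of backpropagate's API):

import subprocess

# Assumption: build/register the model from the custom Modelfile via the
# standard Ollama CLI: `ollama create <name> -f <Modelfile>`.
subprocess.run(
    ["ollama", "create", "my-finetuned-model", "-f", str(modelfile_path)],
    check=True,
)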

If you only need the default Modelfile, register_with_ollama() creates one automatically.

After registering one or more fine-tuned models, list them from Python:

from backpropagate import list_ollama_models

for model in list_ollama_models():
    print(model)

This calls ollama list under the hood and returns the model names.
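
If you want the raw behavior without the helper, a rough equivalent (a sketch, not the library's actual implementation) is a subprocess call that keeps the first column of the ollama list output:

import subprocess

# Run `ollama list` and collect the first column (model name), skipping the header row.
output = subprocess.run(
    ["ollama", "list"], capture_output=True, text=True, check=True
).stdout
names = [line.split()[0] for line in output.splitlines()[1:] if line.strip()]
print(names)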