CLI reference
Commands
bytefit probe
Detect the GPU, VRAM, RAM, and (optionally) measured NVMe bandwidth, and print the hardware profile.
bytefit recommend
Rank the available models best-first for the detected hardware, dropping any that can’t run interactively.
bytefit plan <model-id>
Plan one model: choose the quant, KV-cache, context, and offload settings, predict tok/s, and emit
ready-to-run arguments, or refuse with a structured reason. An exact model id wins; an unambiguous
prefix also resolves, while an ambiguous one lists the candidates and exits with code 2.
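The resolution rule above can be sketched in a few lines. This is an illustration of the described behaviour, not bytefit's actual implementation; the function name `resolve_model_id` and the model ids in the usage example are hypothetical.

```python
def resolve_model_id(query, known_ids):
    """Return (exit_code, matches) for a model id or prefix.

    Hypothetical helper that mirrors the documented rule; the exit codes
    follow the "Exit codes" table (0 OK, 1 not found, 2 usage error).
    """
    # An exact id always wins, even when it is also a prefix of other ids.
    if query in known_ids:
        return 0, [query]
    matches = sorted(m for m in known_ids if m.startswith(query))
    if len(matches) == 1:
        return 0, matches  # unambiguous prefix resolves
    if len(matches) > 1:
        return 2, matches  # ambiguous: the candidates are listed
    return 1, []           # model not found


# Made-up ids, for illustration only.
ids = ["llama3:8b", "llama3:70b", "qwen2:7b"]
print(resolve_model_id("qwen", ids))
print(resolve_model_id("llama3", ids))
```

Checking for an exact match first is what lets a short id such as a bare family name still resolve when longer ids share it as a prefix.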
| Flag | Applies to | Meaning |
|---|---|---|
| --json | all | Machine-readable JSON output |
| --dir <path> | recommend, plan | Also scan a folder of .gguf files |
| --hf <repo> | recommend, plan | Also rank a Hugging Face GGUF repo without downloading it (opt-in network) |
| --ctx <n> | recommend, plan | Context length in tokens (default 8192) |
| --use-case <c> | recommend, plan | One of reasoning, chat, or bulk; gates the quant floor |
| --backend <b> | plan | One of llama.cpp, ollama, or lmstudio (default llama.cpp) |
| --experimental | recommend, plan | Allow the experimental MoE disk-streaming tier (MoE only) |
| -h, --help | - | Show help |
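When calling bytefit from a script, it can help to assemble the argument list in one place and validate the enumerated values before spawning the process. The following is a minimal sketch that uses only the flags listed above; the helper name `build_plan_argv` is hypothetical, and it assumes flags may follow the model id on the command line.

```python
import shlex

# Values taken from the flag table above.
USE_CASES = {"reasoning", "chat", "bulk"}
BACKENDS = {"llama.cpp", "ollama", "lmstudio"}


def build_plan_argv(model_id, ctx=None, use_case=None, backend=None,
                    gguf_dir=None, hf_repo=None, experimental=False,
                    json_output=True):
    """Build an argv list for `bytefit plan` (hypothetical helper)."""
    if use_case is not None and use_case not in USE_CASES:
        raise ValueError(f"unknown use case: {use_case!r}")
    if backend is not None and backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    # Assumption: flags are accepted after the positional model id.
    argv = ["bytefit", "plan", model_id]
    if json_output:
        argv.append("--json")
    if gguf_dir is not None:
        argv += ["--dir", gguf_dir]
    if hf_repo is not None:
        argv += ["--hf", hf_repo]
    if ctx is not None:
        argv += ["--ctx", str(ctx)]
    if use_case is not None:
        argv += ["--use-case", use_case]
    if backend is not None:
        argv += ["--backend", backend]
    if experimental:
        argv.append("--experimental")
    return argv


# The model id below is a made-up example.
print(shlex.join(build_plan_argv("llama3:8b", ctx=16384, use_case="chat")))
```

Omitted options are left off the command line entirely, so the documented defaults (8192 tokens of context, the llama.cpp backend) still apply.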
Exit codes
| Code | Meaning |
|---|---|
| 0 | OK |
| 1 | Model not found, or the loadout was refused |
| 2 | Usage error (bad command, ambiguous model, unknown backend or --use-case, out-of-range --ctx) |
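A calling script can branch on these codes rather than parsing error text. The sketch below assumes a `bytefit` executable is on the PATH; the names `classify_exit` and `run_bytefit` are hypothetical, and no assumption is made about the shape of the output.

```python
import subprocess

# Meanings copied from the exit-code table above.
EXIT_MEANINGS = {
    0: "ok",
    1: "model not found, or loadout refused",
    2: "usage error",
}


def classify_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, "unexpected exit code")


def run_bytefit(argv):
    """Run a bytefit command and return (code, meaning, stdout).

    Raises FileNotFoundError if the executable named in argv[0] is
    not installed or not on the PATH.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, classify_exit(proc.returncode), proc.stdout
```

Code 1 covers both a missing model and a refused loadout, so a caller that needs to tell them apart has to inspect the output; the structure of that output is not specified here.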
Model sources
bytefit reads installed Ollama models by default (via OLLAMA_HOST, default
http://127.0.0.1:11434). Add a local folder of .gguf files with --dir <path>, and a Hugging Face
GGUF repo with --hf <repo>; the latter reads only the GGUF header over HTTPS and never downloads the
weights.
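The OLLAMA_HOST fallback described above amounts to a small lookup. This sketch shows one plausible way to resolve the base URL. The function name `ollama_base_url` is hypothetical, and prepending `http://` to a bare `host:port` value is an assumption rather than documented bytefit behaviour.

```python
import os

# Default taken from the paragraph above.
DEFAULT_OLLAMA_HOST = "http://127.0.0.1:11434"


def ollama_base_url(env=None):
    """Return the Ollama base URL from OLLAMA_HOST, or the default."""
    env = os.environ if env is None else env
    host = (env.get("OLLAMA_HOST") or "").strip()
    if not host:
        return DEFAULT_OLLAMA_HOST
    if "://" not in host:
        # Assumption: treat a bare host:port as plain HTTP.
        host = "http://" + host
    return host.rstrip("/")


print(ollama_base_url({}))
```

Passing the environment as a parameter keeps the lookup easy to exercise without modifying the real process environment.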