Run the biggest model your machine can actually handle.
bytefit is a hardware-aware loadout planner for local LLMs. It picks the model, quant, KV-cache type, context length, and offload policy for your VRAM and RAM, and refuses any config that would silently page to disk.
Probe
bytefit probe
Recommend
bytefit recommend
Plan
bytefit plan qwen3.6:27b
Features
A planner, not just an estimator.
Closes the decision loop
Not just "does it fit" — bytefit chooses model class, quant family, KV-cache type, context length, and offload policy for your exact hardware.
Refuses to page
Involuntary disk paging collapses decode throughput by ~78×. bytefit checks the planned memory footprint against your available memory and, rather than launch a thrashing job, refuses with a structured reason and a non-zero exit code.
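To make the refusal idea concrete, here is a minimal sketch of a footprint check. It is not bytefit's actual implementation: the `Loadout` fields, the `check_fit` and `exit_code` helpers, the 10% headroom, and the reason string are all hypothetical, and the model dimensions in the usage note are illustrative rather than taken from any real model. The only piece that is standard is the KV-cache size estimate (keys plus values, per layer, per KV head, per token).

```python
from dataclasses import dataclass


@dataclass
class Loadout:
    """Hypothetical description of one candidate configuration."""
    weights_bytes: int        # size of the quantized weights
    n_layers: int             # transformer layers
    n_kv_heads: int           # key/value heads (fewer than query heads under GQA)
    head_dim: int             # dimension of each head
    context_len: int          # tokens of context to reserve cache for
    kv_bytes_per_elem: float  # 2.0 for an f16 cache, about 1.0 for an 8-bit cache


def kv_cache_bytes(loadout: Loadout) -> int:
    # Keys and values each hold n_layers * n_kv_heads * head_dim elements per token.
    elems_per_token = 2 * loadout.n_layers * loadout.n_kv_heads * loadout.head_dim
    return int(elems_per_token * loadout.context_len * loadout.kv_bytes_per_elem)


def check_fit(loadout: Loadout, vram_bytes: int, ram_bytes: int,
              headroom: float = 0.10) -> dict:
    # Assumption: VRAM and RAM are pooled, and a fixed fraction is held back
    # for the OS and runtime buffers. A real planner would be more careful.
    budget = int((vram_bytes + ram_bytes) * (1 - headroom))
    needed = loadout.weights_bytes + kv_cache_bytes(loadout)
    if needed > budget:
        # Structured refusal: a machine-readable reason instead of a launch.
        return {"ok": False, "reason": "footprint_exceeds_memory",
                "needed_bytes": needed, "budget_bytes": budget}
    return {"ok": True, "needed_bytes": needed, "budget_bytes": budget}


def exit_code(result: dict) -> int:
    # Non-zero on refusal so scripts and CI can detect it.
    return 0 if result["ok"] else 1
```

With an illustrative loadout of 16 GiB of weights, 64 layers, 8 KV heads, a head dimension of 128, a 32,768-token context, and an f16 cache, the cache alone comes to 8 GiB, so the 24 GiB total is refused on a 24 GiB card with no spare RAM once headroom is reserved, and accepted when 32 GiB of system RAM is also available.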
Ready-to-run output
Emits llama.cpp, Ollama, and LM Studio arguments (including fractional MoE expert offload) plus a predicted tok/s. Zero runtime dependencies.
Usage
Install
npm install -g @mcptoolshop/bytefit
See your hardware
bytefit probe
Rank your models
bytefit recommend
Plan one model
bytefit plan qwen3.6:27b --backend llama.cpp