Open source · zero deps

Run the biggest model your machine can actually handle.

bytefit is a hardware-aware loadout planner for local LLMs. It picks the model, quant, KV-cache, context, and offload policy for your VRAM and RAM — and refuses any config that would silently page to disk.


Probe

bytefit probe

Recommend

bytefit recommend

Plan

bytefit plan qwen3.6:27b

Features

A planner, not just an estimator.

Closes the decision loop

Not just "does it fit" — bytefit chooses model class, quant family, KV-cache type, context length, and offload policy for your exact hardware.
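
As a rough illustration of one step in that loop, here is a minimal sketch of choosing the highest-precision quant whose weights fit a memory budget. Everything in it is an assumption for illustration: `Quant`, `weightBytes`, `pickQuant`, and the round bits-per-weight figures are hypothetical and are not bytefit's API or its actual selection logic.

```typescript
// Hypothetical sketch: these names are illustrative and are not bytefit's API.
interface Quant {
  name: string;
  bitsPerWeight: number; // rough average; real quant formats vary
}

const GIB = 1024 * 1024 * 1024;

// Approximate weight footprint in bytes: parameters * bits per weight / 8.
function weightBytes(paramCount: number, quant: Quant): number {
  return (paramCount * quant.bitsPerWeight) / 8;
}

// Return the highest-precision quant whose weights fit in budgetBytes,
// or null when none fits.
function pickQuant(
  paramCount: number,
  quants: Quant[],
  budgetBytes: number
): Quant | null {
  const byPrecision = [...quants].sort(
    (a, b) => b.bitsPerWeight - a.bitsPerWeight
  );
  for (const quant of byPrecision) {
    if (weightBytes(paramCount, quant) <= budgetBytes) {
      return quant;
    }
  }
  return null;
}

// Illustrative round numbers, not measured sizes of any real format.
const QUANTS: Quant[] = [
  { name: "16-bit", bitsPerWeight: 16 },
  { name: "8-bit", bitsPerWeight: 8 },
  { name: "4-bit", bitsPerWeight: 4 },
];

const choice = pickQuant(27e9, QUANTS, 24 * GIB);
console.log(choice ? choice.name : "nothing fits");
```

For a 27-billion-parameter model and a 24 GiB budget, the 8-bit weights (about 27 GB) do not fit, so this sketch falls through to 4-bit. A real planner would also have to budget for KV cache, context length, and offload, which this example leaves out.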

Refuses to page

Involuntary disk paging cuts decode throughput by roughly 78×. bytefit checks the planned footprint against available memory and, rather than launch a thrashing job, refuses with a structured reason and a non-zero exit code.
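
The check described above can be sketched as follows. This is a sketch under stated assumptions, not bytefit's implementation: the `Footprint`, `Memory`, and `Verdict` types, the `footprint_exceeds_memory` reason string, and the choice to treat VRAM plus RAM as the available pool are all hypothetical. The KV-cache formula is the standard estimate for attention with separate key and value tensors.

```typescript
// Hypothetical sketch: type names, fields, and the reason string are assumptions.
interface Footprint {
  weightsBytes: number;
  kvCacheBytes: number;
  overheadBytes: number; // compute buffers, runtime, and similar
}

interface Memory {
  vramBytes: number;
  ramBytes: number;
}

type Verdict =
  | { ok: true; exitCode: 0; headroomBytes: number }
  | {
      ok: false;
      exitCode: 1;
      reason: string;
      requiredBytes: number;
      availableBytes: number;
    };

const GIB = 1024 * 1024 * 1024;

// KV cache size: 2 (keys and values) * layers * KV heads * head dim
// * context tokens * bytes per element.
function kvCacheBytes(
  layers: number,
  kvHeads: number,
  headDim: number,
  contextTokens: number,
  bytesPerElement: number
): number {
  return 2 * layers * kvHeads * headDim * contextTokens * bytesPerElement;
}

// Compare the total footprint with VRAM + RAM (an assumed pool; a real
// planner would likely reserve headroom for the OS). Refuse instead of
// letting the job spill to disk.
function checkFit(footprint: Footprint, memory: Memory): Verdict {
  const requiredBytes =
    footprint.weightsBytes + footprint.kvCacheBytes + footprint.overheadBytes;
  const availableBytes = memory.vramBytes + memory.ramBytes;
  if (requiredBytes > availableBytes) {
    return {
      ok: false,
      exitCode: 1,
      reason: "footprint_exceeds_memory",
      requiredBytes,
      availableBytes,
    };
  }
  return { ok: true, exitCode: 0, headroomBytes: availableBytes - requiredBytes };
}

const verdict = checkFit(
  {
    weightsBytes: 20 * GIB,
    kvCacheBytes: kvCacheBytes(32, 8, 128, 4096, 2),
    overheadBytes: 1 * GIB,
  },
  { vramBytes: 8 * GIB, ramBytes: 8 * GIB }
);
console.log(JSON.stringify(verdict));
```

A caller could pass `verdict.exitCode` to the process exit status so that scripts and CI jobs stop before a launch that would thrash.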

Ready-to-run output

Emits llama.cpp, Ollama, and LM Studio arguments (including fractional MoE expert offload) plus a predicted tok/s. Zero runtime dependencies.

Usage

Install

npm install -g @mcptoolshop/bytefit

See your hardware

bytefit probe

Rank your models

bytefit recommend

Plan one model

bytefit plan qwen3.6:27b --backend llama.cpp