Lossless context. Zero information loss.
Context Window Manager is an MCP server that solves the context exhaustion problem. Freeze your KV cache tensors to persistent storage and thaw them later with exact restoration — no summarization, no RAG, no missing details.
Install
pip install cwm-mcp
# Optional extras
pip install 'cwm-mcp[redis]' # distributed storage
pip install 'cwm-mcp[lmcache]' # LMCache integration
pip install 'cwm-mcp[encryption]' # encrypted-at-rest
pip install 'cwm-mcp[all]' # everything
Claude Code config
// .claude/settings.json
{
"mcpServers": {
"context-window-manager": {
"command": "python",
"args": ["-m", "context_window_manager"],
"env": {
"CWM_VLLM_URL": "http://localhost:8000"
}
}
}
}
Core tools
# Freeze your current session
> window_freeze session_abc123 my-project
# Restore it exactly as it was
> window_thaw my-project
# Branch for exploration
> window_clone my-project my-project-v2
# Check status
> window_status my-project
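The four operations above share a simple lifecycle: freeze snapshots a session under a name, thaw restores it, clone branches it, and status inspects it. As an illustrative sketch only (the real server operates on KV cache tensors, not plain dicts), the semantics can be modeled like this:

```python
# Illustrative model of the freeze/thaw/clone/status lifecycle over
# plain dicts. Not CWM's implementation, which persists KV tensors.
from copy import deepcopy

class ToyWindowStore:
    def __init__(self):
        self._frozen = {}  # window name -> snapshot of session state

    def freeze(self, session_state, name):
        # Snapshot the session under a user-chosen window name.
        self._frozen[name] = deepcopy(session_state)

    def thaw(self, name):
        # Restore an exact copy of the frozen snapshot.
        return deepcopy(self._frozen[name])

    def clone(self, src, dst):
        # Branch a frozen window for exploration; src stays intact.
        self._frozen[dst] = deepcopy(self._frozen[src])

    def status(self, name):
        return {"name": name, "frozen": name in self._frozen}

store = ToyWindowStore()
store.freeze({"tokens": [1, 2, 3]}, "my-project")
store.clone("my-project", "my-project-v2")
assert store.thaw("my-project-v2") == {"tokens": [1, 2, 3]}
```

The key property the toy preserves is that clone copies rather than aliases: mutating one branch never disturbs the other.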
Not just another memory tool
Actual KV tensors. Actual restoration. No approximation.
True lossless restoration
Preserves actual KV cache tensors — not summaries, not embeddings. Thaw a frozen window and the model resumes from the exact same cognitive state.
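Operationally, "lossless" means a freeze/thaw round trip is byte-identical and verifiable by hash. The sketch below demonstrates that property over raw bytes standing in for tensor data; it is not CWM's storage code.

```python
# Sketch of the lossless property: freeze to disk, thaw, and verify the
# restored bytes hash identically. Raw bytes stand in for KV tensors.
import hashlib
import tempfile
from pathlib import Path

def freeze(data: bytes, path: Path) -> str:
    path.write_bytes(data)
    return hashlib.sha256(data).hexdigest()

def thaw(path: Path) -> bytes:
    return path.read_bytes()

with tempfile.TemporaryDirectory() as d:
    blob = b"kv-cache-tensor-bytes" * 1000
    digest = freeze(blob, Path(d) / "my-project.bin")
    restored = thaw(Path(d) / "my-project.bin")
    # Exact restoration: hashes match, no approximation.
    assert hashlib.sha256(restored).hexdigest() == digest
```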
Tiered storage
CPU memory → Disk → Redis. CWM promotes and demotes automatically across tiers. Fast restore when warm, reliable restore when cold, shared restore with Redis.
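The promote/demote pattern can be sketched with a bounded hot tier backed by an unbounded cold tier: reads from the cold tier promote the entry, and overflow in the hot tier demotes the least-recently-used entry. This is an illustrative two-tier version, not CWM's internals; a third Redis tier would follow the same pattern.

```python
# Illustrative two-tier cache: "hot" and "cold" stand in for CPU memory
# and disk. Accessing a cold entry promotes it; hot overflow demotes LRU.
from collections import OrderedDict

class TieredStore:
    def __init__(self, hot_capacity=2):
        self.hot = OrderedDict()   # fast tier, bounded
        self.cold = {}             # slow tier, unbounded
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        self._demote_if_full()

    def get(self, key):
        if key in self.hot:            # warm: fast restore
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.cold.pop(key)     # cold: promote back to hot
        self.put(key, value)
        return value

    def _demote_if_full(self):
        # Evict least-recently-used hot entries down a tier.
        while len(self.hot) > self.hot_capacity:
            old_key, old_value = self.hot.popitem(last=False)
            self.cold[old_key] = old_value

store = TieredStore(hot_capacity=2)
store.put("a", 1); store.put("b", 2); store.put("c", 3)  # "a" demoted
assert "a" in store.cold
assert store.get("a") == 1   # cold hit promotes "a" back to hot
assert "a" in store.hot
```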
Session isolation
Every session gets a unique cache_salt. No cross-session data leakage, no timing attacks, clean separation between concurrent contexts.
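The salt mechanism is the one named above, but the generation and keying shown here are an illustrative sketch, not CWM's code: because every lookup key is derived from the session's own random salt, identical prompt prefixes in two sessions can never collide on a cache entry.

```python
# Sketch of per-session isolation via a unique cache_salt.
# Illustrative only; CWM's actual salt handling may differ.
import hashlib
import secrets

def new_session():
    # Each session gets its own unpredictable cache_salt.
    return {"cache_salt": secrets.token_hex(16)}

def cache_key(session, prefix: str) -> str:
    # Salting the key means identical prefixes in different sessions
    # never map to the same cache entry.
    h = hashlib.sha256()
    h.update(session["cache_salt"].encode())
    h.update(prefix.encode())
    return h.hexdigest()

s1, s2 = new_session(), new_session()
assert cache_key(s1, "shared prompt") != cache_key(s2, "shared prompt")
assert cache_key(s1, "shared prompt") == cache_key(s1, "shared prompt")
```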
MCP tools
Six operations over your context windows, exposed as MCP tools.
Get started
Install & run
pip install cwm-mcp
# Start the MCP server
python -m context_window_manager
# Or with explicit vLLM URL
CWM_VLLM_URL=http://localhost:8000 \
python -m context_window_manager
vLLM server setup
# Enable prefix caching + LMCache connector
vllm serve "meta-llama/Llama-3.1-8B-Instruct" \
--enable-prefix-caching \
--kv-transfer-config '{
"kv_connector":"LMCacheConnectorV1",
"kv_role":"kv_both"
}'
LMCache environment
# Configure LMCache tiers
export LMCACHE_USE_EXPERIMENTAL=True
export LMCACHE_LOCAL_CPU=True
export LMCACHE_MAX_LOCAL_CPU_SIZE=8.0
# Optional: Redis for distributed storage
export LMCACHE_REDIS_URL=redis://localhost:6379
Development setup
git clone https://github.com/mcp-tool-shop-org/context-window-manager
cd context-window-manager
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
pip install -e '.[dev]'
pytest tests/unit/
Built for production
7 phases shipped. 366 tests passing. Beta-stable.
366 tests
Full async coverage with pytest-asyncio, property-based tests via Hypothesis, performance benchmarks, and integration tests. Run the unit suite in seconds.
7 completed phases
Core infrastructure → MCP server shell → Freeze → Thaw → Advanced features (clone, auto-freeze) → Production hardening → Integration and polish.
vLLM + LMCache stack
Built on proven inference infrastructure. Integrates with vLLM prefix caching and LMCache's tiered persistence — no custom tensor plumbing required.