CW Context Window Manager
Python · MCP · vLLM + LMCache

Lossless context. Zero information loss.

Context Window Manager is an MCP server that solves the context exhaustion problem. Freeze your KV cache tensors to persistent storage and thaw them later with exact restoration — no summarization, no RAG, no missing details.

Install

pip install cwm-mcp

# Optional extras
pip install cwm-mcp[redis]       # distributed storage
pip install cwm-mcp[lmcache]     # LMCache integration
pip install cwm-mcp[encryption]  # encrypted-at-rest
pip install cwm-mcp[all]         # everything

Claude Code config

// .claude/settings.json
{
  "mcpServers": {
    "context-window-manager": {
      "command": "python",
      "args": ["-m", "context_window_manager"],
      "env": {
        "CWM_VLLM_URL": "http://localhost:8000"
      }
    }
  }
}

Core tools

# Freeze your current session
> window_freeze session_abc123 my-project

# Restore it exactly as it was
> window_thaw my-project

# Branch for exploration
> window_clone my-project my-project-v2

# Check status
> window_status my-project

Not just another memory tool

Actual KV tensors. Actual restoration. No approximation.

True lossless restoration

Preserves actual KV cache tensors — not summaries, not embeddings. Thaw a frozen window and the model resumes from the exact same cognitive state.

Tiered storage

CPU memory → Disk → Redis. CWM promotes and demotes automatically across tiers. Fast restore when warm, reliable restore when cold, shared restore with Redis.
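The promotion/demotion logic described above can be sketched roughly as follows. This is a minimal illustration of the tiering idea, not CWM's actual implementation; the `Window` class and function names are hypothetical.

```python
# Hypothetical sketch of hot->cold tier movement across the three
# storage tiers named above: CPU memory, local disk, Redis.
from dataclasses import dataclass, field
import time

TIERS = ["cpu", "disk", "redis"]  # ordered hot -> cold

@dataclass
class Window:
    name: str
    tier: str = "cpu"
    last_access: float = field(default_factory=time.monotonic)

def demote(w: Window) -> None:
    """Move a window one tier colder (e.g. under memory pressure)."""
    i = TIERS.index(w.tier)
    if i < len(TIERS) - 1:
        w.tier = TIERS[i + 1]

def promote_on_thaw(w: Window) -> None:
    """A thawed window is promoted back to the hottest tier."""
    w.tier = "cpu"
    w.last_access = time.monotonic()
```

Cold windows fall through to slower but more durable tiers; a thaw pulls the data back to CPU memory so subsequent restores are fast again.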

Session isolation

Every session gets a unique cache_salt. No cross-session data leakage, no timing attacks, clean separation between concurrent contexts.
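One way to picture how a per-session cache_salt prevents leakage: the salt keys the hash that produces cache lookup keys, so identical prompt prefixes in different sessions never collide. This is an illustrative sketch under that assumption, not CWM's actual key derivation.

```python
# Hypothetical illustration of salted cache keys for session isolation.
import secrets, hmac, hashlib

def new_cache_salt() -> str:
    # Unpredictable per-session salt: keys from one session
    # cannot be guessed (or probed via timing) from another.
    return secrets.token_hex(16)

def cache_key(salt: str, prompt_prefix: bytes) -> str:
    # Keyed hash: the same prefix under different salts maps to
    # different keys, so there are no cross-session cache hits.
    return hmac.new(salt.encode(), prompt_prefix, hashlib.sha256).hexdigest()
```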

MCP tools

Six operations over your context windows, exposed as MCP tools.

Tool            Description
window_freeze   Snapshot session context to persistent storage
window_thaw     Restore context from a saved window
window_list     List all available context windows
window_status   Get detailed session and window info
window_clone    Branch a context for parallel exploration
window_delete   Remove a saved window and free storage

Get started

Install & run

pip install cwm-mcp

# Start the MCP server
python -m context_window_manager

# Or with explicit vLLM URL
CWM_VLLM_URL=http://localhost:8000 \
  python -m context_window_manager

vLLM server setup

# Enable prefix caching + LMCache connector
vllm serve "meta-llama/Llama-3.1-8B-Instruct" \
  --enable-prefix-caching \
  --kv-transfer-config '{
    "kv_connector":"LMCacheConnectorV1",
    "kv_role":"kv_both"
  }'

LMCache environment

# Configure LMCache tiers
export LMCACHE_USE_EXPERIMENTAL=True
export LMCACHE_LOCAL_CPU=True
export LMCACHE_MAX_LOCAL_CPU_SIZE=8.0

# Optional: Redis for distributed storage
export LMCACHE_REDIS_URL=redis://localhost:6379

Development setup

git clone https://github.com/mcp-tool-shop-org/context-window-manager
cd context-window-manager
python -m venv .venv
source .venv/bin/activate  # Linux/macOS
# .venv\Scripts\activate   # Windows
pip install -e '.[dev]'
pytest tests/unit/

Built for production

7 phases shipped. 366 tests passing. Beta-stable.

366 tests

Full async coverage with pytest-asyncio, property-based tests via Hypothesis, performance benchmarks, and integration tests. Run the unit suite in seconds.
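The async freeze/thaw roundtrip that the suite exercises can be sketched like this. `FakeStore` is a stand-in, not the real CWM storage class; it just shows the shape of an awaitable freeze followed by an exact-bytes thaw.

```python
# Minimal async roundtrip sketch in the style of the test suite.
import asyncio

class FakeStore:
    def __init__(self):
        self._windows = {}

    async def freeze(self, name: str, payload: bytes) -> None:
        await asyncio.sleep(0)  # yield to the loop, as real I/O would
        self._windows[name] = payload

    async def thaw(self, name: str) -> bytes:
        await asyncio.sleep(0)
        return self._windows[name]

async def roundtrip() -> bool:
    store = FakeStore()
    await store.freeze("my-project", b"kv-tensors")
    # Lossless: thaw returns exactly the bytes that were frozen.
    return await store.thaw("my-project") == b"kv-tensors"
```

In the actual suite such coroutines would run under pytest-asyncio rather than `asyncio.run`.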

7 completed phases

Core infrastructure → MCP server shell → Freeze → Thaw → Advanced features (clone, auto-freeze) → Production hardening → Integration and polish.

vLLM + LMCache stack

Built on proven inference infrastructure. Integrates with vLLM prefix caching and LMCache's tiered persistence — no custom tensor plumbing required.