Architecture
polyglot-gpu is organized as 12 Python modules, each with a single responsibility.
Module map
MCP Client (Claude Code, etc.)
        │ MCP protocol (stdio)
        ▼
┌──────────────────┐
│  server.py       │  5 MCP tools (FastMCP)
├──────────────────┤
│  translate.py    │  Chunking, batching, streaming
│  markdown.py     │  Markdown segmentation + cleanup
│  translate_all   │  Multi-language orchestrator
│  semaphore.py    │  GPU-safe concurrency control
│  validate.py     │  Output validation
├──────────────────┤
│  ollama.py       │  httpx pooled client → Ollama
│  cache.py        │  Segment cache + fuzzy memory
│  glossary.py     │  Software term dictionary
│  languages.py    │  57 language definitions
│  polish.py       │  Post-translation cleanup
│  errors.py       │  Structured error codes
└──────────────────┘
        │ HTTP (httpx)
        ▼
Ollama + TranslateGemma (GPU)
Request flow
- server.py receives an MCP tool call and dispatches to the appropriate handler
- translate.py splits text into chunks, builds prompts with glossary hints, and calls Ollama
- ollama.py sends the request via a pooled httpx connection to localhost:11434
- validate.py checks the response for quality issues (language leakage, truncation)
- polish.py cleans up common model artifacts (stray quotes, formatting noise)
- cache.py stores the segment result for future fuzzy matching
Key design decisions
Connection pooling
The Ollama client reuses a single httpx.AsyncClient across all requests instead of creating a new connection per call. This reduces TCP handshake overhead and improves throughput for batch translations.
Semaphore concurrency
A custom Semaphore class wraps asyncio.Semaphore with an active-count tracker protected by asyncio.Lock. Capping the number of in-flight requests prevents multiple GPU-bound requests from exhausting VRAM. The semaphore supports both explicit acquire/release and the async with context-manager pattern.
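A minimal sketch of such a wrapper, assuming a default limit of one concurrent request and an `active` property for the counter (both are choices made for this example, not confirmed details of semaphore.py):

```python
import asyncio


class Semaphore:
    """Counting semaphore that also tracks how many holders are active."""

    def __init__(self, limit: int = 1) -> None:
        self._sem = asyncio.Semaphore(limit)
        self._lock = asyncio.Lock()  # guards the active counter
        self._active = 0

    @property
    def active(self) -> int:
        return self._active

    async def acquire(self) -> None:
        await self._sem.acquire()
        async with self._lock:
            self._active += 1

    async def release(self) -> None:
        async with self._lock:
            self._active -= 1
        self._sem.release()

    async def __aenter__(self) -> "Semaphore":
        await self.acquire()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        await self.release()


async def demo(jobs: int = 5, limit: int = 2) -> int:
    """Run `jobs` fake GPU tasks and return the peak concurrency observed."""
    sem = Semaphore(limit)
    peak = 0

    async def job() -> None:
        nonlocal peak
        async with sem:
            peak = max(peak, sem.active)
            await asyncio.sleep(0.01)  # stand-in for a GPU-bound request

    await asyncio.gather(*(job() for _ in range(jobs)))
    return peak
```

Running `asyncio.run(demo(5, 2))` starts five tasks but never lets more than two hold the semaphore at once, which is the property that protects VRAM.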
Markdown segmentation
markdown.py splits markdown into translatable segments and structural segments. Code blocks, HTML tags, URLs, and badges are classified as structural and passed through untranslated. Only prose segments go through the translation pipeline.
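The classification can be sketched line by line as below. This is a simplified illustration, not the logic in markdown.py: the `segment` function, the two regular expressions, and the decision to treat blank lines as structural are all assumptions, and it only recognizes lines that consist entirely of one HTML tag, one URL, or one badge image.

```python
import re
from typing import List, Tuple

# Opening or closing code fence (three or more backticks or tildes).
FENCE = re.compile(r"^\s*(?:`{3,}|~{3,})")
# A line made up solely of one HTML tag, one bare URL, or one image/badge.
STRUCTURAL_LINE = re.compile(
    r"^\s*(?:<[^>]+>|https?://\S+|!\[[^\]]*\]\([^)]*\))\s*$"
)


def segment(markdown: str) -> List[Tuple[str, str]]:
    """Return (kind, text) pairs where kind is 'prose' or 'structural'."""
    segments: List[Tuple[str, str]] = []
    in_fence = False
    for line in markdown.splitlines():
        if FENCE.match(line):
            kind = "structural"
            in_fence = not in_fence  # a fence line toggles code-block state
        elif in_fence or STRUCTURAL_LINE.match(line) or not line.strip():
            kind = "structural"
        else:
            kind = "prose"
        # Merge consecutive lines of the same kind into one segment.
        if segments and segments[-1][0] == kind:
            segments[-1] = (kind, segments[-1][1] + "\n" + line)
        else:
            segments.append((kind, line))
    return segments
```

Only the segments tagged "prose" would be sent for translation; joining all segment texts with newlines reproduces the input, so structural content passes through unchanged.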
Fuzzy cache matching
cache.py uses Levenshtein distance to match incoming segments against cached translations. This handles minor text edits (typo fixes, punctuation changes) without re-translating the entire segment.
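To make the matching concrete, here is a small sketch of Levenshtein-based lookup. The `fuzzy_lookup` helper, the plain dict used as the cache, and the 10% threshold are assumptions for illustration; the actual storage format and cutoff in cache.py are not described here.

```python
from typing import Dict, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming, keeping one row at a time."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_lookup(
    cache: Dict[str, str], segment: str, max_ratio: float = 0.1
) -> Optional[str]:
    """Return the translation of the closest cached source text, or None.

    A candidate qualifies when its edit distance to `segment` is at most
    `max_ratio` of the longer string's length (assumed threshold).
    """
    best, best_ratio = None, max_ratio
    for source, translation in cache.items():
        longest = max(len(source), len(segment)) or 1
        ratio = levenshtein(source, segment) / longest
        if ratio <= best_ratio:
            best, best_ratio = translation, ratio
    return best
```

With this threshold, dropping a trailing period from a 29-character sentence still hits the cache, while an unrelated sentence falls through to a fresh translation.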
Structured errors
errors.py defines PolyglotError with typed error codes, human-readable messages, optional hints, and a retryable flag. The friendly_error function converts any exception into a user-facing message.
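One possible shape for this, shown as a sketch: PolyglotError and friendly_error are the names used above, but the ErrorCode enum, its members, the attribute names, and the output format are hypothetical and chosen only to illustrate the idea.

```python
from enum import Enum
from typing import Optional


class ErrorCode(str, Enum):
    # Hypothetical codes, for illustration only.
    OLLAMA_UNAVAILABLE = "OLLAMA_UNAVAILABLE"
    INVALID_LANGUAGE = "INVALID_LANGUAGE"
    UNKNOWN = "UNKNOWN"


class PolyglotError(Exception):
    """Exception carrying a typed code, a message, a hint, and a retry flag."""

    def __init__(
        self,
        code: ErrorCode,
        message: str,
        hint: Optional[str] = None,
        retryable: bool = False,
    ) -> None:
        super().__init__(message)
        self.code = code
        self.message = message
        self.hint = hint
        self.retryable = retryable


def friendly_error(exc: Exception) -> str:
    """Convert any exception into a single user-facing line."""
    if not isinstance(exc, PolyglotError):
        # Wrap unknown exceptions so they get the same formatting.
        exc = PolyglotError(ErrorCode.UNKNOWN, str(exc) or type(exc).__name__)
    text = f"[{exc.code.value}] {exc.message}"
    if exc.hint:
        text += f" Hint: {exc.hint}"
    if exc.retryable:
        text += " (retryable)"
    return text
```

Carrying the retryable flag on the exception lets a caller decide whether to retry without parsing the message text.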
Test coverage
The test suite contains 100 tests across 10 test files and covers all modules. Tests use pytest, with pytest-asyncio for async test support.