Features
Auto-start and auto-pull
If Ollama is not running, Polyglot starts it automatically. On Windows, Polyglot checks the known install locations (%LOCALAPPDATA%\Programs\Ollama and %LOCALAPPDATA%\Ollama) before falling back to the system PATH. Once Ollama has been launched, Polyglot waits up to 10 seconds for it to become available.
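The lookup order and the availability wait can be sketched as a candidate list plus a polling loop. This is a minimal sketch, not Polyglot's code: windowsOllamaCandidates and waitForOllama are hypothetical names, the executable name ollama.exe is an assumption, and the 250 ms poll interval is an arbitrary choice. The probe is injected so the loop can be exercised without a running server.

```typescript
import * as path from "node:path";

// Hypothetical helper: Windows locations to try, in the documented order.
// The executable name "ollama.exe" is an assumption.
function windowsOllamaCandidates(localAppData: string | undefined): string[] {
  const candidates: string[] = [];
  if (localAppData) {
    candidates.push(path.win32.join(localAppData, "Programs", "Ollama", "ollama.exe"));
    candidates.push(path.win32.join(localAppData, "Ollama", "ollama.exe"));
  }
  candidates.push("ollama"); // last resort: let the system PATH resolve it
  return candidates;
}

// Poll `isUp` until it reports true or the deadline (default 10 s) passes.
async function waitForOllama(
  isUp: () => Promise<boolean>,
  timeoutMs = 10_000,
  intervalMs = 250,
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    try {
      if (await isUp()) return true;
    } catch {
      // Connection refused while the server is still starting: keep polling.
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  return false;
}
```

In practice the probe could call Ollama's GET /api/tags endpoint on its default port, for example async () => (await fetch("http://127.0.0.1:11434/api/tags")).ok. Swallowing probe errors matters: while the server is booting, connection-refused errors are expected and should not end the wait early.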
If the TranslateGemma model is not installed, Polyglot pulls it automatically and streams pull progress to stderr so you can follow the download. No manual setup is required beyond the initial Ollama installation.
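Ollama's POST /api/pull endpoint streams newline-delimited JSON objects with a status field and, while layers download, total and completed byte counts. A sketch of turning one such line into a progress message, assuming those fields (formatPullProgress is a hypothetical name):

```typescript
// Shape of one NDJSON line from Ollama's streaming POST /api/pull response.
interface PullProgress {
  status: string;
  digest?: string;
  total?: number;
  completed?: number;
}

// Convert one progress line into a short human-readable message.
// Returns null for blank lines (e.g. a trailing newline in the stream).
function formatPullProgress(line: string): string | null {
  if (line.trim() === "") return null;
  const p = JSON.parse(line) as PullProgress;
  if (p.total && p.completed !== undefined) {
    const pct = Math.floor((p.completed / p.total) * 100);
    return `${p.status}: ${pct}%`;
  }
  return p.status;
}
```

Writing these messages to stderr rather than stdout is presumably deliberate: with the MCP stdio transport, stdout carries protocol messages, so stray output there would corrupt the stream.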
Retry with exponential backoff
Transient Ollama failures (network blips, temporary overload, timeouts) are automatically retried up to 2 times with exponential backoff (1s, 2s). Non-retryable errors (bad model name, unsupported language, same source/target language) fail immediately without wasting retry attempts.
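A sketch of the retry loop under these rules. It assumes errors carry a retryable flag (as PolyglotError does, per the Structured errors section) and uses a simplified stand-in class, RetryableError, plus an injectable sleep so the delays can be tested; none of these names come from Polyglot itself.

```typescript
// Simplified stand-in for an error that knows whether it is worth retrying.
class RetryableError extends Error {
  readonly retryable: boolean;
  constructor(message: string, retryable: boolean) {
    super(message);
    this.retryable = retryable;
  }
}

type Sleep = (ms: number) => Promise<void>;
const realSleep: Sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run `op`, retrying only retryable failures, with delays of
// baseDelayMs * 2^attempt (1000 ms, then 2000 ms by default).
async function withRetry<T>(
  op: () => Promise<T>,
  maxRetries = 2,
  baseDelayMs = 1000,
  sleep: Sleep = realSleep,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      const retryable = err instanceof RetryableError && err.retryable;
      if (!retryable || attempt >= maxRetries) throw err; // fail fast or give up
      await sleep(baseDelayMs * 2 ** attempt);
    }
  }
}
```

With maxRetries = 2 the operation runs at most three times, waiting 1 s and then 2 s between attempts.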
Smart chunking
Long text is split at natural boundaries (paragraphs first, then sentences, then any newline) so that translation context is preserved. The maximum chunk size scales with model size:
| Model size | Max chunk size |
|---|---|
| 2B / 4B | 2,000 chars |
| 12B | 4,000 chars |
| 27B | 6,000 chars |
If no good split point is found within the first 30% of the chunk limit, the text is hard-split at the maximum size.
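The chunking rules can be sketched as below. The docs do not spell out the exact 30% rule; this sketch assumes it means a boundary is accepted only if it falls at or past 30% of the limit (so chunks are never tiny), and hard-splits otherwise. The sentence detector (., ! or ? followed by whitespace) is a deliberate simplification, and all names are hypothetical.

```typescript
// Limits from the table above, keyed by model size.
const MAX_CHUNK_CHARS = { "2B": 2000, "4B": 2000, "12B": 4000, "27B": 6000 } as const;

// Find a cut position in `head`, preferring paragraph breaks, then sentence
// ends, then any newline. Returns -1 if no boundary lies at or past minIndex.
function findSplit(head: string, minIndex: number): number {
  const para = head.lastIndexOf("\n\n");
  if (para >= minIndex) return para + 2;
  let sentence = -1;
  for (const m of head.matchAll(/[.!?]\s/g)) sentence = (m.index ?? 0) + m[0].length;
  if (sentence >= minIndex) return sentence;
  const line = head.lastIndexOf("\n");
  if (line >= minIndex) return line + 1;
  return -1;
}

// Split `text` into chunks of at most maxChars; concatenating the chunks
// reproduces the original text exactly.
function chunkText(text: string, maxChars: number): string[] {
  const chunks: string[] = [];
  let rest = text;
  while (rest.length > maxChars) {
    const head = rest.slice(0, maxChars);
    let cut = findSplit(head, Math.floor(maxChars * 0.3));
    if (cut <= 0) cut = maxChars; // no acceptable boundary: hard split
    chunks.push(rest.slice(0, cut));
    rest = rest.slice(cut);
  }
  if (rest.length > 0) chunks.push(rest);
  return chunks;
}
```

A 12B model would use chunkText(text, MAX_CHUNK_CHARS["12B"]). Searching backwards from the limit keeps each chunk as large as possible while still ending on the most natural available boundary.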
Segment cache
Translated segments are cached by content hash (SHA-256 of source text + target language + model, truncated to 16 hex characters). Unchanged segments skip re-translation entirely. The cache lives in .polyglot-cache.json alongside the source file with a 30-day TTL. Expired entries are pruned automatically on load.
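A sketch of the key derivation and TTL pruning using Node's crypto module. The field separator (a NUL character between source text, target language and model) and the on-disk shape of .polyglot-cache.json (a key-to-entry object with a createdAt timestamp) are assumptions; the docs only state what goes into the hash and how it is truncated.

```typescript
import { createHash } from "node:crypto";
import { existsSync, readFileSync } from "node:fs";

const TTL_MS = 30 * 24 * 60 * 60 * 1000; // 30 days

// Assumed entry shape for this sketch.
interface CacheEntry {
  translation: string;
  createdAt: number; // epoch milliseconds
}

// SHA-256 over source + target language + model, truncated to 16 hex chars.
// The NUL separator is an assumption that keeps field boundaries unambiguous.
function cacheKey(source: string, targetLang: string, model: string): string {
  return createHash("sha256")
    .update(`${source}\u0000${targetLang}\u0000${model}`)
    .digest("hex")
    .slice(0, 16);
}

// Drop entries older than the TTL.
function pruneExpired(entries: Record<string, CacheEntry>, now = Date.now()): Record<string, CacheEntry> {
  const kept: Record<string, CacheEntry> = {};
  for (const [key, entry] of Object.entries(entries)) {
    if (now - entry.createdAt < TTL_MS) kept[key] = entry;
  }
  return kept;
}

// Load the cache file (if any) and prune it, as described above.
function loadCache(file: string): Record<string, CacheEntry> {
  if (!existsSync(file)) return {};
  return pruneExpired(JSON.parse(readFileSync(file, "utf8")) as Record<string, CacheEntry>);
}
```

Including the target language and model in the hash means the same source paragraph gets separate entries per language and per model size.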
Translation memory (fuzzy cache)
When no exact cache hit is found, Polyglot checks for near-miss segments using Levenshtein similarity (the Wagner-Fischer algorithm with a single-row optimization). If a cached source is at least 85% similar to the current segment, the existing translation is reused. Segments that changed only slightly are therefore not re-translated, which makes retranslation after minor README edits much faster. Fuzzy matching only considers entries that share the same target language and model, to prevent cross-language contamination.
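A sketch of the fuzzy lookup. The single-row Wagner-Fischer distance is the technique named above; normalizing it as 1 - distance / max(length) is an assumption about how similarity is defined, and the TmEntry shape and findFuzzyMatch name are hypothetical.

```typescript
// Levenshtein distance keeping only one DP row (Wagner-Fischer).
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length < b.length) [a, b] = [b, a]; // keep the shorter string in the row
  if (b.length === 0) return a.length;
  const row = new Array<number>(b.length + 1);
  for (let j = 0; j <= b.length; j++) row[j] = j;
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1]
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j]
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Assumed normalization: 1.0 for identical strings, 0.0 for completely different.
function similarity(a: string, b: string): number {
  const maxLen = Math.max(a.length, b.length);
  return maxLen === 0 ? 1 : 1 - levenshtein(a, b) / maxLen;
}

interface TmEntry {
  source: string;
  translation: string;
  targetLang: string;
  model: string;
}

// Best entry with the same language and model scoring >= threshold.
function findFuzzyMatch(
  entries: TmEntry[], source: string, targetLang: string, model: string, threshold = 0.85,
): TmEntry | undefined {
  let best: TmEntry | undefined;
  let bestScore = threshold;
  for (const e of entries) {
    if (e.targetLang !== targetLang || e.model !== model) continue;
    // Distance >= length difference, so large length gaps can never match.
    const maxLen = Math.max(e.source.length, source.length);
    if (Math.abs(e.source.length - source.length) > (1 - threshold) * maxLen) continue;
    const score = similarity(e.source, source);
    if (score >= bestScore) {
      best = e;
      bestScore = score;
    }
  }
  return best;
}
```

Keeping only one row brings memory down from O(mn) to O(min(m, n)). The length pre-check is an optimization this sketch adds: edit distance is at least the length difference, so entries whose lengths differ by more than 15% can never reach 85% similarity under this normalization.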
Concurrency semaphore
All Ollama calls are guarded by a counting semaphore (default limit: 1) to prevent GPU OOM on systems with limited VRAM. The semaphore queues excess requests and releases them in order. Override with POLYGLOT_CONCURRENCY:

```sh
POLYGLOT_CONCURRENCY=2 npx @mcptoolshop/polyglot-mcp
```

The translate_all tool adds a second layer of concurrency control via its own semaphore (default: 2, max: 3 concurrent languages).
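A counting semaphore that releases waiters in order can be sketched as follows. Semaphore and concurrencyFromEnv are hypothetical names, and falling back to 1 for missing or invalid POLYGLOT_CONCURRENCY values is an assumption.

```typescript
class Semaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(limit: number) {
    if (!Number.isInteger(limit) || limit < 1) throw new RangeError("limit must be a positive integer");
    this.available = limit;
  }

  async acquire(): Promise<void> {
    if (this.available > 0) {
      this.available--;
      return;
    }
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the slot directly to the oldest waiter (FIFO)
    else this.available++;
  }

  // Run `task` while holding a slot; the slot is released even if it throws.
  async run<T>(task: () => Promise<T>): Promise<T> {
    await this.acquire();
    try {
      return await task();
    } finally {
      this.release();
    }
  }
}

// Parse POLYGLOT_CONCURRENCY, falling back to the default for bad values.
function concurrencyFromEnv(env: Record<string, string | undefined>, fallback = 1): number {
  const n = Number.parseInt(env.POLYGLOT_CONCURRENCY ?? "", 10);
  return Number.isInteger(n) && n >= 1 ? n : fallback;
}
```

A caller would wrap each Ollama request in sem.run(...) around a (hypothetical) generate call. Handing a released slot straight to the oldest waiter, rather than incrementing the counter, stops a newly arriving request from jumping the queue.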
MCP progress tokens
All five tools report progress via MCP notifications/progress when the client provides a progressToken. The translate tool reports per-chunk, translate_markdown reports per-segment-batch, translate_all reports per-language, and check_status reports per-step (Ollama check, model list, ready).
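Per the MCP specification, a client opts in by sending params._meta.progressToken with its request, and the server replies with notifications/progress messages carrying that token, a progress value that increases with each notification, and optional total and message fields. A transport-agnostic sketch (makeProgressReporter is a hypothetical helper; send stands in for whatever the server SDK uses to emit notifications):

```typescript
type ProgressToken = string | number;

interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: ProgressToken; progress: number; total?: number; message?: string };
}

// Returns a reporter that does nothing when the client sent no progressToken.
function makeProgressReporter(
  token: ProgressToken | undefined,
  send: (n: ProgressNotification) => void,
): (progress: number, total?: number, message?: string) => void {
  if (token === undefined) return () => {};
  return (progress, total, message) => {
    const params: ProgressNotification["params"] = { progressToken: token, progress };
    if (total !== undefined) params.total = total;
    if (message !== undefined) params.message = message;
    send({ jsonrpc: "2.0", method: "notifications/progress", params });
  };
}
```

A chunked translate call could then invoke report(i + 1, chunks.length) after each chunk, and translate_all could do the same per language.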
Software glossary
A built-in glossary of 12 technical terms (Architecture, Adoption, Pipeline, Deploy, Library, Framework, Build, Release, Branch, Repository, Merge, Token) ensures consistent translation of software terminology. Each term has target-language translations for languages where TranslateGemma commonly picks the wrong domain sense (e.g., “Architecture” as building design instead of software architecture). Custom glossary entries can be passed per-request via the glossary parameter and are merged with the defaults.
Glossary hints are injected into the prompt only when the source text actually contains the term and a translation exists for the target language.
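A sketch of the merge-and-filter logic. The Glossary shape, the rule that custom entries override defaults per term and language, case-insensitive whole-word matching, and the hint format are all assumptions; any example entries are illustrative, not the built-in glossary.

```typescript
// Source term -> (target language code -> preferred translation).
type Glossary = Record<string, Record<string, string>>;

// Assumption: custom entries override defaults per term and per language.
function mergeGlossaries(defaults: Glossary, custom: Glossary = {}): Glossary {
  const merged: Glossary = {};
  for (const [term, langs] of Object.entries(defaults)) merged[term] = { ...langs };
  for (const [term, langs] of Object.entries(custom)) {
    merged[term] = { ...(merged[term] ?? {}), ...langs };
  }
  return merged;
}

function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hint lines only for terms that (a) appear in the source as whole words,
// case-insensitively, and (b) have a translation for the target language.
function glossaryHints(source: string, targetLang: string, glossary: Glossary): string[] {
  const hints: string[] = [];
  for (const [term, langs] of Object.entries(glossary)) {
    const translation = langs[targetLang];
    if (!translation) continue;
    if (new RegExp(`\\b${escapeRegExp(term)}\\b`, "i").test(source)) {
      hints.push(`"${term}" -> "${translation}"`);
    }
  }
  return hints;
}
```

Filtering this way keeps the prompt short: a paragraph that mentions none of the glossary terms gets no extra instructions at all.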
Batch translation
The translateBatch function groups multiple segments into a single prompt separated by ---POLYGLOT_SEP--- markers, reducing Ollama round-trips. If the model mangles the separators (output segment count does not match input), Polyglot falls back to translating each item individually. Single-item batches skip the separator overhead entirely.
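The batching and fallback logic can be sketched like this. translateBatchSketch is a hypothetical stand-in for translateBatch, translate stands for any single-prompt call to the model, and trimming each part to remove the newlines around the separator is an assumption.

```typescript
const SEP = "---POLYGLOT_SEP---";

type TranslateFn = (text: string) => Promise<string>;

async function translateBatchSketch(items: string[], translate: TranslateFn): Promise<string[]> {
  if (items.length === 0) return [];
  if (items.length === 1) return [await translate(items[0])]; // no separator overhead

  // One round-trip for the whole batch.
  const output = await translate(items.join(`\n${SEP}\n`));
  const parts = output.split(SEP).map((p) => p.trim());
  if (parts.length === items.length) return parts;

  // Separators were dropped or duplicated: fall back to one call per item.
  const results: string[] = [];
  for (const item of items) results.push(await translate(item));
  return results;
}
```

The count check is a cheap integrity test rather than a guarantee: a model could keep the right number of separators but still shift text between segments.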
Output validation
Every translation is automatically validated:
- Empty output throws a retryable TRANSLATE_ERROR
- Source-text echo is flagged (skipped for short/technical strings under 15 characters)
- Severe truncation warns when output is below 15% of source length
- Hallucination blowup warns when output exceeds 800% of source length
- Garbled encoding is detected when replacement/control characters exceed 5% of the output
- Model meta-commentary is detected (e.g., “Here is the translation:”, “Note:”, “Disclaimer:”)
Warnings appear in the MCP tool response alongside the translated text.
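A sketch of these checks with the thresholds listed above. The meta-commentary patterns, the exact set of control characters counted as garbled, the exact-match echo test, and the EmptyTranslationError class are assumptions made for illustration.

```typescript
class EmptyTranslationError extends Error {
  readonly code = "TRANSLATE_ERROR";
  readonly retryable = true;
}

// Assumed patterns for model commentary at the start of any line.
const META_PATTERNS = [/^\s*here is the translation/im, /^\s*note:/im, /^\s*disclaimer:/im];
// U+FFFD plus C0 control characters other than tab, LF and CR.
const GARBLED = /[\uFFFD\u0000-\u0008\u000B\u000C\u000E-\u001F]/g;

function validateTranslation(source: string, output: string): { warnings: string[] } {
  const out = output.trim();
  if (out.length === 0) throw new EmptyTranslationError("Model returned an empty translation");
  const src = source.trim();
  const warnings: string[] = [];

  if (src.length >= 15 && out === src) warnings.push("echo: output is identical to the source");
  const ratio = out.length / Math.max(src.length, 1);
  if (ratio < 0.15) warnings.push("truncation: output is under 15% of source length");
  if (ratio > 8) warnings.push("blowup: output exceeds 800% of source length");
  const garbled = out.match(GARBLED)?.length ?? 0;
  if (garbled / out.length > 0.05) warnings.push("encoding: too many replacement/control characters");
  if (META_PATTERNS.some((re) => re.test(out))) warnings.push("meta: model commentary detected");

  return { warnings };
}
```

Only the empty case throws; the rest are warnings because each heuristic has legitimate exceptions (a short label can translate to itself, and some languages are much more compact than English).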
Streaming
OllamaClient.generateStream() reads NDJSON chunks from Ollama as they arrive and fires an onToken callback for each one. The full response is accumulated and returned when the stream ends. Both streaming and non-streaming paths share the same retry and semaphore logic.
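The NDJSON handling can be sketched as a reader over byte chunks. Each line of Ollama's streaming /api/generate response is a JSON object whose response field holds the next piece of text, and the final object has done set to true. readNdjsonStream is a hypothetical name, and the sketch covers only parsing, not the retry or semaphore wiring.

```typescript
// One NDJSON object from Ollama's streaming /api/generate response.
interface GenerateChunk {
  response?: string;
  done?: boolean;
}

async function readNdjsonStream(
  body: AsyncIterable<Uint8Array>,
  onToken: (token: string) => void,
): Promise<string> {
  const decoder = new TextDecoder();
  let buffered = "";
  let full = "";

  const handleLine = (line: string) => {
    if (line.trim() === "") return;
    const chunk = JSON.parse(line) as GenerateChunk;
    if (chunk.response) {
      onToken(chunk.response);
      full += chunk.response;
    }
  };

  for await (const bytes of body) {
    // stream: true keeps split multi-byte characters buffered in the decoder.
    buffered += decoder.decode(bytes, { stream: true });
    let nl: number;
    while ((nl = buffered.indexOf("\n")) !== -1) {
      handleLine(buffered.slice(0, nl));
      buffered = buffered.slice(nl + 1);
    }
  }
  buffered += decoder.decode(); // flush the decoder
  handleLine(buffered); // the last line may lack a trailing newline
  return full;
}
```

Buffering until a newline is the key detail: network chunks do not respect line boundaries. In Node 18 and later, a fetch response body is async iterable at runtime, so it can be passed in directly.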
Structured errors
All errors use PolyglotError with a machine-readable code, a human-readable message, an optional hint, and a retryable flag. Error codes include: OLLAMA_UNAVAILABLE, OLLAMA_TIMEOUT, MODEL_NOT_FOUND, MODEL_PULL_FAILED, UNSUPPORTED_LANGUAGE, SAME_LANGUAGE, NETWORK_ERROR, OLLAMA_ERROR, TRANSLATE_ERROR. The friendlyError() helper converts any thrown value into a user-friendly string for MCP tool responses.
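A sketch of the error type and helper. Only the names PolyglotError and friendlyError, the four fields, and the error codes come from the text above; the constructor signature, the default retryable = false, and the output format of friendlyError are assumptions.

```typescript
type PolyglotErrorCode =
  | "OLLAMA_UNAVAILABLE" | "OLLAMA_TIMEOUT" | "MODEL_NOT_FOUND" | "MODEL_PULL_FAILED"
  | "UNSUPPORTED_LANGUAGE" | "SAME_LANGUAGE" | "NETWORK_ERROR" | "OLLAMA_ERROR" | "TRANSLATE_ERROR";

class PolyglotError extends Error {
  readonly code: PolyglotErrorCode;
  readonly hint?: string;
  readonly retryable: boolean;

  constructor(code: PolyglotErrorCode, message: string, opts: { hint?: string; retryable?: boolean } = {}) {
    super(message);
    this.name = "PolyglotError";
    this.code = code;
    this.hint = opts.hint;
    this.retryable = opts.retryable ?? false;
  }
}

// Turn any thrown value into text suitable for an MCP tool response.
function friendlyError(err: unknown): string {
  if (err instanceof PolyglotError) {
    const base = `[${err.code}] ${err.message}`;
    return err.hint ? `${base}\nHint: ${err.hint}` : base;
  }
  if (err instanceof Error) return err.message;
  return String(err);
}
```

Keeping code machine-readable and hint separate lets an MCP client branch on the error type while a human still gets an actionable next step.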
Post-translation polish
The polish layer cleans up common TranslateGemma output quirks: alternative translations separated by “or” in 12 languages are stripped (keeping only the first translation), trailing sentence-end punctuation is removed from headings, and excessive whitespace is normalized. Table cells get additional polish to collapse embedded newlines into spaces.
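A sketch of these polish passes. The or-word list is a four-word subset chosen for illustration (the real list covers 12 languages), the 60-character cutoff for alternative stripping is an assumed heuristic, and only periods are treated as sentence-end punctuation in headings.

```typescript
// Assumed subset of "or" words (English, German, French/Portuguese, Spanish/Italian).
const OR_WORDS = ["or", "oder", "ou", "o"];

// Keep only the first alternative in short single-line strings such as
// "Bibliothek oder Library". The 60-character cutoff is an assumed heuristic.
function stripAlternatives(text: string, orWords: string[] = OR_WORDS): string {
  if (text.includes("\n") || text.length > 60) return text;
  const pattern = new RegExp(`^(.+?)\\s+(?:${orWords.join("|")})\\s+.+$`, "iu");
  const match = text.match(pattern);
  return match ? match[1].trim() : text;
}

// Remove trailing periods (ASCII "." and the CJK full stop) from heading text.
function polishHeading(text: string): string {
  return text.replace(/[.。]+\s*$/u, "");
}

// Collapse runs of spaces/tabs between words and limit blank lines to one,
// leaving leading indentation (nested lists, code) untouched.
function normalizeWhitespace(text: string): string {
  return text.replace(/(\S)[ \t]{2,}(?=\S)/g, "$1 ").replace(/\n{3,}/g, "\n\n");
}

// Pipe-table cells cannot contain raw newlines, so fold them into spaces.
function polishTableCell(text: string): string {
  return normalizeWhitespace(text.replace(/\s*\n\s*/g, " ")).trim();
}
```

Even with the cutoff, a legitimate short heading such as "Compilar o desplegar" would be clipped to "Compilar", so a production rule likely needs more context than this sketch uses.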