- Raise context window fallback from 8k to 128k for unknown OpenAI-compat models.
The 8k fallback caused effective context (8k minus output reservation) to go
negative, making auto-compact fire on every single message.
- Add safety floor in getEffectiveContextWindowSize(): effective context is
always at least reservedTokensForSummary plus a 13k buffer (sketched after
this list), ensuring the auto-compact threshold stays positive.
- Add missing MiniMax model entries (M2.5, M2.5-highspeed, M2.1, M2.1-highspeed),
all at 204,800 context / 131,072 max output per MiniMax docs.
- Add tests for MiniMax variants, 128k fallback, and autoCompact floor.
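A minimal sketch of the floor, assuming the function takes the raw window and
reservation as inputs (only getEffectiveContextWindowSize and
reservedTokensForSummary come from this change; the rest is illustrative):

```typescript
function getEffectiveContextWindowSize(
  contextWindow: number,
  reservedTokensForSummary: number,
): number {
  // With the old 8k fallback, contextWindow - reservedTokensForSummary
  // could go negative, firing auto-compact on every message.
  const FLOOR_BUFFER = 13_000; // the 13k buffer; the constant name is hypothetical
  const effective = contextWindow - reservedTokensForSummary;
  return Math.max(effective, reservedTokensForSummary + FLOOR_BUFFER);
}

// e.g. getEffectiveContextWindowSize(8_000, 20_000) === 33_000, not -12_000
```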
Fixes #635
Co-authored-by: root <root@vm7508.lumadock.com>
The OPENAI_CONTEXT_WINDOWS/OPENAI_MAX_OUTPUT_TOKENS tables only contained
the `github:copilot:<model>` namespaced form used when talking directly to
Copilot via /onboard-github. When OpenClaude is pointed at a LiteLLM proxy
(which routes Copilot using the standard `github_copilot/<model>` convention),
the lookup missed and fell back to the conservative 8k default. This made
the compaction loop fire on every tick, blocking requests before they left
the client and emitting repeated "not in context window table" warnings
on stderr.
Mirror the 11 active Copilot models with LiteLLM-style keys in both tables.
No behavior change for users of /onboard-github since namespaced entries
remain untouched and `lookupByKey` picks exact matches first.
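Illustrative shape of the mirrored entries (one example model instead of the
full list of 11; limits are examples, and lookupByKey is reduced to its
exact-match step):

```typescript
const OPENAI_CONTEXT_WINDOWS: Record<string, number> = {
  // namespaced form used by /onboard-github (unchanged)
  'github:copilot:gpt-4o': 128_000,
  // LiteLLM-style mirror, so proxy-routed lookups stop missing
  'github_copilot/gpt-4o': 128_000,
};

// Exact match first means the two key families never shadow each other.
function lookupByKey(key: string): number | undefined {
  return OPENAI_CONTEXT_WINDOWS[key];
}
```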
Add context_window and max_output_tokens entries for all models available
through the GitHub Copilot proxy (Claude, GPT, Gemini, Grok), sourced from
https://api.githubcopilot.com/models.
Models are namespaced as "github:copilot:<model>" to avoid collisions with
the same model names served by other providers (which may have different
limits). A new lookupByKey() helper and a qualified-key lookup in
lookupByModel() ensure the correct limits are selected when
OPENAI_MODEL=github:copilot.
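A sketch of that qualified lookup; the helper names come from this change,
but the signature and the way the provider prefix is threaded through are
assumptions:

```typescript
type LimitTable = Record<string, number>;

function lookupByModel(
  table: LimitTable,
  model: string,
  provider?: string, // e.g. 'github:copilot' when OPENAI_MODEL=github:copilot
): number | undefined {
  if (provider !== undefined) {
    // Prefer the collision-free namespaced key...
    const qualified = table[`${provider}:${model}`];
    if (qualified !== undefined) return qualified;
  }
  // ...then fall back to the bare model name other providers serve.
  return table[model];
}
```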
Without this, Claude models on Copilot would use default context/output
limits that may not match the proxy's actual constraints, causing 400 errors
like "max_tokens is too large".
Related: #515
Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
Models not in the lookup table fall through to a 200k default, causing
auto-compact to never trigger for models with smaller actual context
windows. Users hit hard context_window_exceeded errors instead.
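A worked example of the failure mode (the 90% trigger fraction is an
assumption for illustration):

```typescript
const assumedWindow = 200_000;         // fall-through default
const actualWindow = 128_000;          // e.g. an o1-family model
const compactAt = assumedWindow * 0.9; // hypothetical auto-compact trigger

const conversationTokens = 130_000;
// Past the real window but far below the assumed trigger, so compaction
// never runs and the API returns a hard context_window_exceeded error.
console.log(conversationTokens > actualWindow); // true
console.log(conversationTokens > compactAt);    // false
```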
Added to both context window and max output token tables:
- o1, o1-mini, o1-preview, o1-pro (OpenAI reasoning models)
- llama3.2:1b, qwen3:8b, codestral (common Ollama models)
Relates to #248
Two fixes in openaiContextWindows.ts:
1. Sort lookup keys by length descending in lookupByModel() so the most
   specific prefix always wins (see the sketch after this list). Without
   this, 'gpt-4-turbo-preview' could match 'gpt-4' (8k) instead of
   'gpt-4-turbo' (128k) depending on V8's object key iteration order.
2. Update Llama 3.1/3.2/3.3 context windows from 8,192 to 128,000.
These models support 128k context natively (Meta official specs).
The previous 8k value was Ollama's default num_ctx, not the model's
actual capability, causing premature auto-compact warnings.
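A sketch of fix 1, with the table trimmed to the two keys from the example
in item 1:

```typescript
const CONTEXT_WINDOWS: Record<string, number> = {
  'gpt-4': 8_192,
  'gpt-4-turbo': 128_000,
};

function lookupByModel(model: string): number | undefined {
  // Longest key first: 'gpt-4-turbo-preview' hits 'gpt-4-turbo' before the
  // shorter 'gpt-4' prefix can shadow it, regardless of insertion order.
  const keys = Object.keys(CONTEXT_WINDOWS).sort((a, b) => b.length - a.length);
  const match = keys.find((key) => model.startsWith(key));
  return match === undefined ? undefined : CONTEXT_WINDOWS[match];
}

lookupByModel('gpt-4-turbo-preview'); // 128_000, deterministically
```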
Replace raw === '1' || === 'true' comparisons with isEnvTruthy() in
context.ts for consistency with getAPIProvider() in providers.ts.
This also covers the newly added CLAUDE_CODE_USE_GITHUB provider.
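Assumed shape of the helper (the real isEnvTruthy() in providers.ts may
accept more spellings):

```typescript
function isEnvTruthy(value: string | undefined): boolean {
  return value === '1' || value === 'true';
}

// Before: process.env.CLAUDE_CODE_USE_GITHUB === '1' ||
//         process.env.CLAUDE_CODE_USE_GITHUB === 'true'
// After:  isEnvTruthy(process.env.CLAUDE_CODE_USE_GITHUB)
```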
Add native Gemini model entries (without google/ prefix) to both
context window and max output token tables. Corrects gemini-2.5-pro
and gemini-2.5-flash max output tokens to 65,536 (was 8,192/32,768).
DeepSeek V3 documentation specifies a 128k context window for both
deepseek-chat and deepseek-reasoner. The previous 64k value caused
premature compaction and underutilization of available context.
Relates to #39
Co-Authored-By: Juan Camilo <juancamilo.auriti@gmail.com>
Without this fix, getContextWindowForModel() returns 200k for all OpenAI
models (the Claude default), causing two problems:
1. Auto-compact/warnings trigger at wrong thresholds (200k instead of 128k)
2. getModelMaxOutputTokens() returns 32k, causing 400 errors from APIs that
   cap output tokens lower (gpt-4o supports at most 16,384)
Fix:
- Add openaiContextWindows.ts with known context window sizes and max output
token limits for 30+ OpenAI-compatible models (OpenAI, DeepSeek, Groq,
Mistral, Ollama, LM Studio)
- Hook into getContextWindowForModel() so correct input limits are used
- Hook into getModelMaxOutputTokens() so correct output limits are sent,
preventing 400 "max_tokens is too large" errors
All existing warning, blocking, and auto-compact infrastructure works
automatically once the correct limits are returned.
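Illustrative wiring of the two hooks, with each table trimmed to one entry;
the 200k/32k Claude defaults and the gpt-4o numbers match the text above,
the rest is a sketch:

```typescript
const OPENAI_CONTEXT_WINDOWS: Record<string, number> = { 'gpt-4o': 128_000 };
const OPENAI_MAX_OUTPUT_TOKENS: Record<string, number> = { 'gpt-4o': 16_384 };

function getContextWindowForModel(model: string): number {
  // Consult the OpenAI-compatible table before the 200k Claude default.
  return OPENAI_CONTEXT_WINDOWS[model] ?? 200_000;
}

function getModelMaxOutputTokens(model: string): number {
  // Sending the 32k Claude default to gpt-4o (capped at 16,384) is what
  // produced the 400 "max_tokens is too large" errors.
  return OPENAI_MAX_OUTPUT_TOKENS[model] ?? 32_000;
}
```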
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>