Commit Graph

7 Commits

Author SHA1 Message Date
Juan Camilo Auriti
60d3d8961a fix: add missing o1-series and Ollama models to context window table (#250)
Models not in the lookup table fall through to a 200k default, causing
auto-compact to never trigger for models with smaller actual context
windows. Users hit hard context_window_exceeded errors instead.

Added to both context window and max output token tables:
- o1, o1-mini, o1-preview, o1-pro (OpenAI reasoning models)
- llama3.2:1b, qwen3:8b, codestral (common Ollama models)

Relates to #248
2026-04-06 06:39:24 +08:00
Juan Camilo
b65921e8c3 fix: deterministic prefix matching and correct Llama 3.x context windows
Two fixes in openaiContextWindows.ts:

1. Sort lookup keys by length descending in lookupByModel() so the most
   specific prefix always wins. Without this, 'gpt-4-turbo-preview'
   could match 'gpt-4' (8k) instead of 'gpt-4-turbo' (128k) depending
   on V8's object key iteration order.

2. Update Llama 3.1/3.2/3.3 context windows from 8,192 to 128,000.
   These models support 128k context natively (Meta official specs).
   The previous 8k value was Ollama's default num_ctx, not the model's
   actual capability, causing premature auto-compact warnings.
2026-04-02 15:50:52 +02:00
Juan Camilo
f385740bd6 fix: use isEnvTruthy() for provider detection in context window lookup
Replace raw === '1' || === 'true' comparisons with isEnvTruthy() in
context.ts for consistency with getAPIProvider() in providers.ts.
This also covers the newly added CLAUDE_CODE_USE_GITHUB provider.

Add native Gemini model entries (without google/ prefix) to both
context window and max output token tables. Corrects gemini-2.5-pro
and gemini-2.5-flash max output tokens to 65,536 (was 8,192/32,768).
2026-04-02 14:43:03 +02:00
Kevin Codex
1ce19b9a39 Merge pull request #59 from Vasanthdev2004/gpt4o-max-tokens-test
test: cover OpenAI max token caps for gpt-4o and GPT-5.4
2026-04-02 08:24:25 +08:00
Vasanthdev2004
f0f6f1b285 test: add GPT-5.4 token coverage 2026-04-01 22:07:56 +05:30
Juan Camilo
39d9616ed7 fix: update DeepSeek context window from 64k to 128k
DeepSeek V3 documentation specifies 128k context window for both
deepseek-chat and deepseek-reasoner. The previous 64k value caused
premature compaction and underutilization of available context.

Relates to #39

Co-Authored-By: Juan Camilo <juancamilo.auriti@gmail.com>
2026-04-01 17:03:57 +02:00
gnanam1990
4ca94b2454 feat: add context window guard for OpenAI-compatible models
Without this fix, getContextWindowForModel() returns 200k for all OpenAI
models (the Claude default), causing two problems:
  1. Auto-compact/warnings trigger at wrong thresholds (200k instead of 128k)
  2. getModelMaxOutputTokens() returns 32k causing 400 errors from APIs that
     cap output tokens lower (gpt-4o supports max 16384)

Fix:
- Add openaiContextWindows.ts with known context window sizes and max output
  token limits for 30+ OpenAI-compatible models (OpenAI, DeepSeek, Groq,
  Mistral, Ollama, LM Studio)
- Hook into getContextWindowForModel() so correct input limits are used
- Hook into getModelMaxOutputTokens() so correct output limits are sent,
  preventing 400 "max_tokens is too large" errors

All existing warning, blocking, and auto-compact infrastructure works
automatically once the correct limits are returned.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-01 17:42:04 +05:30