From c13842e91c7227246520955de6ae0636b30def9a Mon Sep 17 00:00:00 2001
From: Kevin Codex
Date: Wed, 22 Apr 2026 09:37:57 +0800
Subject: [PATCH] fix(test): autoCompact floor assertion is flag-sensitive (#816)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The test "never returns negative even for unknown 3P models (issue #635)"
asserted that getEffectiveContextWindowSize() returns >= 33_000 for an
unknown 3P model under the OpenAI shim. That specific number assumes
reservedTokensForSummary = 20_000 (MAX_OUTPUT_TOKENS_FOR_SUMMARY), which
holds only when the tengu_otk_slot_v1 GrowthBook flag is disabled.

When the flag is ON (which is the case in CI but not always locally),
getMaxOutputTokensForModel() caps the model's default output at
CAPPED_DEFAULT_MAX_TOKENS (8_000). Then reservedTokensForSummary = 8_000,
floor = 8_000 + 13_000 = 21_000, and the test fails with 21_000 < 33_000.
The test passes wherever the flag is off (typically locally) and fails
wherever it is on (CI), which surfaced as the intermittent PR-check
failure.

Fix: relax the lower bound to 21_000 (cap-enabled worst case), which is
still well above zero. This preserves the anti-regression intent of
issue #635 (no infinite auto-compact from a negative effective window)
without binding the test to GrowthBook flag state.

Co-authored-by: OpenClaude
---
 src/services/compact/autoCompact.test.ts | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/src/services/compact/autoCompact.test.ts b/src/services/compact/autoCompact.test.ts
index 20248c70..b0decd3a 100644
--- a/src/services/compact/autoCompact.test.ts
+++ b/src/services/compact/autoCompact.test.ts
@@ -16,12 +16,21 @@ describe('getEffectiveContextWindowSize', () => {
     // 8k minus 20k summary reservation = -12k, causing infinite auto-compact.
     // Now the fallback is 128k and there's a floor, so effective is always
     // at least reservedTokensForSummary + buffer.
+    //
+    // The exact floor depends on the max-output-tokens slot-reservation cap
+    // (tengu_otk_slot_v1 GrowthBook flag). With cap enabled, the model's
+    // default output cap drops to CAPPED_DEFAULT_MAX_TOKENS (8k), so the
+    // summary reservation is 8k and the floor is 8k + 13k = 21k. With cap
+    // disabled it's 20k + 13k = 33k. Assert the worst case so the test is
+    // stable regardless of flag state in CI vs local.
     process.env.CLAUDE_CODE_USE_OPENAI = '1'
     try {
       const effective = getEffectiveContextWindowSize('some-unknown-3p-model')
       expect(effective).toBeGreaterThan(0)
-      // Must be at least summary reservation (20k) + buffer (13k) = 33k
-      expect(effective).toBeGreaterThanOrEqual(33_000)
+      // 21k = CAPPED_DEFAULT_MAX_TOKENS (8k) + AUTOCOMPACT_BUFFER_TOKENS (13k).
+      // Covers the anti-regression intent of issue #635 without assuming
+      // the GrowthBook flag state.
+      expect(effective).toBeGreaterThanOrEqual(21_000)
     } finally {
       delete process.env.CLAUDE_CODE_USE_OPENAI
     }
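
The floor arithmetic described in the commit message can be sketched as follows. This is only an illustration under stated assumptions, not the real getEffectiveContextWindowSize(): the helper effectiveWindowFloor and its capEnabled parameter are hypothetical stand-ins for the tengu_otk_slot_v1 flag lookup, and the constants simply copy the values quoted in the message.

```typescript
// Values as quoted in the commit message; the real definitions live elsewhere
// in the codebase and may differ.
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000
const CAPPED_DEFAULT_MAX_TOKENS = 8_000
const AUTOCOMPACT_BUFFER_TOKENS = 13_000

// Hypothetical helper: computes the lower bound on the effective context
// window for a given cap state. `capEnabled` stands in for the GrowthBook flag.
function effectiveWindowFloor(capEnabled: boolean): number {
  const reservedTokensForSummary = capEnabled
    ? Math.min(CAPPED_DEFAULT_MAX_TOKENS, MAX_OUTPUT_TOKENS_FOR_SUMMARY)
    : MAX_OUTPUT_TOKENS_FOR_SUMMARY
  return reservedTokensForSummary + AUTOCOMPACT_BUFFER_TOKENS
}

// The assertion should use the smaller of the two floors so that it holds
// regardless of flag state.
const worstCaseFloor = Math.min(
  effectiveWindowFloor(true),
  effectiveWindowFloor(false),
)
```

Under these assumptions the cap-enabled floor is the smaller one, which is why the test bound is set there and not at the cap-disabled value.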