fix(test): autoCompact floor assertion is flag-sensitive (#816)

The test "never returns negative even for unknown 3P models (issue #635)"
asserted that getEffectiveContextWindowSize() returns >= 33_000 for an
unknown 3P model under the OpenAI shim. That specific number assumes
reservedTokensForSummary = 20_000 (MAX_OUTPUT_TOKENS_FOR_SUMMARY), which
holds only when the tengu_otk_slot_v1 GrowthBook flag is disabled.

When the flag is on (as it is in CI, but not always locally),
getMaxOutputTokensForModel() caps the model's default output at
CAPPED_DEFAULT_MAX_TOKENS (8_000). reservedTokensForSummary then becomes
8_000, the floor becomes 8_000 + 13_000 = 21_000 (reservation plus
AUTOCOMPACT_BUFFER_TOKENS), and the assertion fails because 21_000 < 33_000.
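
The arithmetic above can be sketched as follows. This is a minimal
illustration, not the real implementation: effectiveWindowFloor() is a
hypothetical helper, the flag is modeled as a plain boolean rather than a
GrowthBook lookup, and the constants simply restate the values quoted in
this message.

```typescript
// Values quoted in this commit message; the real constants live in the codebase.
const MAX_OUTPUT_TOKENS_FOR_SUMMARY = 20_000
const CAPPED_DEFAULT_MAX_TOKENS = 8_000
const AUTOCOMPACT_BUFFER_TOKENS = 13_000

// Hypothetical helper: computes the lower bound on the effective context
// window for a given state of the tengu_otk_slot_v1 flag (assumed boolean).
function effectiveWindowFloor(capEnabled: boolean): number {
  const reservedTokensForSummary = capEnabled
    ? CAPPED_DEFAULT_MAX_TOKENS
    : MAX_OUTPUT_TOKENS_FOR_SUMMARY
  return reservedTokensForSummary + AUTOCOMPACT_BUFFER_TOKENS
}

const floorWithCap = effectiveWindowFloor(true)
const floorWithoutCap = effectiveWindowFloor(false)

// A flag-independent assertion has to use the smaller of the two floors.
const flagIndependentBound = Math.min(floorWithCap, floorWithoutCap)
console.log({ floorWithCap, floorWithoutCap, flagIndependentBound })
```

Under these assumptions the smaller floor is the cap-enabled one, which is
why a bound of 33_000 can only hold while the flag is off.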

As a result, the test passes wherever the flag is off (typically local
runs) and fails wherever it is on (CI), which surfaced as the intermittent
PR-check failure.

Fix: relax the lower bound to 21_000 (the cap-enabled worst case). That is
still well above zero, so it preserves the anti-regression intent of
issue #635 (no infinite auto-compact from a negative effective window)
without binding the test to GrowthBook flag state.

Co-authored-by: OpenClaude <openclaude@gitlawb.com>
Kevin Codex
2026-04-22 09:37:57 +08:00
committed by GitHub
parent 458120889f
commit c13842e91c

@@ -16,12 +16,21 @@ describe('getEffectiveContextWindowSize', () => {
     // 8k minus 20k summary reservation = -12k, causing infinite auto-compact.
     // Now the fallback is 128k and there's a floor, so effective is always
     // at least reservedTokensForSummary + buffer.
+    //
+    // The exact floor depends on the max-output-tokens slot-reservation cap
+    // (tengu_otk_slot_v1 GrowthBook flag). With cap enabled, the model's
+    // default output cap drops to CAPPED_DEFAULT_MAX_TOKENS (8k), so the
+    // summary reservation is 8k and the floor is 8k + 13k = 21k. With cap
+    // disabled it's 20k + 13k = 33k. Assert the worst case so the test is
+    // stable regardless of flag state in CI vs local.
     process.env.CLAUDE_CODE_USE_OPENAI = '1'
     try {
       const effective = getEffectiveContextWindowSize('some-unknown-3p-model')
       expect(effective).toBeGreaterThan(0)
-      // Must be at least summary reservation (20k) + buffer (13k) = 33k
-      expect(effective).toBeGreaterThanOrEqual(33_000)
+      // 21k = CAPPED_DEFAULT_MAX_TOKENS (8k) + AUTOCOMPACT_BUFFER_TOKENS (13k).
+      // Covers the anti-regression intent of issue #635 without assuming
+      // the GrowthBook flag state.
+      expect(effective).toBeGreaterThanOrEqual(21_000)
     } finally {
       delete process.env.CLAUDE_CODE_USE_OPENAI
     }