feat: add /cache-probe diagnostic command (#580)

Add a /cache-probe slash command for debugging prompt caching behaviour
on OpenAI-compatible providers (GitHub Copilot, OpenAI direct).

The command sends two identical API requests in sequence and compares the
usage stats from the raw server responses, showing:
- Input/output token counts
- Cache read tokens (from prompt_tokens_details or input_tokens_details)
- Latency for each request
- Cache hit rate percentage
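
As a rough illustration of how the stats listed above could be derived from one
response, here is a minimal sketch. The helper name `summarizeUsage` and the
`ProbeStats` shape are hypothetical and not taken from this commit; the hit rate
is assumed to be cached input tokens divided by total input tokens. The field
names follow the OpenAI-style usage object (`prompt_tokens_details` for Chat
Completions, `input_tokens_details` for Responses).

```typescript
// Usage fields as reported by OpenAI-compatible servers. Both naming schemes
// are optional because a given endpoint returns only one of them.
interface UsageLike {
  prompt_tokens?: number
  completion_tokens?: number
  input_tokens?: number
  output_tokens?: number
  prompt_tokens_details?: { cached_tokens?: number }
  input_tokens_details?: { cached_tokens?: number }
}

// Hypothetical summary shape, for illustration only.
interface ProbeStats {
  inputTokens: number
  outputTokens: number
  cachedTokens: number
  hitRatePct: number
}

function summarizeUsage(usage: UsageLike): ProbeStats {
  const inputTokens = usage.prompt_tokens ?? usage.input_tokens ?? 0
  const outputTokens = usage.completion_tokens ?? usage.output_tokens ?? 0
  const cachedTokens =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0
  // Assumption: hit rate = cached input tokens / total input tokens.
  const hitRatePct = inputTokens > 0 ? (cachedTokens / inputTokens) * 100 : 0
  return { inputTokens, outputTokens, cachedTokens, hitRatePct }
}
```

With this definition, a second request that reports most of its input tokens as
cached would show a hit rate close to 100%, while the first request would
typically show 0%.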

Usage:
  /cache-probe                    # test default model
  /cache-probe claude-sonnet-4    # test specific model
  /cache-probe gpt-5.4 --no-key   # test without prompt_cache_key

The --no-key flag omits the prompt_cache_key, prompt_cache_retention, and
store parameters to test whether the server performs content-based
auto-caching (GitHub Copilot does).
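
The effect of --no-key on the request body can be sketched as follows. This is
an illustration under assumptions, not the code from this commit: the function
name `buildProbeBody`, the message contents, and the concrete values given to
the three cache-related parameters are all hypothetical.

```typescript
// Hypothetical request body shape for an OpenAI-compatible chat endpoint.
interface ProbeBody {
  model: string
  messages: { role: 'system' | 'user'; content: string }[]
  prompt_cache_key?: string
  prompt_cache_retention?: string
  store?: boolean
}

function buildProbeBody(model: string, noKey: boolean): ProbeBody {
  const body: ProbeBody = {
    model,
    // Both probe requests must carry identical content so that the second
    // one has a chance of hitting the cache.
    messages: [
      { role: 'system', content: 'You are a cache probe.' },
      { role: 'user', content: 'Reply with OK.' },
    ],
  }
  if (!noKey) {
    // Assumed values; the real command may choose different ones.
    body.prompt_cache_key = 'cache-probe-demo'
    body.prompt_cache_retention = '24h'
    body.store = false
  }
  return body
}
```

If the second request still reports cached tokens when these fields are absent,
the server is caching on content alone.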

This is a debugging/diagnostic tool, not intended for regular use. It was
instrumental in discovering that:
1. Copilot auto-caches server-side based on content hash
2. prompt_cache_key is ignored by the proxy
3. The streaming path was not reporting cached tokens

Only enabled when the provider is OpenAI or GitHub (not for firstParty
Anthropic, which has different caching semantics).
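
The gating described above might look roughly like this. The `Provider` union,
its string values other than `firstParty`, and the function name
`isCacheProbeEnabled` are assumptions made for illustration; the commit's
actual check may be structured differently.

```typescript
// Hypothetical provider identifiers; only 'firstParty' appears in the
// commit message, the other two are placeholders.
type Provider = 'openai' | 'github' | 'firstParty'

// The probe only makes sense for OpenAI-compatible providers.
function isCacheProbeEnabled(provider: Provider): boolean {
  return provider === 'openai' || provider === 'github'
}
```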

Related: #515

Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
Zartris
2026-04-10 15:34:38 +02:00
committed by GitHub
parent 598651f423
commit 9ccaa7a675
3 changed files with 432 additions and 0 deletions

@@ -32,6 +32,7 @@ import logout from './commands/logout/index.js'
import installGitHubApp from './commands/install-github-app/index.js'
import installSlackApp from './commands/install-slack-app/index.js'
import breakCache from './commands/break-cache/index.js'
import cacheProbe from './commands/cache-probe/index.js'
import mcp from './commands/mcp/index.js'
import mobile from './commands/mobile/index.js'
import onboarding from './commands/onboarding/index.js'
@@ -268,6 +269,7 @@ const COMMANDS = memoize((): Command[] => [
autoFix,
branch,
btw,
cacheProbe,
chrome,
clear,
color,