feat: add /cache-probe diagnostic command (#580)

Add a /cache-probe slash command for debugging prompt caching behaviour on OpenAI-compatible providers (GitHub Copilot, OpenAI direct).

The command sends two identical API requests in sequence and compares the raw server response usage stats, showing:

- Input/output token counts
- Cache read tokens (from prompt_tokens_details or input_tokens_details)
- Latency for each request
- Cache hit rate percentage

Usage:

  /cache-probe                    # test default model
  /cache-probe claude-sonnet-4    # test specific model
  /cache-probe gpt-5.4 --no-key   # test without prompt_cache_key

The --no-key flag omits prompt_cache_key/prompt_cache_retention/store to test whether the server does content-based auto-caching (it does on GitHub Copilot).

This is a debugging/diagnostic tool, not intended for regular use. It was instrumental in discovering that:

1. Copilot auto-caches server-side based on content hash
2. prompt_cache_key is ignored by the proxy
3. The streaming path was not reporting cached tokens

Only enabled when the provider is OpenAI or GitHub (not for firstParty Anthropic which has different caching semantics).

Related: #515

Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
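
For readers unfamiliar with the two usage shapes mentioned above, the following is a minimal sketch of how cached-token counts might be read from either prompt_tokens_details or input_tokens_details and turned into a hit-rate percentage. It is not the code added by this commit: the names RawUsage, ProbeStats, and summarizeUsage are hypothetical, and it assumes the hit rate is cached input tokens divided by total input tokens.

```typescript
// Hypothetical sketch, not the implementation from this commit.
// RawUsage covers the two usage shapes an OpenAI-compatible server may return.
interface RawUsage {
  prompt_tokens?: number
  completion_tokens?: number
  input_tokens?: number
  output_tokens?: number
  prompt_tokens_details?: { cached_tokens?: number }
  input_tokens_details?: { cached_tokens?: number }
}

interface ProbeStats {
  inputTokens: number
  outputTokens: number
  cachedTokens: number
  hitRatePercent: number
}

// Normalise either usage shape into one record.
// Assumption: hit rate = cached input tokens / total input tokens.
function summarizeUsage(usage: RawUsage): ProbeStats {
  const inputTokens = usage.prompt_tokens ?? usage.input_tokens ?? 0
  const outputTokens = usage.completion_tokens ?? usage.output_tokens ?? 0
  const cachedTokens =
    usage.prompt_tokens_details?.cached_tokens ??
    usage.input_tokens_details?.cached_tokens ??
    0
  // Guard against division by zero when the server reports no input tokens.
  const hitRatePercent =
    inputTokens > 0 ? (cachedTokens / inputTokens) * 100 : 0
  return { inputTokens, outputTokens, cachedTokens, hitRatePercent }
}
```

In a probe of this kind, the first request would typically report zero cached tokens and the second a non-zero count if the server cached the prompt, so comparing the two summaries shows whether caching took effect.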
2026-04-10 15:34:38 +02:00
parent 598651f423
commit 9ccaa7a675
3 changed files with 432 additions and 0 deletions
--- a/src/commands.ts
+++ b/src/commands.ts
@@ -32,6 +32,7 @@ import logout from './commands/logout/index.js'
 import installGitHubApp from './commands/install-github-app/index.js'
 import installSlackApp from './commands/install-slack-app/index.js'
 import breakCache from './commands/break-cache/index.js'
+import cacheProbe from './commands/cache-probe/index.js'
 import mcp from './commands/mcp/index.js'
 import mobile from './commands/mobile/index.js'
 import onboarding from './commands/onboarding/index.js'
@@ -268,6 +269,7 @@ const COMMANDS = memoize((): Command[] => [
  autoFix,
  branch,
  btw,
+  cacheProbe,
  chrome,
  clear,
  color,