Add a /cache-probe slash command for debugging prompt caching behaviour on OpenAI-compatible providers (GitHub Copilot, OpenAI direct).

The command sends two identical API requests in sequence and compares the raw server-response usage stats, showing:

- Input/output token counts
- Cache read tokens (from `prompt_tokens_details` or `input_tokens_details`)
- Latency for each request
- Cache hit rate percentage

Usage:

```
/cache-probe                    # test default model
/cache-probe claude-sonnet-4    # test specific model
/cache-probe gpt-5.4 --no-key   # test without prompt_cache_key
```

The `--no-key` flag omits `prompt_cache_key`/`prompt_cache_retention`/`store` to test whether the server does content-based auto-caching (it does on GitHub Copilot). The probe's comparison logic is sketched at the end of this description.

This is a debugging/diagnostic tool, not intended for regular use. It was instrumental in discovering that:

1. Copilot auto-caches server-side based on a content hash
2. `prompt_cache_key` is ignored by the proxy
3. The streaming path was not reporting cached tokens

The command is only enabled when the provider is OpenAI or GitHub (not for firstParty Anthropic, which has different caching semantics).

Related: #515

Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
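For orientation, the comparison logic amounts to roughly the following. This is a minimal sketch, not the shipped implementation: `probeOnce`, `runProbe`, `PROBE_PROMPT`, and the direct `fetch` call are all illustrative, and the real command routes through the provider client and writes results to the debug log.

```ts
// Illustrative sketch: probeOnce, runProbe, PROBE_PROMPT, and the direct
// fetch() call are assumptions, not the actual implementation.
type ProbeResult = {
  latencyMs: number
  inputTokens: number
  outputTokens: number
  cachedTokens: number
}

// The prompt must exceed the provider's minimum cacheable prefix
// (1024 tokens on OpenAI) for a cache hit to be possible at all.
const PROBE_PROMPT = 'cache probe filler text. '.repeat(300)

async function probeOnce(
  baseUrl: string,
  apiKey: string,
  model: string,
  useKey: boolean,
): Promise<ProbeResult> {
  const body: Record<string, unknown> = {
    model,
    messages: [{ role: 'user', content: PROBE_PROMPT }],
    max_tokens: 16,
  }
  if (useKey) {
    // Omitted under --no-key to test content-based auto-caching.
    body.prompt_cache_key = 'cache-probe'
  }
  const start = performance.now()
  const res = await fetch(`${baseUrl}/chat/completions`, {
    method: 'POST',
    headers: {
      'content-type': 'application/json',
      authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body),
  })
  const usage = ((await res.json()) as { usage?: any }).usage ?? {}
  return {
    latencyMs: performance.now() - start,
    inputTokens: usage.prompt_tokens ?? usage.input_tokens ?? 0,
    outputTokens: usage.completion_tokens ?? usage.output_tokens ?? 0,
    // Chat Completions nests cache reads under prompt_tokens_details;
    // the Responses API uses input_tokens_details instead.
    cachedTokens:
      usage.prompt_tokens_details?.cached_tokens ??
      usage.input_tokens_details?.cached_tokens ??
      0,
  }
}

async function runProbe(baseUrl: string, apiKey: string, model: string, useKey: boolean) {
  // Two identical requests back to back: the second should read from cache.
  const first = await probeOnce(baseUrl, apiKey, model, useKey)
  const second = await probeOnce(baseUrl, apiKey, model, useKey)
  const hitRatePct = (100 * second.cachedTokens) / Math.max(second.inputTokens, 1)
  console.log({ first, second, hitRatePct: hitRatePct.toFixed(1) })
}
```

On a healthy cache, the second request should report `cachedTokens` close to `inputTokens` and noticeably lower latency. In the streaming path the equivalent usage object only arrives in the final chunk (with `stream_options: { include_usage: true }` on Chat Completions), which is presumably where the missing cached-token reporting crept in.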
```ts
import type { Command } from '../../commands.js'

import { isEnvTruthy } from '../../utils/envUtils.js'

const cacheProbe: Command = {
  type: 'local',
  name: 'cache-probe',
  description:
    'Send identical requests to test prompt caching (results in debug log)',
  argumentHint: '[model] [--no-key]',
  // Only offered on OpenAI-compatible providers; firstParty Anthropic has
  // different caching semantics, so the probe would be meaningless there.
  isEnabled: () =>
    isEnvTruthy(process.env.CLAUDE_CODE_USE_OPENAI) ||
    isEnvTruthy(process.env.CLAUDE_CODE_USE_GITHUB),
  supportsNonInteractive: false,
  // The implementation is lazily loaded on first use.
  load: () => import('./cache-probe.js'),
}

export default cacheProbe
```
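`isEnvTruthy` is imported from `envUtils.js` but not shown in this diff; a plausible shape for it (an assumption, not the actual helper) is:

```ts
// Hypothetical sketch of the imported helper; the real envUtils.js may differ.
export function isEnvTruthy(value: string | undefined): boolean {
  if (value === undefined) return false
  return !['', '0', 'false', 'no', 'off'].includes(value.trim().toLowerCase())
}
```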