feat(api): expose cache metrics in REPL + normalize across providers (#813)

* feat(api): expose cache metrics in REPL + /cache-stats command

* fix(api): normalize Kimi/DeepSeek/Gemini cache fields through shim layer

* test(api): cover /cache-stats rendering + fix CacheMetrics docstring drift

* fix(api): always reset cache turn counter + include date in /cache-stats rows

* refactor(api): unify shim usage builder + add cost-tracker wiring test

* fix(api): classify private-IP/self-hosted OpenAI endpoints as N/A instead of cold

* fix(api): require colon guard on IPv6 ULA prefix to avoid public-host over-match

* perf(api): ring buffer for cache history + hit rate clamp + .localhost TLD

* fix(api): null guards on formatters + document Codex Responses API shape

* fix(api): defensive start-of-turn reset + config gate fallback + env var docs

* fix(api): trust forwarded cache data on self-hosted URLs (data-driven)

* refactor(api): delegate streaming Responses usage to shared makeUsage helper
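The private-host bullets above (private-IP/self-hosted classification, the IPv6 ULA colon guard, and the `.localhost` TLD) can be illustrated with a small TypeScript sketch. The function name `isPrivateOrLocalHost` is hypothetical and this is not the code from this commit. The address ranges are the standard RFC 1918, RFC 4193 and RFC 6761 ones, and which of them the commit actually treats as self-hosted is an assumption. The colon guard matters because a bare `fc`/`fd` prefix test would also match public hostnames such as `fdroid.org`. Requiring a full four-hex-digit first group followed by `:` matches only addresses in `fc00::/7`.

```typescript
// Hypothetical helper for illustration; not the implementation from this commit.
// Returns true when a base-URL hostname points at a loopback, private-network,
// or unique-local address, i.e. an endpoint that is likely self-hosted.
function isPrivateOrLocalHost(rawHost: string): boolean {
  // WHATWG URL.hostname keeps brackets around IPv6 literals, e.g. "[fd00::1]".
  const host = rawHost.toLowerCase().replace(/^\[|\]$/g, "");

  // "localhost" plus the reserved .localhost TLD (RFC 6761).
  if (host === "localhost" || host.endsWith(".localhost")) return true;

  // IPv4 loopback (127/8), private (10/8, 172.16/12, 192.168/16),
  // and link-local (169.254/16) ranges.
  const v4 = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (v4) {
    const a = Number(v4[1]);
    const b = Number(v4[2]);
    return (
      a === 127 ||
      a === 10 ||
      (a === 172 && b >= 16 && b <= 31) ||
      (a === 192 && b === 168) ||
      (a === 169 && b === 254)
    );
  }

  // IPv6 loopback.
  if (host === "::1") return true;

  // IPv6 unique local addresses (fc00::/7): the first group must be four hex
  // digits starting with "fc" or "fd" AND be followed by ":". Without that
  // colon guard, a bare prefix test would also match hosts like "fdroid.org".
  return /^f[cd][0-9a-f]{2}:/.test(host);
}
```

For example, with `baseUrl` standing in for a configured endpoint, `isPrivateOrLocalHost(new URL(baseUrl).hostname)` would return `true` for `http://[fd12:3456::1]:8000/v1` and `false` for `https://fdroid.org`. Note that `fd::1` is correctly rejected: its first group is `0x00fd`, which lies outside `fc00::/7`.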

Author: viudes
Date: 2026-04-25 01:38:25 -03:00 (committed by GitHub)
Parent: 9070220292
Commit: 9e23c2bec4
20 changed files with 2749 additions and 46 deletions


@@ -299,6 +299,20 @@ ANTHROPIC_API_KEY=sk-ant-your-key-here
# Useful for users who want full transparency over what the model sees
# OPENCLAUDE_DISABLE_TOOL_REMINDERS=1
# Log structured per-request token usage (including cache metrics) to stderr.
# Useful for auditing cache hit rate / debugging cost spikes outside the REPL.
# Any truthy value enables it ("verbose", "1", "true").
#
# Complements (does NOT replace) CLAUDE_CODE_ENABLE_TOKEN_USAGE_ATTACHMENT.
# The two serve different audiences:
# - OPENCLAUDE_LOG_TOKEN_USAGE is user-facing: one JSON line per API
#   request on stderr, intended for humans inspecting cost/caching.
# - CLAUDE_CODE_ENABLE_TOKEN_USAGE_ATTACHMENT is model-facing: it injects
#   a context-usage attachment INTO the prompt so the model can reason
#   about its own remaining context. It does not touch stderr.
# Enable whichever matches the audience you're debugging; both can run together.
# OPENCLAUDE_LOG_TOKEN_USAGE=verbose
# Custom timeout for API requests in milliseconds (default: varies)
# API_TIMEOUT_MS=60000
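To make the user-facing behaviour of `OPENCLAUDE_LOG_TOKEN_USAGE` concrete, here is a minimal TypeScript sketch of gating on a truthy env value and emitting one JSON line per request. Every name here (`UsageRecord` and its fields, `isTruthyEnv`, `cacheHitRate`, `logTokenUsage`) is hypothetical, as is the set of values treated as "off". The hit-rate formula assumes the Anthropic-style convention in which input tokens exclude cached tokens. The clamp to [0, 1] mirrors the "hit rate clamp" named in the commit message, but how the commit computes it is not shown here.

```typescript
// Hypothetical sketch; names, fields and the "off" spellings are assumptions.
interface UsageRecord {
  inputTokens: number; // uncached prompt tokens (Anthropic-style convention assumed)
  outputTokens: number;
  cacheReadTokens: number; // prompt tokens served from cache
  cacheCreationTokens: number; // prompt tokens written to cache
}

// Any non-empty value enables logging, except a few common "off" spellings.
function isTruthyEnv(value: string | undefined): boolean {
  if (value === undefined) return false;
  const v = value.trim().toLowerCase();
  return v !== "" && v !== "0" && v !== "false" && v !== "off" && v !== "no";
}

// Share of prompt tokens served from cache, clamped to [0, 1] so inconsistent
// provider data (e.g. a negative field after normalization) stays in range.
function cacheHitRate(u: UsageRecord): number {
  const promptTokens = u.inputTokens + u.cacheReadTokens + u.cacheCreationTokens;
  if (promptTokens <= 0) return 0;
  return Math.min(1, Math.max(0, u.cacheReadTokens / promptTokens));
}

// Emits one JSON line per request when the env value is truthy.
// `sink` defaults to console.error, which writes to stderr under Node.js.
function logTokenUsage(
  model: string,
  usage: UsageRecord,
  envValue: string | undefined,
  sink: (line: string) => void = console.error,
): void {
  if (!isTruthyEnv(envValue)) return;
  sink(JSON.stringify({ model, ...usage, cacheHitRate: cacheHitRate(usage) }));
}
```

In a Node.js process this would be called as `logTokenUsage(model, usage, process.env.OPENCLAUDE_LOG_TOKEN_USAGE)`. Passing the env value and the sink as parameters keeps both the gate and the output easy to test.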