feat(api): expose cache metrics in REPL + normalize across providers (#813)

* feat(api): expose cache metrics in REPL + /cache-stats command
* fix(api): normalize Kimi/DeepSeek/Gemini cache fields through shim layer
* test(api): cover /cache-stats rendering + fix CacheMetrics docstring drift
* fix(api): always reset cache turn counter + include date in /cache-stats rows
* refactor(api): unify shim usage builder + add cost-tracker wiring test
* fix(api): classify private-IP/self-hosted OpenAI endpoints as N/A instead of cold
* fix(api): require colon guard on IPv6 ULA prefix to avoid public-host over-match
* perf(api): ring buffer for cache history + hit rate clamp + .localhost TLD
* fix(api): null guards on formatters + document Codex Responses API shape
* fix(api): defensive start-of-turn reset + config gate fallback + env var docs
* fix(api): trust forwarded cache data on self-hosted URLs (data-driven)
* refactor(api): delegate streaming Responses usage to shared makeUsage helper
2026-04-25 01:38:25 -03:00
parent 9070220292
commit 9e23c2bec4
20 changed files with 2749 additions and 46 deletions
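
Reviewer note: the .env.example hunk in this commit says OPENCLAUDE_LOG_TOKEN_USAGE is enabled by "any truthy value" ("verbose", "1", "true"). The sketch below shows one way such a check is commonly written. It is an illustration only, not the project's code: the helper name env_flag_enabled and the FALSY set are hypothetical, and the commit does not say which values count as "off".

```python
import os

# Assumption: these spellings disable the flag. The commit only lists
# values that enable it ("verbose", "1", "true").
FALSY = {"", "0", "false", "no", "off"}


def env_flag_enabled(name, env=None):
    """Return True when `name` is set to anything outside FALSY.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        # An unset variable means the feature stays off.
        return False
    return raw.strip().lower() not in FALSY


if __name__ == "__main__":
    sample = {"OPENCLAUDE_LOG_TOKEN_USAGE": "verbose"}
    print(env_flag_enabled("OPENCLAUDE_LOG_TOKEN_USAGE", sample))
```

Under this reading "verbose" is just another truthy spelling rather than a distinct log level. Whether the real implementation distinguishes levels is not stated in the diff.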
--- a/.env.example
+++ b/.env.example
@@ -299,6 +299,20 @@ ANTHROPIC_API_KEY=sk-ant-your-key-here
 # Useful for users who want full transparency over what the model sees
 # OPENCLAUDE_DISABLE_TOOL_REMINDERS=1

+# Log structured per-request token usage (including cache metrics) to stderr.
+# Useful for auditing cache hit rate / debugging cost spikes outside the REPL.
+# Any truthy value enables it ("verbose", "1", "true").
+#
+# Complements (does NOT replace) CLAUDE_CODE_ENABLE_TOKEN_USAGE_ATTACHMENT —
+# they serve different audiences:
+#   - OPENCLAUDE_LOG_TOKEN_USAGE is user-facing: one JSON line per API
+#     request on stderr, intended for humans inspecting cost/caching.
+#   - CLAUDE_CODE_ENABLE_TOKEN_USAGE_ATTACHMENT is model-facing: injects
+#     a context-usage attachment INTO the prompt so the model can reason
+#     about its own remaining context. Does not touch stderr.
+# Turn on whichever audience you're debugging; both can run together.
+# OPENCLAUDE_LOG_TOKEN_USAGE=verbose
+
 # Custom timeout for API requests in milliseconds (default: varies)
 # API_TIMEOUT_MS=60000