Commit Graph

3 Commits

Author SHA1 Message Date
viudes
9e23c2bec4 feat(api): expose cache metrics in REPL + normalize across providers (#813)
* feat(api): expose cache metrics in REPL + /cache-stats command

* fix(api): normalize Kimi/DeepSeek/Gemini cache fields through shim layer

* test(api): cover /cache-stats rendering + fix CacheMetrics docstring drift

* fix(api): always reset cache turn counter + include date in /cache-stats rows

* refactor(api): unify shim usage builder + add cost-tracker wiring test

* fix(api): classify private-IP/self-hosted OpenAI endpoints as N/A instead of cold

* fix(api): require colon guard on IPv6 ULA prefix to avoid public-host over-match

* perf(api): ring buffer for cache history + hit rate clamp + .localhost TLD

* fix(api): null guards on formatters + document Codex Responses API shape

* fix(api): defensive start-of-turn reset + config gate fallback + env var docs

* fix(api): trust forwarded cache data on self-hosted URLs (data-driven)

* refactor(api): delegate streaming Responses usage to shared makeUsage helper
2026-04-25 12:38:25 +08:00
Zartris
f4ac709fa6 fix: report cache reads in streaming and correct cost calculation (#577)
* fix: report cache reads in streaming and correct cost calculation

Fix two bugs in how the OpenAI-to-Anthropic shim handles cached tokens:

1. codexShim: streaming message_delta missing cache_read_input_tokens
   The codexStreamToAnthropic() function builds the final message_delta
   usage object inline (not through makeUsage()), and only included
   input_tokens and output_tokens. cache_read_input_tokens was always 0,
   so /cost never showed cache reads for Responses API models (GPT-5+).

   Also fix makeUsage() to read input_tokens_details.cached_tokens and
   prompt_tokens_details.cached_tokens for the non-streaming path.

2. Both shims: cost double-counting from convention mismatch
   OpenAI includes cached tokens in input_tokens/prompt_tokens (i.e.,
   input_tokens = uncached + cached). Anthropic treats input_tokens as
   uncached only. The cost formula was:
     cost = input_tokens * inputRate + cache_read * cacheRate
   This double-counts cached tokens. Fix by subtracting cached from
   input during the conversion:
     input_tokens = prompt_tokens - cached_tokens

   In practice this was inflating reported costs by ~2x for sessions
   with high cache hit rates (which is most sessions, since Copilot
   auto-caches server-side).

Fixes #515

* fix: omit zero cache read/write fields from /cost output

Only show "cache read" and "cache write" in /cost per-model usage when
the value is > 0. Providers like GitHub Copilot never report
cache_creation_input_tokens (the server manages its own cache), so
showing "0 cache write" on every line is misleading — it implies caching
is not working when it actually is.

Before:
  claude-haiku:  2.6k input, 151 output, 39.8k cache read, 0 cache write ($0.04)

After:
  claude-haiku:  2.6k input, 151 output, 39.8k cache read ($0.04)

---------

Co-authored-by: Zartris <14197299+Zartris@users.noreply.github.com>
2026-04-10 23:40:42 +08:00
did:key:z6MkqDnb7Siv3Cwj7pGJq4T5EsUisECqR8KpnDLwcaZq5TPr
d2542c9a62 asdf
Squash the current repository state back into one baseline commit while
preserving the README reframing and repository contents.

Constraint: User explicitly requested a single squashed commit with subject "asdf"
Confidence: high
Scope-risk: broad
Reversibility: clean
Directive: This commit intentionally rewrites published history; coordinate before future force-pushes
Tested: git status clean; local history rewritten to one commit; force-pushed main to origin and instructkr
Not-tested: Fresh clone verification after push
2026-03-31 03:34:03 -07:00