fix: report cached tokens from OpenAI prompt_tokens_details
OpenAI returns cached token counts in usage.prompt_tokens_details.cached_tokens, but the shim hardcoded cache_read_input_tokens to 0. This made prompt caching invisible to the cost tracker and session summary even when OpenAI's automatic caching was actively reducing costs.

Changes:
- Extend the OpenAIStreamChunk usage interface with prompt_tokens_details
- Map cached_tokens to cache_read_input_tokens in convertChunkUsage()
- Apply the same fix in _convertNonStreamingResponse() for the non-streaming path
- cache_creation_input_tokens remains 0 (OpenAI's automatic caching has no creation cost; it is free and automatic)
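A minimal standalone sketch of the mapping, assuming hypothetical names (OpenAIUsage, ConvertedUsage, convertUsage); in the shim the same logic lives inside convertChunkUsage() and _convertNonStreamingResponse():

interface OpenAIUsage {
  prompt_tokens?: number
  completion_tokens?: number
  total_tokens?: number
  prompt_tokens_details?: {
    cached_tokens?: number
  }
}

interface ConvertedUsage {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens: number
  cache_read_input_tokens: number
}

function convertUsage(usage: OpenAIUsage): ConvertedUsage {
  return {
    input_tokens: usage.prompt_tokens ?? 0,
    output_tokens: usage.completion_tokens ?? 0,
    // OpenAI's automatic caching has no separate cache-write charge.
    cache_creation_input_tokens: 0,
    // Previously hardcoded to 0; now surfaces cache hits.
    cache_read_input_tokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
  }
}

// Example: 1024 of 1536 prompt tokens served from OpenAI's prompt cache.
console.log(convertUsage({
  prompt_tokens: 1536,
  completion_tokens: 210,
  total_tokens: 1746,
  prompt_tokens_details: { cached_tokens: 1024 },
}))
// -> { input_tokens: 1536, output_tokens: 210,
//      cache_creation_input_tokens: 0, cache_read_input_tokens: 1024 }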
@@ -376,6 +376,9 @@ interface OpenAIStreamChunk {
     prompt_tokens?: number
     completion_tokens?: number
     total_tokens?: number
+    prompt_tokens_details?: {
+      cached_tokens?: number
+    }
   }
 }

@@ -392,7 +395,7 @@ function convertChunkUsage(
     input_tokens: usage.prompt_tokens ?? 0,
     output_tokens: usage.completion_tokens ?? 0,
     cache_creation_input_tokens: 0,
-    cache_read_input_tokens: 0,
+    cache_read_input_tokens: usage.prompt_tokens_details?.cached_tokens ?? 0,
   }
 }

@@ -920,6 +923,9 @@ class OpenAIShimMessages {
     usage?: {
       prompt_tokens?: number
       completion_tokens?: number
+      prompt_tokens_details?: {
+        cached_tokens?: number
+      }
     }
   },
   model: string,

@@ -985,7 +991,7 @@ class OpenAIShimMessages {
     input_tokens: data.usage?.prompt_tokens ?? 0,
     output_tokens: data.usage?.completion_tokens ?? 0,
     cache_creation_input_tokens: 0,
-    cache_read_input_tokens: 0,
+    cache_read_input_tokens: data.usage?.prompt_tokens_details?.cached_tokens ?? 0,
   },
 }
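For context, a hedged sketch of how a downstream cost tracker could use the now-populated cache_read_input_tokens. The estimateCost helper and the per-million-token rates are illustrative and not part of this change; it assumes input_tokens still includes the cached tokens, mirroring OpenAI's prompt_tokens semantics:

interface Usage {
  input_tokens: number
  output_tokens: number
  cache_creation_input_tokens: number
  cache_read_input_tokens: number
}

interface Rates {
  inputPerMTok: number
  cachedInputPerMTok: number // discounted rate for cache reads
  outputPerMTok: number
}

function estimateCost(usage: Usage, rates: Rates): number {
  // Bill the cached share of the prompt at the discounted rate and the
  // remainder at the full input rate.
  const cached = usage.cache_read_input_tokens
  const uncached = Math.max(usage.input_tokens - cached, 0)
  return (
    uncached * rates.inputPerMTok +
    cached * rates.cachedInputPerMTok +
    usage.output_tokens * rates.outputPerMTok
  ) / 1_000_000
}

// Before this fix cache_read_input_tokens was always 0, so the discount never applied.
// Placeholder rates, not real pricing:
console.log(estimateCost(
  { input_tokens: 1536, output_tokens: 210, cache_creation_input_tokens: 0, cache_read_input_tokens: 1024 },
  { inputPerMTok: 2.5, cachedInputPerMTok: 1.25, outputPerMTok: 10 },
))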