Is there an existing issue for this?
How do you use Sentry?
Sentry Saas (sentry.io)
Which SDK are you using?
@sentry/node
SDK Version
10.49.0 (reproduced through 10.57.0 and current develop)
Framework Version
ai@6.0.184 (Vercel AI SDK 6)
Link to Sentry event
(org data; can share privately if needed)
Reproduction Example/SDK Setup
With AI SDK 6, every provider normalizes usage.inputTokens to be cache-inclusive (total = noCache + cacheRead + cacheWrite):
- @ai-sdk/anthropic: total: inputTokens + cacheCreationTokens + cacheReadTokens
- @ai-sdk/amazon-bedrock: total: inputTokens + cacheReadTokens + cacheWriteTokens
- @ai-sdk/google: total: promptTokenCount (Google's promptTokenCount already includes cachedContentTokenCount)
The SDK's telemetry then emits both ai.usage.inputTokens (= cache-inclusive total; it even sets gen_ai.usage.input_tokens itself per OTel GenAI semconv) and ai.usage.cachedInputTokens (= cacheRead).
processVercelAiSpanAttributes (packages/core/src/tracing/vercel-ai/index.ts, the block commented // Input tokens is the sum of prompt tokens and cached input tokens) renames both and then does:
attributes[GEN_AI_USAGE_INPUT_TOKENS_ATTRIBUTE] =
attributes[GEN_AI_USAGE_INPUT_TOKENS_ATTRIBUTE] + attributes[GEN_AI_USAGE_INPUT_TOKENS_CACHED_ATTRIBUTE];
That heuristic matches AI SDK 5 Anthropic semantics (where promptTokens excluded cache), but with SDK 6 it double-counts cache reads on every generation span.
Minimal repro against the real processor:
import { addVercelAiProcessors } from '@sentry/core';

let processor;
addVercelAiProcessors({ on: () => () => {}, addEventProcessor: p => (processor = p) });

const event = processor({
  type: 'transaction',
  contexts: { trace: {} },
  spans: [{
    span_id: 'aaaaaaaaaaaaaaaa',
    origin: 'auto.vercelai.otel',
    op: 'gen_ai.generate_content',
    data: {
      'ai.operationId': 'ai.streamText.doStream',
      'operation.name': 'ai.streamText.doStream',
      'ai.usage.inputTokens': 9500, // = 1000 noCache + 8000 cacheRead + 500 cacheWrite (SDK 6 total)
      'ai.usage.outputTokens': 300,
      'ai.usage.cachedInputTokens': 8000,
      'ai.usage.inputTokenDetails.noCacheTokens': 1000,
      'ai.usage.inputTokenDetails.cacheReadTokens': 8000,
      'ai.usage.inputTokenDetails.cacheWriteTokens': 500,
    },
  }],
});

console.log(event.spans[0].data['gen_ai.usage.input_tokens']);
// actual: 17500, expected: 9500
Steps to Reproduce
- Use ai@6.x with any provider that reports prompt-cache usage (Anthropic / Bedrock / Google) and experimental_telemetry.isEnabled: true
- Enable vercelAIIntegration in @sentry/node
- Make a call that gets cache hits
- Inspect gen_ai.usage.input_tokens on the gen_ai.generate_content span
Expected Result
gen_ai.usage.input_tokens equals the SDK-reported inputTokens total (9500 above). For AI SDK 6 spans the summation should be skipped — ai.usage.inputTokenDetails.noCacheTokens being present is a reliable v6 marker, or gate on ai.operationId which is v6+.
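A minimal sketch of what such a gate could look like, not the actual processor code. The helper name resolveInputTokens is hypothetical, it reads the raw ai.usage.* keys instead of the SDK's renamed constants, and the pre-v6 span shape in the example is an assumption for illustration:

```javascript
// Hypothetical helper, not part of @sentry/core: decides what
// gen_ai.usage.input_tokens should be for a single Vercel AI span.
function resolveInputTokens(data) {
  const input = data['ai.usage.inputTokens'];
  const cached = data['ai.usage.cachedInputTokens'];
  if (typeof input !== 'number') return undefined;

  // Marker proposed in this issue: when inputTokenDetails.noCacheTokens is
  // present, inputTokens is already the cache-inclusive total.
  const isCacheInclusive =
    typeof data['ai.usage.inputTokenDetails.noCacheTokens'] === 'number';
  if (isCacheInclusive || typeof cached !== 'number') return input;

  // Spans without the marker keep the existing summation.
  return input + cached;
}

const v6Span = {
  'ai.usage.inputTokens': 9500,
  'ai.usage.cachedInputTokens': 8000,
  'ai.usage.inputTokenDetails.noCacheTokens': 1000,
};
// Assumed pre-v6 shape: inputTokens excludes cache reads, no details present.
const legacySpan = {
  'ai.usage.inputTokens': 1500,
  'ai.usage.cachedInputTokens': 8000,
};

console.log(resolveInputTokens(v6Span)); // 9500
console.log(resolveInputTokens(legacySpan)); // 9500
```

Keying on the details attribute rather than on a version number means the check needs no knowledge of which ai package version produced the span.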
Actual Result
gen_ai.usage.input_tokens = total + cacheRead = noCache + cacheWrite + 2×cacheRead (Anthropic/Bedrock) or promptTokenCount + cachedContent (Google). gen_ai.usage.total_tokens and the accumulated invoke_agent totals inherit the inflation, as do AI-cost views derived from these attributes. On agent traffic with high cache-hit rates (~85% of input cached), input tokens are overstated ~1.85×. That is how we noticed: Sentry token dashboards diverged ~2× from AWS Bedrock / Google Vertex billing consoles.
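The overstatement factor follows directly from the summation: reported = total + cacheRead, so the ratio is 1 + cacheRead / total. A small sketch of that arithmetic (the function name inputTokenInflation is only illustrative, and the 10000/8500 figures are made-up round numbers matching the ~85% case):

```javascript
// Ratio between what the processor reports and the true cache-inclusive total.
function inputTokenInflation(total, cacheRead) {
  const reported = total + cacheRead; // current summation in the processor
  return reported / total;
}

// Repro span above: 9500 total, 8000 of it cache reads.
console.log(inputTokenInflation(9500, 8000).toFixed(2)); // "1.84"

// 85% of input served from cache.
console.log(inputTokenInflation(10000, 8500).toFixed(2)); // "1.85"
```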
Workaround we ship: a beforeSendTransaction that recomputes input_tokens/total_tokens from ai.usage.inputTokenDetails.*.
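For anyone needing the same stopgap, a simplified sketch of the idea (not our exact production code). The helper name fixSpanTokenCounts is hypothetical, and the sketch assumes the output count is available as gen_ai.usage.output_tokens after processing. It only corrects per-span values and does not re-accumulate the invoke_agent totals:

```javascript
// Hypothetical helper: recompute token counts on one span's data from the
// AI SDK 6 inputTokenDetails breakdown. Mutates `data` in place.
function fixSpanTokenCounts(data) {
  const noCache = data['ai.usage.inputTokenDetails.noCacheTokens'];
  if (typeof noCache !== 'number') return; // no v6 breakdown, leave untouched

  const cacheRead = data['ai.usage.inputTokenDetails.cacheReadTokens'] ?? 0;
  const cacheWrite = data['ai.usage.inputTokenDetails.cacheWriteTokens'] ?? 0;
  const input = noCache + cacheRead + cacheWrite;
  data['gen_ai.usage.input_tokens'] = input;

  // Assumption: the processed span carries the output count under this key.
  const output = data['gen_ai.usage.output_tokens'];
  if (typeof output === 'number') {
    data['gen_ai.usage.total_tokens'] = input + output;
  }
}

// Shape matches the beforeSendTransaction option of Sentry.init.
function beforeSendTransaction(event) {
  for (const span of event.spans ?? []) {
    if (span.data) fixSpanTokenCounts(span.data);
  }
  return event;
}

// Example: an already-inflated span as it would arrive in the hook.
const inflated = {
  spans: [{
    data: {
      'gen_ai.usage.input_tokens': 17500,
      'gen_ai.usage.output_tokens': 300,
      'gen_ai.usage.total_tokens': 17800,
      'ai.usage.inputTokenDetails.noCacheTokens': 1000,
      'ai.usage.inputTokenDetails.cacheReadTokens': 8000,
      'ai.usage.inputTokenDetails.cacheWriteTokens': 500,
    },
  }],
};
const fixed = beforeSendTransaction(inflated);
console.log(fixed.spans[0].data['gen_ai.usage.input_tokens']); // 9500
console.log(fixed.spans[0].data['gen_ai.usage.total_tokens']); // 9800
```

The function would be passed as Sentry.init({ ..., beforeSendTransaction }).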