vercelAIIntegration double-counts cached input tokens with AI SDK 6 (gen_ai.usage.input_tokens inflated by cache reads) #21484

### Is there an existing issue for this?

- [x] I have checked for existing issues https://github.com/getsentry/sentry-javascript/issues
- [x] I have reviewed the documentation https://docs.sentry.io/
- [x] I am using the latest SDK release https://github.com/getsentry/sentry-javascript/releases

### How do you use Sentry?

Sentry SaaS (sentry.io)

### Which SDK are you using?

@sentry/node

### SDK Version

10.49.0 (reproduced through 10.57.0 and current `develop`)

### Framework Version

ai@6.0.184 (Vercel AI SDK 6)

### Link to Sentry event

_(org data; can share privately if needed)_

### Reproduction Example/SDK Setup

With AI SDK **6**, every provider normalizes `usage.inputTokens` to be **cache-inclusive**, i.e. `total = noCache + cacheRead + cacheWrite`. For example:

- `@ai-sdk/anthropic`: `total: inputTokens + cacheCreationTokens + cacheReadTokens`
- `@ai-sdk/amazon-bedrock`: `total: inputTokens + cacheReadTokens + cacheWriteTokens`
- `@ai-sdk/google`: `total: promptTokenCount` (Google's `promptTokenCount` already includes `cachedContentTokenCount`)

The AI SDK's telemetry then emits **both** `ai.usage.inputTokens` (the cache-inclusive total; the AI SDK also sets `gen_ai.usage.input_tokens` itself, per the OTel GenAI semantic conventions) **and** `ai.usage.cachedInputTokens` (equal to `cacheRead`).

`processVercelAiSpanAttributes` (`packages/core/src/tracing/vercel-ai/index.ts`, the block commented `// Input tokens is the sum of prompt tokens and cached input tokens`) renames both and then does:

```ts
attributes[GEN_AI_USAGE_INPUT_TOKENS_ATTRIBUTE] =
  attributes[GEN_AI_USAGE_INPUT_TOKENS_ATTRIBUTE] + attributes[GEN_AI_USAGE_INPUT_TOKENS_CACHED_ATTRIBUTE];
```

That heuristic matches AI SDK 5 Anthropic semantics (where `promptTokens` excluded cache tokens), but with AI SDK 6 it **double-counts cache reads** on every generation span.

Minimal repro against the real processor:

```ts
import { addVercelAiProcessors } from '@sentry/core';

let processor;
addVercelAiProcessors({ on: () => () => {}, addEventProcessor: p => (processor = p) });

const event = processor({
  type: 'transaction',
  contexts: { trace: {} },
  spans: [{
    span_id: 'aaaaaaaaaaaaaaaa',
    origin: 'auto.vercelai.otel',
    op: 'gen_ai.generate_content',
    data: {
      'ai.operationId': 'ai.streamText.doStream',
      'operation.name': 'ai.streamText.doStream',
      'ai.usage.inputTokens': 9500,            // = 1000 noCache + 8000 cacheRead + 500 cacheWrite (SDK 6 total)
      'ai.usage.outputTokens': 300,
      'ai.usage.cachedInputTokens': 8000,
      'ai.usage.inputTokenDetails.noCacheTokens': 1000,
      'ai.usage.inputTokenDetails.cacheReadTokens': 8000,
      'ai.usage.inputTokenDetails.cacheWriteTokens': 500,
    },
  }],
});

console.log(event.spans[0].data['gen_ai.usage.input_tokens']);
// actual: 17500  — expected: 9500
```

### Steps to Reproduce

1. Use `ai@6.x` with any provider that reports prompt-cache usage (Anthropic / Bedrock / Google) and `experimental_telemetry.isEnabled: true`
2. Enable `vercelAIIntegration` in `@sentry/node`
3. Make a call that gets cache hits
4. Inspect `gen_ai.usage.input_tokens` on the `gen_ai.generate_content` span

### Expected Result

`gen_ai.usage.input_tokens` equals the SDK-reported `inputTokens` total (9500 above). For AI SDK 6 spans the summation should be skipped. The presence of `ai.usage.inputTokenDetails.noCacheTokens` is a reliable v6 marker; alternatively, gate on `ai.operationId`, which is v6+.
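
A rough sketch of what such a gate could look like, to make the expected behavior concrete. This is not the actual `@sentry/core` code: `resolveInputTokens` is a hypothetical helper, the string value used for the cached-token attribute is an assumption (the issue only shows the constant name), and so is the `vercel.ai.`-prefixed variant of the marker key.

```ts
// Hypothetical helper illustrating the proposed gate; not the real implementation.
type SpanAttributes = Record<string, unknown>;

const INPUT_TOKENS = 'gen_ai.usage.input_tokens';
// Assumed value of GEN_AI_USAGE_INPUT_TOKENS_CACHED_ATTRIBUTE.
const CACHED_INPUT_TOKENS = 'gen_ai.usage.input_tokens.cached';

// AI SDK 6 marker described above. The second key is an assumption about
// how the attribute might be named if it has already been re-prefixed.
const V6_MARKERS = [
  'ai.usage.inputTokenDetails.noCacheTokens',
  'vercel.ai.usage.inputTokenDetails.noCacheTokens',
];

function resolveInputTokens(attributes: SpanAttributes): number | undefined {
  const input = attributes[INPUT_TOKENS];
  if (typeof input !== 'number') return undefined;

  const cached = attributes[CACHED_INPUT_TOKENS];
  const isCacheInclusive = V6_MARKERS.some(key => typeof attributes[key] === 'number');

  // v6 spans already report a cache-inclusive total, so skip the summation.
  if (isCacheInclusive || typeof cached !== 'number') return input;

  // Pre-v6 behavior: the prompt token count excludes cache, so add cached tokens.
  return input + cached;
}
```

Applied to the repro span after renaming, this would keep `gen_ai.usage.input_tokens` at 9500, while spans without the marker would still get the existing summation.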

### Actual Result

`gen_ai.usage.input_tokens` = total + cacheRead, i.e. `noCache + cacheWrite + 2×cacheRead` (Anthropic/Bedrock) or `promptTokenCount + cachedContentTokenCount` (Google). `gen_ai.usage.total_tokens` and the accumulated `invoke_agent` totals inherit the inflation, as do AI-cost views derived from these attributes. On agent traffic with high cache-hit rates (~85% of input tokens served from cache), input tokens are overstated ~1.85×. That is how we noticed: Sentry token dashboards diverged ~2× from the AWS Bedrock / Google Vertex billing consoles.
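
As a quick check of the ~1.85× figure, write $T$ for the SDK-reported cache-inclusive total and $r$ for the share of it served from cache reads (both symbols are introduced here only for illustration):

```math
\frac{\text{reported}}{\text{actual}} = \frac{T + rT}{T} = 1 + r, \qquad r = 0.85 \;\Rightarrow\; 1 + r = 1.85
```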

Workaround we ship: a `beforeSendTransaction` that recomputes `input_tokens`/`total_tokens` from `ai.usage.inputTokenDetails.*`.
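
For anyone who needs the same stopgap, here is a minimal sketch of that kind of hook. It is not our exact code. `recomputeTokenUsage`, `SpanLike` and `TransactionLike` are illustrative names; the local types stand in for the real Sentry event types so the snippet is self-contained. Whether the detail attributes arrive under `ai.usage.inputTokenDetails.*` or a `vercel.ai.`-prefixed variant by the time the hook runs is an assumption, so the sketch checks both.

```ts
// Minimal local stand-ins for the Sentry transaction event shape (illustrative only).
interface SpanLike { data?: Record<string, unknown> }
interface TransactionLike { spans?: SpanLike[] }

const DETAIL_FIELDS = ['noCacheTokens', 'cacheReadTokens', 'cacheWriteTokens'];
// Assumption: the details may sit under either prefix when the hook runs.
const DETAIL_PREFIXES = ['ai.usage.inputTokenDetails.', 'vercel.ai.usage.inputTokenDetails.'];

function readDetail(data: Record<string, unknown>, field: string): number | undefined {
  for (const prefix of DETAIL_PREFIXES) {
    const value = data[prefix + field];
    if (typeof value === 'number') return value;
  }
  return undefined;
}

function recomputeTokenUsage<T extends TransactionLike>(event: T): T {
  for (const span of event.spans ?? []) {
    const data = span.data;
    if (!data) continue;

    const parts = DETAIL_FIELDS.map(field => readDetail(data, field));
    // Only touch spans that carry the noCacheTokens detail (the v6 marker).
    if (parts[0] === undefined) continue;

    // input = noCache + cacheRead + cacheWrite; missing details count as 0.
    const input = parts.reduce<number>((sum, part) => sum + (part ?? 0), 0);
    data['gen_ai.usage.input_tokens'] = input;

    const output = data['gen_ai.usage.output_tokens'];
    if (typeof output === 'number') {
      data['gen_ai.usage.total_tokens'] = input + output;
    }
  }
  return event;
}
```

A function like this would be passed as `beforeSendTransaction` in `Sentry.init`. Note that it only corrects spans that carry the detail attributes themselves; totals accumulated onto `invoke_agent` spans would need separate handling.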

