
Python: Gemini connector drops cached-content and thinking token counts from usage details #6637

@he-yufeng


Describe the bug

The Gemini chat client surfaces only the input, output, and total token counts in usage_details. Gemini's GenerateContentResponseUsageMetadata also reports cached_content_token_count (tokens served from the context cache) and thoughts_token_count (tokens a reasoning model spends on thinking), but _parse_usage drops both. As a result, for cached prompts and thinking models the cache and reasoning usage silently reads as zero, which skews cost and token accounting.
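To make the symptom concrete, here is a minimal, self-contained sketch of a parser that behaves as described: it copies only the prompt, candidate, and total counts. This is an illustration, not the connector's actual code. UsageMetadata is a hypothetical stand-in for google.genai's GenerateContentResponseUsageMetadata (it reproduces only the attributes used here), and the output keys input_token_count / output_token_count / total_token_count are assumed names for the UsageDetails fields. The token numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMetadata:
    """Hypothetical stand-in for google.genai's GenerateContentResponseUsageMetadata."""

    prompt_token_count: Optional[int] = None
    candidates_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    cached_content_token_count: Optional[int] = None
    thoughts_token_count: Optional[int] = None


def parse_usage_as_reported(usage: UsageMetadata) -> dict[str, int]:
    """Mimics the reported behavior: only input, output, and total are copied."""
    details: dict[str, int] = {}
    if usage.prompt_token_count is not None:
        details["input_token_count"] = usage.prompt_token_count
    if usage.candidates_token_count is not None:
        details["output_token_count"] = usage.candidates_token_count
    if usage.total_token_count is not None:
        details["total_token_count"] = usage.total_token_count
    # cached_content_token_count and thoughts_token_count are never read,
    # so they never reach usage_details.
    return details


meta = UsageMetadata(
    prompt_token_count=1200,
    candidates_token_count=80,
    total_token_count=1530,
    cached_content_token_count=1000,
    thoughts_token_count=250,
)
details = parse_usage_as_reported(meta)
print(details)
# A consumer defaulting missing keys to zero sees no cache usage at all.
print(details.get("cache_read_input_token_count", 0))  # 0
```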

UsageDetails already has canonical fields for these (cache_read_input_token_count and reasoning_output_token_count), and the OpenAI and Anthropic connectors already populate them. Gemini is the odd one out.

Where

python/packages/gemini/agent_framework_gemini/_chat_client.py, RawGeminiChatClient._parse_usage.

Expected behavior

When the API returns cached_content_token_count or thoughts_token_count, map them to cache_read_input_token_count and reasoning_output_token_count, respectively, in usage_details, matching the OpenAI and Anthropic connectors.
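A minimal sketch of the expected mapping follows. It assumes the same hypothetical UsageMetadata stand-in for the SDK type and models usage_details as a plain dict keyed by the UsageDetails field names quoted in this issue (the three base key names are assumptions). The function name parse_usage is illustrative, not the connector's real method. Each optional count is copied only when the API actually returns it, so responses without caching or thinking do not gain spurious zero entries.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageMetadata:
    """Hypothetical stand-in for google.genai's GenerateContentResponseUsageMetadata."""

    prompt_token_count: Optional[int] = None
    candidates_token_count: Optional[int] = None
    total_token_count: Optional[int] = None
    cached_content_token_count: Optional[int] = None
    thoughts_token_count: Optional[int] = None


# Gemini usage-metadata attribute -> usage_details key.
# The last two targets are the UsageDetails fields named in the issue;
# the first three target names are assumed.
_USAGE_FIELD_MAP = {
    "prompt_token_count": "input_token_count",
    "candidates_token_count": "output_token_count",
    "total_token_count": "total_token_count",
    "cached_content_token_count": "cache_read_input_token_count",
    "thoughts_token_count": "reasoning_output_token_count",
}


def parse_usage(usage: Optional[UsageMetadata]) -> dict[str, int]:
    """Copy every count the API reported; skip the ones it left unset."""
    details: dict[str, int] = {}
    if usage is None:
        return details
    for source, target in _USAGE_FIELD_MAP.items():
        value = getattr(usage, source, None)
        if value is not None:
            details[target] = value
    return details


meta = UsageMetadata(
    prompt_token_count=1200,
    candidates_token_count=80,
    total_token_count=1530,
    cached_content_token_count=1000,
    thoughts_token_count=250,
)
details = parse_usage(meta)
print(details["cache_read_input_token_count"])  # 1000
print(details["reasoning_output_token_count"])  # 250
```

Keeping the attribute-to-field correspondence in one table means the whole mapping can be reviewed in one place, and a plain prompt without caching or thinking still yields only the three base keys.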

Metadata


Assignees

No one assigned

Labels

python, triage


Development

No branches or pull requests