
Input token count for gemini-3-pro-image-preview is wrong #713

Description

@barbeau

According to https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image, the input token window size for gemini-3-pro-image-preview is 65,536.

However, I'm getting conflicting information from the API itself, both in the number of tokens a given text costs and in the total supported input token size.

In my use case, I provide user history information as part of the prompt. I have to dynamically size that information down to fit the input context window, so the API gets the largest possible amount of the most recent data.

I'm using the SDK to determine how many tokens a given text costs:

  import com.google.genai.Client
  import com.google.genai.types.CountTokensConfig
  import kotlinx.coroutines.Dispatchers
  import kotlinx.coroutines.withContext

  /**
   * Returns the token count for the provided [prompt], using the provided [Client].
   */
  suspend fun getTokenCount(client: Client, prompt: String): Int {
    return withContext(Dispatchers.IO) {
      // modelName is a class property holding "gemini-3-pro-image-preview"
      val response = client.models.countTokens(
        modelName,
        prompt,
        CountTokensConfig.builder().build()
      )
      response.totalTokens().get()
    }
  }

I call this iteratively, cutting down the text until the method returns a value lower than the max context window size (I've currently hard-coded the max input token window size).
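The loop itself looks roughly like this (a minimal sketch; trimOldestLines() and TAG are placeholders for my actual trimming logic and log tag):

  /**
   * Repeatedly trims [history] until getTokenCount() reports a value at or
   * below [maxInputTokens]. Sketch only; trimOldestLines() stands in for the
   * real trimming logic.
   */
  suspend fun fitToContextWindow(client: Client, history: String, maxInputTokens: Int): String {
    var text = history
    var count = getTokenCount(client, text)
    while (count > maxInputTokens) {
      Log.d(TAG, "Current token count: $count, limit of $maxInputTokens")
      text = trimOldestLines(text)  // drop the oldest portion of the user history
      count = getTokenCount(client, text)
    }
    return text
  }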

For example, here's a log from calling this function repeatedly with a max input token window size of 32768:

17:30:47.244  D  Current token count: 75363, limit of 32768
17:30:47.587  D  Current token count: 48495, limit of 32768
17:30:47.847  D  Current token count: 38562, limit of 32768
17:30:48.037  D  Current token count: 34849, limit of 32768
17:30:48.260  D  Current token count: 33489, limit of 32768
17:30:48.477  D  Current token count: 32994, limit of 32768
17:30:48.706  D  Current token count: 32808, limit of 32768
17:30:48.956  D  Current token count: 32746, limit of 32768
17:30:48.959  D  History fits. Cut from 1222 to 530 lines of text
  • Problem 1 - When I use 65,536 as documented on the website, I get Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (65,505) exceeds the maximum number of tokens allowed (32768). I expected the max input token size to be 65,536.

  • Problem 2 - Even after I use a max of 32768 (as the error message says) and cut the prompt down to a value below 32768 (like 32746 above) based on the value returned from countTokens(), I still get Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (33652) exceeds the maximum number of tokens allowed (32768). I would NOT expect a 400 exception when countTokens() returns a count below the max input window size, so countTokens() doesn't seem to be counting correctly.
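For scale, countTokens() reported 32,746 while the server counted 33,652 for the same prompt, an undercount of roughly 3%. A possible stopgap is padding the client-side budget, though the factor below is a guess derived from those two numbers, not anything documented:

  // Hypothetical workaround, not a documented contract: shrink the local budget
  // so a ~3% undercount by countTokens() still lands under the server limit.
  val serverLimit = 32768
  val safetyFactor = 0.95  // guess derived from 32746 (client) vs 33652 (server)
  val clientBudget = (serverLimit * safetyFactor).toInt()  // 31129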

Environment details

  • Programming language: Kotlin (calling the Java google-genai SDK)
  • OS:
  • Language runtime version:
  • Package version:

Steps to reproduce

  1. Call countTokens() with around 530 lines of text
  2. Call chatSession.sendMessageStream(prompt).await() with the prompt
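Condensed, the two steps look like this (a sketch; client, chatSession, and prompt are assumed to be set up as in my app, and getTokenCount() is the helper above):

  // Sketch of the repro, reusing getTokenCount() from above.
  val counted = getTokenCount(client, prompt)  // e.g. 32746, under the 32768 limit
  // Sending the same prompt still fails server-side with:
  // ClientException: 400 . The input token count (33652) exceeds the maximum
  // number of tokens allowed (32768).
  chatSession.sendMessageStream(prompt).await()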

What I see

  1. Max input context window size for gemini-3-pro-image-preview is a lot lower than the 65,536 documented on the website
  2. Even if countTokens() says my prompt is under the 32768 limit, I still get a ClientException 400

Metadata

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • status: awaiting user response
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
