
Input token count for gemini-3-pro-image-preview is wrong #713

Description

@barbeau

According to https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image, the input token window size for gemini-3-pro-image-preview is 65,536.

However, I'm getting conflicting information from the API itself, both in the number of tokens a given text costs and in the total supported input token size.

In my use case, I provide user history information as part of the prompt. I have to dynamically size that information down to fit the input context window, so the API gets the largest possible amount of the most recent data.

I'm using the SDK to determine how many tokens a given text costs:

  import com.google.genai.Client
  import com.google.genai.types.CountTokensConfig
  import kotlinx.coroutines.Dispatchers
  import kotlinx.coroutines.withContext

  /**
   * Returns the token count for the provided [prompt], using the provided [Client].
   */
  suspend fun getTokenCount(client: Client, prompt: String): Int {
    return withContext(Dispatchers.IO) {
      // modelName is a class property holding "gemini-3-pro-image-preview"
      val response = client.models.countTokens(
        modelName,
        prompt,
        CountTokensConfig.builder().build()
      )
      response.totalTokens().get()
    }
  }

I call this iteratively, cutting down the text until the method returns a value lower than the max context window size (I've currently hard-coded the max input token window size).
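The loop itself looks roughly like this (a minimal sketch; trimOldestLines() and TAG are placeholders for my actual trimming logic and log tag):

  /**
   * Repeatedly trims [history] until getTokenCount() reports a value at or
   * below [maxInputTokens]. Sketch only; trimOldestLines() stands in for the
   * real trimming logic.
   */
  suspend fun fitToContextWindow(client: Client, history: String, maxInputTokens: Int): String {
    var text = history
    var count = getTokenCount(client, text)
    while (count > maxInputTokens) {
      Log.d(TAG, "Current token count: $count, limit of $maxInputTokens")
      text = trimOldestLines(text)  // drop the oldest portion of the user history
      count = getTokenCount(client, text)
    }
    return text
  }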

For example, here's a log from calling this function repeatedly with a max input token window size of 32768:

17:30:47.244  D  Current token count: 75363, limit of 32768
17:30:47.587  D  Current token count: 48495, limit of 32768
17:30:47.847  D  Current token count: 38562, limit of 32768
17:30:48.037  D  Current token count: 34849, limit of 32768
17:30:48.260  D  Current token count: 33489, limit of 32768
17:30:48.477  D  Current token count: 32994, limit of 32768
17:30:48.706  D  Current token count: 32808, limit of 32768
17:30:48.956  D  Current token count: 32746, limit of 32768
17:30:48.959  D  History fits. Cut from 1222 to 530 lines of text
  • Problem 1 - When I use 65,536 as documented on the website, I get Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (65,505) exceeds the maximum number of tokens allowed (32768). I expected the max input token size to be 65,536.

  • Problem 2 - Even after I use a max of 32768 (as the error message says) and cut the prompt down to a value below 32768 (like 32746 above) based on the value returned from countTokens(), I still get Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (33652) exceeds the maximum number of tokens allowed (32768). I would NOT expect a 400 exception when countTokens() returns a count below the max input window size, so countTokens() doesn't seem to be counting correctly.
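For scale, countTokens() reported 32,746 while the server counted 33,652 for the same prompt, an undercount of roughly 3%. A possible stopgap is padding the client-side budget, though the factor below is a guess derived from those two numbers, not anything documented:

  // Hypothetical workaround, not a documented contract: shrink the local budget
  // so a ~3% undercount by countTokens() still lands under the server limit.
  val serverLimit = 32768
  val safetyFactor = 0.95  // guess derived from 32746 (client) vs 33652 (server)
  val clientBudget = (serverLimit * safetyFactor).toInt()  // 31129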

Environment details

  • Programming language: Kotlin (calling the Java google-genai SDK)
  • OS:
  • Language runtime version:
  • Package version:

Steps to reproduce

  1. Call countTokens() with around 530 lines of text
  2. Call chatSession.sendMessageStream(prompt).await() with the prompt
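Condensed, the two steps look like this (a sketch; client, chatSession, and prompt are assumed to be set up as in my app, and getTokenCount() is the helper above):

  // Sketch of the repro, reusing getTokenCount() from above.
  val counted = getTokenCount(client, prompt)  // e.g. 32746, under the 32768 limit
  // Sending the same prompt still fails server-side with:
  // ClientException: 400 . The input token count (33652) exceeds the maximum
  // number of tokens allowed (32768).
  chatSession.sendMessageStream(prompt).await()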

What I see

  1. Max input context window size for gemini-3-pro-image-preview is a lot lower than the 65,536 documented on the website
  2. Even if countTokens() says my prompt is under the 32768 limit, I still get a ClientException 400

Metadata

Labels

  • priority: p2 (Moderately-important priority. Fix may not be included in next release.)
  • status: awaiting user response
  • type: bug (Error or flaw in code with unintended results or allowing sub-optimal usage patterns.)
