Description
https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/3-pro-image says that the input token window size for gemini-3-pro-image-preview is 65,536.
However, I'm getting conflicting information from the API itself, both in terms of how many tokens a given text costs and the total supported input token size.
In my use case, I provide user history information as part of the prompt, and I have to dynamically trim it to fit the input context window so that the model receives as much of the most recent data as possible.
I'm using the SDK to determine how many tokens given text costs:
```kotlin
/**
 * Returns the token count for the provided [prompt], using the provided [Client].
 */
suspend fun getTokenCount(client: Client, prompt: String): Int {
    return withContext(Dispatchers.IO) {
        val response = client.models.countTokens(
            modelName,
            prompt,
            CountTokensConfig.builder().build()
        )
        response.totalTokens().get()
    }
}
```

I call this iteratively and keep cutting down the text until it returns a value lower than the max context window size (I've currently hard-coded the max input token window size).
For example, here's a log from calling this function repeatedly, and using a max input token window size of 32768:
17:30:47.244 D Current token count: 75363, limit of 32768
17:30:47.587 D Current token count: 48495, limit of 32768
17:30:47.847 D Current token count: 38562, limit of 32768
17:30:48.037 D Current token count: 34849, limit of 32768
17:30:48.260 D Current token count: 33489, limit of 32768
17:30:48.477 D Current token count: 32994, limit of 32768
17:30:48.706 D Current token count: 32808, limit of 32768
17:30:48.956 D Current token count: 32746, limit of 32768
17:30:48.959 D History fits. Cut from 1222 to 530 lines of text
- Problem 1: When I use 65,536 as documented on the website, I get `Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (65,505) exceeds the maximum number of tokens allowed (32768).` I expected the max input token size to be 65,536.
- Problem 2: Even after I use a max of 32768 (as the error message says) and cut the text down to a value less than 32768 (like 32746 above) based on the value returned from `countTokens()`, I still get `Failed to get chat stream: com.google.genai.errors.ClientException: 400 . The input token count (33652) exceeds the maximum number of tokens allowed (32768).` I would have expected NOT to get a 400 exception if `countTokens()` gave me a count below the max input window size. So the `countTokens()` function doesn't seem to be counting correctly.
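Until the discrepancy in Problem 2 is resolved, the only workaround I can think of is to budget against the limit with a safety margin rather than trimming right up to it. A minimal sketch follows; the 5% margin is a guess based on the ~3% gap observed above (32746 locally vs 33652 server-side), not a documented value.

```kotlin
/**
 * Returns an effective token budget that leaves headroom below [hardLimit],
 * since countTokens() can under-report relative to the server-side count
 * (e.g. 32746 locally vs 33652 at the server in the logs above).
 * The default [marginFraction] of 0.05 is an assumption, not documented.
 */
fun effectiveTokenBudget(hardLimit: Int, marginFraction: Double = 0.05): Int {
    require(marginFraction in 0.0..1.0) { "marginFraction must be in [0, 1]" }
    return (hardLimit * (1.0 - marginFraction)).toInt()
}
```

Trimming against `effectiveTokenBudget(32768)` instead of `32768` would avoid the 400 in this particular log, but it doesn't explain why the counts disagree in the first place.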
Environment details
- Programming language: Java
- OS:
- Language runtime version:
- Package version:
Steps to reproduce
- Call `countTokens()` with around 530 lines of text
- Call `chatSession.sendMessageStream(prompt).await()` with the prompt
What I see
- The max input context window size for `gemini-3-pro-image-preview` is a lot lower than the 65,536 documented on the website
- Even if `countTokens()` says my prompt is under the 32768 limit, I still get a `ClientException` with a 400 status