fix: harden SetModelResponseTool fallback to prevent infinite loops #5091
Open
vietnamesekid wants to merge 1 commit into google:main from
Conversation
Flash models (gemini-2.5-flash, gemini-3-flash) can ignore set_model_response and loop indefinitely when output_schema is used with tools. This adds a layered defense:

1. Type-aware instruction: primitive schemas (str, int) get a stronger prompt, since their trivial tool signature is easily ignored by flash models.
2. Deterministic tool_choice guard: on round N-1 (_MAX_TOOL_ROUNDS-1), restrict the model to only call set_model_response via tool_config.
3. Hard cutoff: on round N, terminate the invocation entirely to prevent runaway API costs.
4. Early return after set_model_response: skip unnecessary transfer_to_agent processing in base_llm_flow.py after structured output is successfully produced.

Based on analysis by @surfai, @nino-robotfutures-co, and @surajksharma07 on google#5054.
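The layered round guard described above can be sketched as a small decision function. This is an illustrative sketch only: the function name `decide_round_action` and the action strings are hypothetical and not the ADK implementation; only `_MAX_TOOL_ROUNDS = 25` comes from the PR description.

```python
# Hypothetical sketch of the round-based guard described in this PR.
# decide_round_action and its return values are illustrative names,
# not the actual ADK API.

_MAX_TOOL_ROUNDS = 25  # limit stated in the PR description


def decide_round_action(round_index: int) -> str:
    """Return the guard action for a given tool round (0-based)."""
    if round_index >= _MAX_TOOL_ROUNDS:
        # Hard cutoff: terminate the invocation entirely so a model that
        # keeps ignoring set_model_response cannot run up API costs.
        return "terminate"
    if round_index == _MAX_TOOL_ROUNDS - 1:
        # Penultimate round: restrict tool_config so the model can only
        # call set_model_response, guaranteeing structured output.
        return "force_set_model_response"
    # Normal rounds: all tools remain available.
    return "normal"
```

Keeping the cutoff deterministic (a plain counter rather than a heuristic on model behavior) is what makes the worst-case cost bounded regardless of how the model responds.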
Summary
This is a follow-up to #5057. It improves the fallback behavior of SetModelResponseTool to avoid infinite loops, especially when flash models (like gemini-2.5-flash and gemini-3-flash) ignore set_model_response and keep calling other tools. The changes come from the investigation and discussion in #5054.
Changes
- Type-aware instruction (_output_schema_processor.py): primitive schemas like str and int are easy for models to ignore, so we now give clearer guidance in these cases.
- Deterministic tool_choice guard: on the penultimate tool round, we restrict the model to only call set_model_response using tool_config, so we can guarantee structured output.
- Hard cutoff: once the limit is reached (_MAX_TOOL_ROUNDS = 25), we stop execution completely to avoid runaway loops and unnecessary API usage.
- Early return (base_llm_flow.py): once set_model_response succeeds, we skip extra steps like transfer_to_agent.

Test plan
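The tool_config restriction in the second change can be sketched as the function-calling config payload sent to the model. Field names below follow the Gemini API's function-calling config (ANY mode plus allowed_function_names); treat this as an assumption-laden illustration, not the ADK code, and `build_forcing_tool_config` is a hypothetical helper name.

```python
# Hypothetical sketch: a tool_config payload that forces the model to
# call set_model_response and nothing else. Field names follow the
# Gemini API function-calling config; this is not ADK source code.

def build_forcing_tool_config() -> dict:
    """Build a config restricting the model to a single allowed function."""
    return {
        "function_calling_config": {
            # ANY mode requires the model to emit a function call, and
            # allowed_function_names narrows the choice to one tool.
            "mode": "ANY",
            "allowed_function_names": ["set_model_response"],
        }
    }
```

Because the restriction is enforced by the API's function-calling config rather than by prompt text, a flash model cannot simply ignore it the way it can ignore an instruction.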
- Verified str schemas across GOOGLE_AI and Vertex AI

Related
- set_model_response error responses are treated as final output, bypassing ReflectAndRetryToolPlugin retry #4525