fix: output_schema + tools infinite loop on Gemini 2.x #5057

vietnamesekid wants to merge 2 commits into google:main
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request.
Response from ADK Triaging Agent

Hello @vietnamesekid, thank you for your contribution! Before we can review this PR, you'll need to sign the Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to sign it. Thanks!
vietnamesekid force-pushed from 00ea24a to af6042a
can_use_output_schema_with_tools() was using is_gemini_2_or_above(), which incorrectly enabled native response_schema + tools for all Gemini 2.x models on Vertex AI. Gemini 2.0/2.5 can't actually handle both constraints at the same time — the model just keeps firing tool calls and never produces the structured response, resulting in an infinite loop. Per Google's docs, structured output combined with tools is only supported on Gemini 3 series models. This change adds is_gemini_3_or_above() and uses it so that Gemini 2.x falls back to the SetModelResponseTool workaround (which works fine).
vietnamesekid force-pushed from af6042a to 914fbe1
Hi @vietnamesekid, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Thanks @rohityan! I've pushed fixes for both the mypy and formatting issues. Note that these were pre-existing in the codebase, not introduced by my PR, but I'm happy to fix them here.
Thanks @nino-robotfutures-co for the model comparison data on #5054 and @surajksharma07 for tracing the issue. This comment was developed with AI assistance (Claude).

## Context

This PR correctly fixes which models use native `response_schema` together with `tools`. @nino-robotfutures-co's testing on #5054 quantifies this: same ADK code, same configuration, yet the behavior differs by model.

The core insight: non-deterministic routing will always be probabilistically right... and probabilistically wrong. The fix needs a deterministic escape path, not just a better prompt.

## Findings

Four complementary changes that, together with this PR's model-detection fix, make the `SetModelResponseTool` path robust.

### 1. The instruction is identical for all schema types; the tool signature is the real differentiator

```python
instruction = (
    'IMPORTANT: You have access to other tools, but you must provide '
    'your final response using the set_model_response tool with the '
    'required structured format. After using any other tools needed '
    'to complete the task, always call set_model_response with your '
    'final answer in the specified schema format.'
)
```

What does differ by type is the generated `set_model_response` signature.
Flash models treat the trivial single-field signature as redundant ("I can just respond with text") rather than recognizing it as a required termination mechanism. The named fields in the `BaseModel` case act as a stronger signal.

Proposed fix: a type-aware instruction.

```python
# _output_schema_processor.py:56-64
from ...utils._schema_utils import is_basemodel_schema

if is_basemodel_schema(agent.output_schema):
  instruction = (
      'After completing any needed tool calls, provide your final '
      'response by calling set_model_response with the required fields.'
  )
else:
  instruction = (
      'CRITICAL: You MUST call the set_model_response tool exactly once '
      'as your FINAL action. Do NOT respond with plain text. Do NOT call '
      'any other tools after set_model_response. This is the ONLY way '
      'to complete the task.'
  )
```

### 2. No safety net when the model ignores the instruction: a 2-layer deterministic guard

The stricter instruction (finding 1) reduces how often the model loops, but it's still probabilistic. A deterministic safety net complements it: two layers, both driven by a single round counter computed early in `run_async()`.
```python
# _output_schema_processor.py: at the top of run_async(), after the early-return guard
_MAX_TOOL_ROUNDS = 25
tool_rounds = sum(
    1 for e in invocation_context._get_events(
        current_invocation=True, current_branch=True
    )
    if e.get_function_responses()
)
```

**Layer 2 (checked first):** hard cutoff on round N, as last-resort termination in case the layer-1 forced call still fails to produce structured output.

```python
# _output_schema_processor.py: immediately after computing tool_rounds
if tool_rounds >= _MAX_TOOL_ROUNDS:
  logger.error(
      'Tool execution reached %d rounds without producing structured '
      'output via set_model_response. Breaking loop to prevent '
      'runaway API costs.',
      tool_rounds,
  )
  invocation_context.end_invocation = True
  return
```

**Layer 1:** force `set_model_response` on round N-1.

```python
# _output_schema_processor.py: after appending tools/instruction
if tool_rounds >= _MAX_TOOL_ROUNDS - 1:
  llm_request.config.tool_config = types.ToolConfig(
      function_calling_config=types.FunctionCallingConfig(
          mode=types.FunctionCallingConfigMode.ANY,
          allowed_function_names=['set_model_response'],
      )
  )
```

The type-aware instruction reduces how often the guard fires; the guard makes termination deterministic when the instruction is ignored.

### 3. Missing `return` after `yield final_event` in `base_llm_flow.py:1093`

Without it, unnecessary `transfer_to_agent` processing still runs after a successful structured response (change 4 in the summary table below).
| # | Change | Type | Scope |
|---|--------|------|-------|
| 1 | Type-aware instruction in `_output_schema_processor.py` | Probabilistic improvement | Reduces loop frequency for primitive schemas (`str`, `int`) |
| 2 | `tool_choice` enforcement on round N-1 | Deterministic guard | Forces valid structured output before cutoff |
| 3 | Hard cutoff on round N | Deterministic guard | Prevents runaway API costs in all cases |
| 4 | `return` after `yield final_event` in `base_llm_flow.py:1093` | Bug fix | Skips unnecessary `transfer_to_agent` processing after success |
Without findings 2-3, the path this PR activates for Gemini 2.x remains vulnerable to the same infinite loop on flash models — just through SetModelResponseTool instead of native response_schema. The model-detection fix and the deterministic guard are complementary; ideally both ship together.
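Change 4 (the missing `return` after `yield final_event`) follows a general async-generator pattern. Here is a self-contained toy illustration of why the early `return` matters; it is not the actual `base_llm_flow.py` code, and the event names are placeholders:

```python
import asyncio


async def _postprocess(events, final_marker):
  """Toy event loop: yields events and must stop once the final
  structured response has been emitted."""
  for event in events:
    yield event
    if event == final_marker:
      # Without this return, execution would fall through to the
      # transfer handling below even after a successful final event.
      return
  yield 'transfer_to_agent'


async def _collect():
  return [e async for e in _postprocess(['tool_call', 'final'], 'final')]


emitted = asyncio.run(_collect())
# 'transfer_to_agent' is never emitted once the final event is seen.
```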
Hope our discussion does not distract. Certainly. Thank you!!
what's going on
Using `output_schema` together with `tools` on Gemini 2.5 Flash / 2.5 Pro / 2.0 Flash via Vertex AI causes an infinite loop. The agent just keeps calling tools over and over and never actually returns the structured response.

Ran into this while building a multi-agent setup where a sub-agent has both `output_schema` and `tools`, then gets wrapped as an `AgentTool` for a parent agent. The sub-agent logs were just endless cycles of "Sending out request" / "Response received" every ~2 seconds.

root cause
`can_use_output_schema_with_tools()` in `output_schema_utils.py` uses `is_gemini_2_or_above()`, which returns `True` for all Gemini 2.x. This tells ADK to set `response_schema` directly on the API request alongside `tools`. Problem is Gemini 2.x doesn't actually support this combo. The model receives both constraints and can't reconcile them, so it just keeps generating tool calls without ever producing structured output. According to Google's structured output docs, this is only supported on Gemini 3 series (preview).
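For concreteness, the conflicting request that Gemini 2.x receives looks roughly like this in `google-genai` terms. The schema and tool below are made-up placeholders; only the config field names come from the SDK:

```python
from google.genai import types

config = types.GenerateContentConfig(
    # Constraint 1: the final response must be JSON matching this schema.
    response_mime_type='application/json',
    response_schema={
        'type': 'OBJECT',
        'properties': {'answer': {'type': 'STRING'}},
    },
    # Constraint 2: the model may also emit tool calls. On Gemini 2.x the
    # two constraints never reconcile, so tool calls repeat forever.
    tools=[
        types.Tool(
            function_declarations=[
                types.FunctionDeclaration(
                    name='lookup',  # placeholder tool name
                    description='Placeholder tool for illustration.',
                )
            ]
        )
    ],
)
```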
When `can_use_output_schema_with_tools()` returns `False`, ADK uses the `SetModelResponseTool` workaround instead, which works perfectly: the model treats it as a regular function call to deliver the structured output.

the fix
Added `is_gemini_3_or_above()` to `model_name_utils.py` and swapped it in `output_schema_utils.py`. Pretty straightforward:

- Gemini 2.x: falls back to the `SetModelResponseTool` workaround (reliable, already battle-tested)
- Gemini 3+: native `response_schema` + `tools` (officially supported)

No other callers of `is_gemini_2_or_above` are affected; those are for unrelated features (URL context, RAG retrieval, code executor).

related issues
Likely fixes #5054, #4868, #4525, and the regression from #3413.
test plan
- `test_output_schema_utils.py`: Gemini 2.x on Vertex AI now correctly returns `False`; Gemini 3 returns `True`
- New `TestIsGemini3OrAbove` class in `test_model_name_utils.py`
- `test_basic_processor.py` and `test_output_schema_processor.py`: unaffected (they mock the function)