
fix: output_schema + tools infinite loop on Gemini 2.x#5057

Open
vietnamesekid wants to merge 2 commits into google:main from vietnamesekid:fix/output-schema-tools-infinite-loop

Conversation

@vietnamesekid

@vietnamesekid vietnamesekid commented Mar 30, 2026

what's going on

Using output_schema together with tools on Gemini 2.5 Flash / 2.5 Pro / 2.0 Flash via Vertex AI causes an infinite loop. The agent just keeps calling tools over and over and never actually returns the structured response.

Ran into this while building a multi-agent setup where a sub-agent has both output_schema and tools, then gets wrapped as an AgentTool for a parent agent. The sub-agent logs were just endless cycles of "Sending out request" / "Response received" every ~2 seconds.

root cause

can_use_output_schema_with_tools() in output_schema_utils.py uses is_gemini_2_or_above(), which returns True for all Gemini 2.x. This tells ADK to set response_schema directly on the API request alongside tools.

Problem is Gemini 2.x doesn't actually support this combo. The model receives both constraints and can't reconcile them, so it just keeps generating tool calls without ever producing structured output. According to Google's structured output docs, this is only supported on Gemini 3 series (preview).
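Concretely, the failing combination looks something like this sketch using the google-genai SDK (HalOutput and check_backend_health are hypothetical stand-ins borrowed from the #5054 repro; the exact request ADK builds may differ):

```python
from google.genai import types
from pydantic import BaseModel

class HalOutput(BaseModel):  # hypothetical output_schema from the repro
    summary: str
    score: int

def check_backend_health() -> dict:  # hypothetical tool from the repro
    return {'status': 'ok'}

# Both constraints on one request: on Gemini 2.x the model keeps
# emitting tool calls and never returns the structured response.
config = types.GenerateContentConfig(
    response_schema=HalOutput,
    response_mime_type='application/json',
    tools=[check_backend_health],
)
```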

When can_use_output_schema_with_tools() returns False, ADK uses the SetModelResponseTool workaround instead, which works reliably: the model treats it as a regular function call to deliver the structured output.

the fix

Added is_gemini_3_or_above() to model_name_utils.py and swapped it in output_schema_utils.py. Pretty straightforward:

  • Gemini 2.x → uses SetModelResponseTool workaround (reliable, already battle-tested)
  • Gemini 3.x → uses native response_schema + tools (officially supported)

No other callers of is_gemini_2_or_above are affected; those are for unrelated features (URL context, RAG retrieval, code executor).
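For illustration, the new check might look roughly like this (a regex sketch of my own, not the PR's actual model_name_utils.py code; how it handles full Vertex AI resource paths is an assumption):

```python
import re

# Matches the major version digit in names like "gemini-3-flash-preview"
# or resource paths ending in ".../models/gemini-3.1-pro-preview".
_GEMINI_MAJOR = re.compile(r'gemini-(\d+)')

def is_gemini_3_or_above(model: str) -> bool:
    """Sketch: True if the model's Gemini major version is >= 3."""
    match = _GEMINI_MAJOR.search(model)
    return match is not None and int(match.group(1)) >= 3
```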

related issues

Likely fixes #5054, #4868, #4525, and the regression from #3413.

test plan

  • All existing tests pass (59/59)
  • Updated test_output_schema_utils.py: Gemini 2.x on Vertex AI now correctly returns False
  • Added Gemini 3.x test cases that return True
  • Added TestIsGemini3OrAbove class in test_model_name_utils.py
  • Downstream tests in test_basic_processor.py and test_output_schema_processor.py unaffected (they mock the function)

@google-cla

google-cla bot commented Mar 30, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Mar 30, 2026
@adk-bot
Collaborator

adk-bot commented Mar 30, 2026

Response from ADK Triaging Agent

Hello @vietnamesekid, thank you for your contribution!

Before we can review this PR, you'll need to sign the Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to sign it.

Thanks!

@vietnamesekid vietnamesekid force-pushed the fix/output-schema-tools-infinite-loop branch from 00ea24a to af6042a on March 30, 2026 at 08:40
can_use_output_schema_with_tools() was using is_gemini_2_or_above(),
which incorrectly enabled native response_schema + tools for all
Gemini 2.x models on Vertex AI. Gemini 2.0/2.5 can't actually handle
both constraints at the same time — the model just keeps firing tool
calls and never produces the structured response, resulting in an
infinite loop.

Per Google's docs, structured output combined with tools is only
supported on Gemini 3 series models. This change adds
is_gemini_3_or_above() and uses it so that Gemini 2.x falls back
to the SetModelResponseTool workaround (which works fine).
@vietnamesekid vietnamesekid force-pushed the fix/output-schema-tools-infinite-loop branch from af6042a to 914fbe1 on March 30, 2026 at 08:41
@rohityan rohityan self-assigned this Mar 30, 2026
@rohityan
Collaborator

Hi @vietnamesekid, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Can you please fix the failing mypy-diff tests and also the formatting errors? You can use autoformat.sh to fix formatting errors.

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Mar 30, 2026
@vietnamesekid
Author

Hi @vietnamesekid, thank you for your contribution! We appreciate you taking the time to submit this pull request. Can you please fix the failing mypy-diff tests and also the formatting errors? You can use autoformat.sh to fix formatting errors.

Thanks @rohityan! I've pushed fixes for both the mypy and formatting issues. Note that these were pre-existing in the codebase, not introduced by this PR, but I'm happy to fix them here.

@surfai

surfai commented Apr 1, 2026

Thanks @nino-robotfutures-co for the model comparison data on #5054 and @surajksharma07 for tracing the issue to _OutputSchemaRequestProcessor and pointing us in the right direction. Dropping findings here as requested.

This comment was developed with AI assistance (Claude).


Context

This PR correctly fixes which models use SetModelResponseTool vs native response_schema + tools. That's necessary — but the underlying problem is broader: even when SetModelResponseTool is active, flash models can still ignore it and loop indefinitely.

@nino-robotfutures-co's testing on #5054 quantifies this:

| Model | output_schema + tools | Tool loop? | Structured output produced? |
|---|---|---|---|
| gemini-2.5-flash | HalOutput (BaseModel) | Yes — loops on check_backend_health ~15x, eventually breaks out | Yes, after loop |
| gemini-3-flash-preview | HalOutput (BaseModel) | Yes — 47 model calls, never terminates within 300s | No |
| gemini-3.1-pro-preview | HalOutput (BaseModel) | No — 1 model call | Yes, first attempt |

Same ADK code, same SetModelResponseTool, different models, different outcomes. The loop termination is non-deterministic — it depends entirely on whether the LLM chooses to call set_model_response vs other tools on any given turn. A prompt instruction improves the probability but can never guarantee it. Passing 10 times doesn't prove it won't fail on attempt 11.

The core insight: non-deterministic routing will always be probabilistically right... and probabilistically wrong. The fix needs a deterministic escape path, not just a better prompt.


Findings

Four complementary changes that, together with this PR's model-detection fix, make the output_schema + tools path robust across all models.

1. The instruction is identical for all schema types — the tool signature is the real differentiator

_output_schema_processor.py:57-63 injects the same instruction regardless of schema type:

instruction = (
    'IMPORTANT: You have access to other tools, but you must provide '
    'your final response using the set_model_response tool with the '
    'required structured format. After using any other tools needed '
    'to complete the task, always call set_model_response with your '
    'final answer in the specified schema format.'
)

What does differ by type is the set_model_response tool signature the LLM sees (from set_model_response_tool.py:69-99):

  • BaseModel → set_model_response(*, summary: str, score: int) — distinct named fields
  • str → set_model_response(*, response: str) — single generic field

Flash models treat the trivial single-field signature as redundant ("I can just respond with text") rather than recognizing it as a required termination mechanism. The named fields in the BaseModel case act as a stronger signal.
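To make the contrast concrete, here is a toy sketch of the two parameter shapes (the helper name is invented; summary/score are the example fields from above, not ADK's real schema handling):

```python
def tool_params_for(output_schema) -> dict:
    """Illustrative only: how the two schema kinds surface as tool params."""
    if isinstance(output_schema, dict):
        # Stand-in for a BaseModel: its own named fields become the
        # tool's parameters, a strong signal the call is meaningful.
        return output_schema
    # Primitive types collapse to one generic field, which flash models
    # can read as redundant with a plain-text reply.
    return {'response': {'type': 'string'}}

structured = tool_params_for(
    {'summary': {'type': 'string'}, 'score': {'type': 'integer'}}
)
primitive = tool_params_for(str)
```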

Proposed fix — type-aware instruction:

# _output_schema_processor.py:56-64
from ...utils._schema_utils import is_basemodel_schema

if is_basemodel_schema(agent.output_schema):
    instruction = (
        'After completing any needed tool calls, provide your final '
        'response by calling set_model_response with the required fields.'
    )
else:
    instruction = (
        'CRITICAL: You MUST call the set_model_response tool exactly once '
        'as your FINAL action. Do NOT respond with plain text. Do NOT call '
        'any other tools after set_model_response. This is the ONLY way '
        'to complete the task.'
    )

2. No safety net when the model ignores the instruction — a 2-layer deterministic guard

The stricter instruction (finding 1) reduces how often the model loops, but it's still probabilistic. A deterministic safety net complements it — two layers, both driven by a single round counter computed early in run_async():

# _output_schema_processor.py — at the top of run_async(), after the early-return guard
_MAX_TOOL_ROUNDS = 25

tool_rounds = sum(
    1 for e in invocation_context._get_events(
        current_invocation=True, current_branch=True
    )
    if e.get_function_responses()
)

Layer 2 (checked first): Hard cutoff on round N — last-resort termination if the forced tool_choice still doesn't produce output:

# _output_schema_processor.py — immediately after computing tool_rounds
if tool_rounds >= _MAX_TOOL_ROUNDS:
    logger.error(
        'Tool execution reached %d rounds without producing structured '
        'output via set_model_response. Breaking loop to prevent '
        'runaway API costs.',
        tool_rounds,
    )
    invocation_context.end_invocation = True
    return

Layer 1: Force tool_choice on round N-1 — constrain the LLM to only set_model_response, guaranteeing a valid structured response:

# _output_schema_processor.py — after appending tools/instruction
if tool_rounds >= _MAX_TOOL_ROUNDS - 1:
    llm_request.config.tool_config = types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(
            mode=types.FunctionCallingConfigMode.ANY,
            allowed_function_names=['set_model_response'],
        )
    )

The type-aware instruction reduces how often the guard fires. The tool_choice enforcement ensures valid output when the model won't comply. The hard cutoff prevents runaway costs in all cases. All three layers together make the termination deterministic rather than prompt-probabilistic.
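The layered decision above can be condensed into a tiny standalone sketch (names are illustrative; the threshold matches the snippets, with the hard cutoff checked before the forced tool_choice):

```python
_MAX_TOOL_ROUNDS = 25

def guard_action(tool_rounds: int) -> str:
    """Which intervention, if any, the guard applies on this round."""
    if tool_rounds >= _MAX_TOOL_ROUNDS:
        # Layer 2: last-resort termination to stop runaway API costs.
        return 'end_invocation'
    if tool_rounds >= _MAX_TOOL_ROUNDS - 1:
        # Layer 1: constrain the model to set_model_response only.
        return 'force_set_model_response'
    return 'none'
```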

3. Missing return after successful set_model_response

Separate from the infinite loop — when set_model_response IS successfully called in base_llm_flow.py:1083-1093, the final_event is yielded but execution falls through to transfer_to_agent (line 1094) and any subsequent processing:

# base_llm_flow.py:1083-1093 — current code
      if json_response := _output_schema_processor.get_structured_model_response(
          function_response_event
      ):
        final_event = (
            _output_schema_processor.create_final_model_response_event(
                invocation_context, json_response
            )
        )
        yield final_event
      transfer_to_agent = function_response_event.actions.transfer_to_agent  # falls through

Proposed fix:

        yield final_event
        return  # Structured output produced — terminate processing

4. Two architectural patterns that avoid this class of problem entirely

For context on why the deterministic guard works — two patterns that reframe the problem:

Pattern A: Two-phase execution. Phase 1: model calls tools freely (no output schema). Phase 2: model is re-invoked with full history but only the schema tool, no other tools. The model can't confuse "call a tool" with "produce the response" because those actions never compete in the same LLM call.

Pattern B: Always-forced tool selection. Register the schema as a tool alongside real tools, but set tool_choice=any so the model must pick a tool every turn. It either calls a real tool (keep working) or calls the schema tool (finish). The model can't silently ignore the schema tool because it's forced to pick something.

The tool_choice guard in finding 2 is a lightweight version of Pattern B, applied as a safety net on round N-1 rather than as the primary mechanism.
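As a self-contained illustration of Pattern A (all names here are made up; call_model stands in for a generic LLM client, and the toy model just exercises the control flow):

```python
from typing import Callable, Dict, List

def run_two_phase(
    history: List[Dict],
    work_tools: List[str],
    schema_tool: str,
    call_model: Callable[..., Dict],
) -> Dict:
    # Phase 1: the model may use work tools freely; loop until it emits
    # a turn with no function calls (i.e. it considers the work done).
    while True:
        turn = call_model(history, tools=work_tools)
        history.append(turn)
        if not turn.get('function_calls'):
            break
    # Phase 2: same history, but the ONLY tool offered is the schema
    # tool, with a forced tool choice, so finishing is the only move.
    return call_model(
        history,
        tools=[schema_tool],
        tool_config={'mode': 'ANY', 'allowed': [schema_tool]},
    )

# Toy stand-in for an LLM client: calls one work tool, then stops.
def toy_model(history, tools, tool_config=None):
    if tool_config is not None:
        return {'function_calls': [{'name': tools[0], 'args': {'score': 1}}]}
    if not any(t.get('function_calls') for t in history):
        return {'function_calls': [{'name': tools[0], 'args': {}}]}
    return {'function_calls': []}
```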


Summary for reviewer

This PR's model-detection fix is necessary and correct — Gemini 2.x should not use native response_schema + tools. But the SetModelResponseTool fallback path it activates has its own failure mode: flash models can still ignore set_model_response and loop indefinitely.

Findings 1-3 harden that fallback path with a layered defense:

| # | Change | Type | Scope |
|---|---|---|---|
| 1 | Type-aware instruction in _output_schema_processor.py | Probabilistic improvement | Reduces loop frequency for primitive schemas (str, int) |
| 2 | tool_choice enforcement on round N-1 | Deterministic guard | Forces valid structured output before cutoff |
| 3 | Hard cutoff on round N | Deterministic guard | Prevents runaway API costs in all cases |
| 4 | return after yield final_event in base_llm_flow.py:1093 | Bug fix | Skips unnecessary transfer_to_agent processing after success |

Without findings 2-3, the path this PR activates for Gemini 2.x remains vulnerable to the same infinite loop on flash models — just through SetModelResponseTool instead of native response_schema. The model-detection fix and the deterministic guard are complementary; ideally both ship together.

@vietnamesekid
Author

vietnamesekid commented Apr 1, 2026

Thanks @surfai for the thorough analysis. I agree findings 1-3 are valuable improvements. I'd prefer to keep this PR focused on the model detection fix and handle those as a separate PR. That said, if you and @rohityan prefer, I'm happy to include them here. What do you think?

@surfai

surfai commented Apr 1, 2026

Hope our discussion does not distract. Certainly. Thank you!!

@vietnamesekid
Author

I've opened #5091 as a follow-up to implement the findings from @surfai's analysis. It adds type-aware instructions, a deterministic tool_choice guard, a hard cutoff, and the early return fix. All tests passing locally (unit + integration on both GOOGLE_AI and Vertex AI).



Development

Successfully merging this pull request may close these issues.

[BUG]: output_schema=str combined with tools causes infinite call_llm/execute_tools loop
