
fix: output_schema + tools infinite loop on Gemini 2.x#5057

Open
vietnamesekid wants to merge 2 commits into google:main from vietnamesekid:fix/output-schema-tools-infinite-loop

Conversation

@vietnamesekid

@vietnamesekid vietnamesekid commented Mar 30, 2026

what's going on

Using output_schema together with tools on Gemini 2.5 Flash / 2.5 Pro / 2.0 Flash via Vertex AI causes an infinite loop. The agent just keeps calling tools over and over and never actually returns the structured response.

Ran into this while building a multi-agent setup where a sub-agent has both output_schema and tools, then gets wrapped as an AgentTool for a parent agent. The sub-agent logs were just endless cycles of "Sending out request" / "Response received" every ~2 seconds.

root cause

can_use_output_schema_with_tools() in output_schema_utils.py uses is_gemini_2_or_above(), which returns True for all Gemini 2.x. This tells ADK to set response_schema directly on the API request alongside tools.

Problem is Gemini 2.x doesn't actually support this combo. The model receives both constraints and can't reconcile them, so it just keeps generating tool calls without ever producing structured output. According to Google's structured output docs, this is only supported on Gemini 3 series (preview).
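Concretely, the failing combination looks something like this sketch using the google-genai SDK (HalOutput and check_backend_health are hypothetical stand-ins borrowed from the #5054 repro; the exact request ADK builds may differ):

```python
from google.genai import types
from pydantic import BaseModel

class HalOutput(BaseModel):  # hypothetical output_schema from the repro
    summary: str
    score: int

def check_backend_health() -> dict:  # hypothetical tool from the repro
    return {'status': 'ok'}

# Both constraints on one request: on Gemini 2.x the model keeps
# emitting tool calls and never returns the structured response.
config = types.GenerateContentConfig(
    response_schema=HalOutput,
    response_mime_type='application/json',
    tools=[check_backend_health],
)
```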

When can_use_output_schema_with_tools() returns False, ADK uses the SetModelResponseTool workaround instead, which works reliably: the model treats it as a regular function call to deliver the structured output.

the fix

Added is_gemini_3_or_above() to model_name_utils.py and swapped it in output_schema_utils.py. Pretty straightforward:

  • Gemini 2.x → uses SetModelResponseTool workaround (reliable, already battle-tested)
  • Gemini 3.x → uses native response_schema + tools (officially supported)

No other callers of is_gemini_2_or_above are affected; those are for unrelated features (URL context, RAG retrieval, code executor).
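For illustration, the new check might look roughly like this (a regex sketch of my own, not the PR's actual model_name_utils.py code; how it handles full Vertex AI resource paths is an assumption):

```python
import re

# Matches the major version digit in names like "gemini-3-flash-preview"
# or resource paths ending in ".../models/gemini-3.1-pro-preview".
_GEMINI_MAJOR = re.compile(r'gemini-(\d+)')

def is_gemini_3_or_above(model: str) -> bool:
    """Sketch: True if the model's Gemini major version is >= 3."""
    match = _GEMINI_MAJOR.search(model)
    return match is not None and int(match.group(1)) >= 3
```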

related issues

Likely fixes #5054, #4868, #4525, and the regression from #3413.

test plan

  • All existing tests pass (59/59)
  • Updated test_output_schema_utils.py: Gemini 2.x on Vertex AI now correctly returns False
  • Added Gemini 3.x test cases that return True
  • Added TestIsGemini3OrAbove class in test_model_name_utils.py
  • Downstream tests in test_basic_processor.py and test_output_schema_processor.py unaffected (they mock the function)

@google-cla

google-cla bot commented Mar 30, 2026

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@adk-bot adk-bot added the core [Component] This issue is related to the core interface and implementation label Mar 30, 2026
@adk-bot
Collaborator

adk-bot commented Mar 30, 2026

Response from ADK Triaging Agent

Hello @vietnamesekid, thank you for your contribution!

Before we can review this PR, you'll need to sign the Contributor License Agreement (CLA). Please visit https://cla.developers.google.com/ to sign it.

Thanks!

@vietnamesekid vietnamesekid force-pushed the fix/output-schema-tools-infinite-loop branch from 00ea24a to af6042a on March 30, 2026 at 08:40
can_use_output_schema_with_tools() was using is_gemini_2_or_above(),
which incorrectly enabled native response_schema + tools for all
Gemini 2.x models on Vertex AI. Gemini 2.0/2.5 can't actually handle
both constraints at the same time — the model just keeps firing tool
calls and never produces the structured response, resulting in an
infinite loop.

Per Google's docs, structured output combined with tools is only
supported on Gemini 3 series models. This change adds
is_gemini_3_or_above() and uses it so that Gemini 2.x falls back
to the SetModelResponseTool workaround (which works fine).
@vietnamesekid vietnamesekid force-pushed the fix/output-schema-tools-infinite-loop branch from af6042a to 914fbe1 on March 30, 2026 at 08:41
@rohityan rohityan self-assigned this Mar 30, 2026
@rohityan
Collaborator

Hi @vietnamesekid, thank you for your contribution! We appreciate you taking the time to submit this pull request.
Can you please fix the failing mypy-diff tests and also the formatting errors? You can use autoformat.sh to fix formatting errors.

@rohityan rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Mar 30, 2026
@vietnamesekid
Author

Hi @vietnamesekid, thank you for your contribution! We appreciate you taking the time to submit this pull request. Can you please fix the failing mypy-diff tests and also the formatting errors? You can use autoformat.sh to fix formatting errors.

Thanks @rohityan! I've pushed fixes for both the mypy and formatting issues. Note that these were pre-existing in the codebase, not introduced by this PR, but I'm happy to fix them here.

@surfai

surfai commented Apr 1, 2026

Thanks @nino-robotfutures-co for the model comparison data on #5054 and @surajksharma07 for tracing the issue to _OutputSchemaRequestProcessor and pointing us in the right direction. Dropping findings here as requested.

This comment was developed with AI assistance (Claude).


Context

This PR correctly fixes which models use SetModelResponseTool vs native response_schema + tools. That's necessary — but the underlying problem is broader: even when SetModelResponseTool is active, flash models can still ignore it and loop indefinitely.

@nino-robotfutures-co's testing on #5054 quantifies this:

| Model | output_schema + tools | Tool loop? | Structured output produced? |
|---|---|---|---|
| gemini-2.5-flash | HalOutput (BaseModel) | Yes — loops on check_backend_health ~15x, eventually breaks out | Yes, after loop |
| gemini-3-flash-preview | HalOutput (BaseModel) | Yes — 47 model calls, never terminates within 300s | No |
| gemini-3.1-pro-preview | HalOutput (BaseModel) | No — 1 model call | Yes, first attempt |

Same ADK code, same SetModelResponseTool, different models, different outcomes. The loop termination is non-deterministic — it depends entirely on whether the LLM chooses to call set_model_response vs other tools on any given turn. A prompt instruction improves the probability but can never guarantee it. Passing 10 times doesn't prove it won't fail on attempt 11.

The core insight: non-deterministic routing will always be probabilistically right... and probabilistically wrong. The fix needs a deterministic escape path, not just a better prompt.


Findings

Four complementary changes that, together with this PR's model-detection fix, make the output_schema + tools path robust across all models.

1. The instruction is identical for all schema types — the tool signature is the real differentiator

_output_schema_processor.py:57-63 injects the same instruction regardless of schema type:

instruction = (
    'IMPORTANT: You have access to other tools, but you must provide '
    'your final response using the set_model_response tool with the '
    'required structured format. After using any other tools needed '
    'to complete the task, always call set_model_response with your '
    'final answer in the specified schema format.'
)

What does differ by type is the set_model_response tool signature the LLM sees (from set_model_response_tool.py:69-99):

  • BaseModel → set_model_response(*, summary: str, score: int) — distinct named fields
  • str → set_model_response(*, response: str) — single generic field

Flash models treat the trivial single-field signature as redundant ("I can just respond with text") rather than recognizing it as a required termination mechanism. The named fields in the BaseModel case act as a stronger signal.
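To make the contrast concrete, here is a toy sketch of the two parameter shapes (the helper name is invented; summary/score are the example fields from above, not ADK's real schema handling):

```python
def tool_params_for(output_schema) -> dict:
    """Illustrative only: how the two schema kinds surface as tool params."""
    if isinstance(output_schema, dict):
        # Stand-in for a BaseModel: its own named fields become the
        # tool's parameters, a strong signal the call is meaningful.
        return output_schema
    # Primitive types collapse to one generic field, which flash models
    # can read as redundant with a plain-text reply.
    return {'response': {'type': 'string'}}

structured = tool_params_for(
    {'summary': {'type': 'string'}, 'score': {'type': 'integer'}}
)
primitive = tool_params_for(str)
```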

Proposed fix — type-aware instruction:

# _output_schema_processor.py:56-64
from ...utils._schema_utils import is_basemodel_schema

if is_basemodel_schema(agent.output_schema):
    instruction = (
        'After completing any needed tool calls, provide your final '
        'response by calling set_model_response with the required fields.'
    )
else:
    instruction = (
        'CRITICAL: You MUST call the set_model_response tool exactly once '
        'as your FINAL action. Do NOT respond with plain text. Do NOT call '
        'any other tools after set_model_response. This is the ONLY way '
        'to complete the task.'
    )

2. No safety net when the model ignores the instruction — a 2-layer deterministic guard

The stricter instruction (finding 1) reduces how often the model loops, but it's still probabilistic. A deterministic safety net complements it — two layers, both driven by a single round counter computed early in run_async():

# _output_schema_processor.py — at the top of run_async(), after the early-return guard
_MAX_TOOL_ROUNDS = 25

tool_rounds = sum(
    1 for e in invocation_context._get_events(
        current_invocation=True, current_branch=True
    )
    if e.get_function_responses()
)

Layer 2 (checked first): Hard cutoff on round N — last-resort termination if the forced tool_choice still doesn't produce output:

# _output_schema_processor.py — immediately after computing tool_rounds
if tool_rounds >= _MAX_TOOL_ROUNDS:
    logger.error(
        'Tool execution reached %d rounds without producing structured '
        'output via set_model_response. Breaking loop to prevent '
        'runaway API costs.',
        tool_rounds,
    )
    invocation_context.end_invocation = True
    return

Layer 1: Force tool_choice on round N-1 — constrain the LLM to only set_model_response, guaranteeing a valid structured response:

# _output_schema_processor.py — after appending tools/instruction
if tool_rounds >= _MAX_TOOL_ROUNDS - 1:
    llm_request.config.tool_config = types.ToolConfig(
        function_calling_config=types.FunctionCallingConfig(
            mode=types.FunctionCallingConfigMode.ANY,
            allowed_function_names=['set_model_response'],
        )
    )

The type-aware instruction reduces how often the guard fires. The tool_choice enforcement ensures valid output when the model won't comply. The hard cutoff prevents runaway costs in all cases. All three layers together make the termination deterministic rather than prompt-probabilistic.
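The layered decision above can be condensed into a tiny standalone sketch (names are illustrative; the threshold matches the snippets, with the hard cutoff checked before the forced tool_choice):

```python
_MAX_TOOL_ROUNDS = 25

def guard_action(tool_rounds: int) -> str:
    """Which intervention, if any, the guard applies on this round."""
    if tool_rounds >= _MAX_TOOL_ROUNDS:
        # Layer 2: last-resort termination to stop runaway API costs.
        return 'end_invocation'
    if tool_rounds >= _MAX_TOOL_ROUNDS - 1:
        # Layer 1: constrain the model to set_model_response only.
        return 'force_set_model_response'
    return 'none'
```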

3. Missing return after successful set_model_response

Separate from the infinite loop — when set_model_response IS successfully called in base_llm_flow.py:1083-1093, the final_event is yielded but execution falls through to transfer_to_agent (line 1094) and any subsequent processing:

# base_llm_flow.py:1083-1093 — current code
      if json_response := _output_schema_processor.get_structured_model_response(
          function_response_event
      ):
        final_event = (
            _output_schema_processor.create_final_model_response_event(
                invocation_context, json_response
            )
        )
        yield final_event
      transfer_to_agent = function_response_event.actions.transfer_to_agent  # falls through

Proposed fix:

        yield final_event
        return  # Structured output produced — terminate processing

4. Two architectural patterns that avoid this class of problem entirely

For context on why the deterministic guard works — two patterns that reframe the problem:

Pattern A: Two-phase execution. Phase 1: model calls tools freely (no output schema). Phase 2: model is re-invoked with full history but only the schema tool, no other tools. The model can't confuse "call a tool" with "produce the response" because those actions never compete in the same LLM call.

Pattern B: Always-forced tool selection. Register the schema as a tool alongside real tools, but set tool_choice=any so the model must pick a tool every turn. It either calls a real tool (keep working) or calls the schema tool (finish). The model can't silently ignore the schema tool because it's forced to pick something.

The tool_choice guard in finding 2 is a lightweight version of Pattern B, applied as a safety net on round N-1 rather than as the primary mechanism.
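As a self-contained illustration of Pattern A (all names here are made up; call_model stands in for a generic LLM client, and the toy model just exercises the control flow):

```python
from typing import Callable, Dict, List

def run_two_phase(
    history: List[Dict],
    work_tools: List[str],
    schema_tool: str,
    call_model: Callable[..., Dict],
) -> Dict:
    # Phase 1: the model may use work tools freely; loop until it emits
    # a turn with no function calls (i.e. it considers the work done).
    while True:
        turn = call_model(history, tools=work_tools)
        history.append(turn)
        if not turn.get('function_calls'):
            break
    # Phase 2: same history, but the ONLY tool offered is the schema
    # tool, with a forced tool choice, so finishing is the only move.
    return call_model(
        history,
        tools=[schema_tool],
        tool_config={'mode': 'ANY', 'allowed': [schema_tool]},
    )

# Toy stand-in for an LLM client: calls one work tool, then stops.
def toy_model(history, tools, tool_config=None):
    if tool_config is not None:
        return {'function_calls': [{'name': tools[0], 'args': {'score': 1}}]}
    if not any(t.get('function_calls') for t in history):
        return {'function_calls': [{'name': tools[0], 'args': {}}]}
    return {'function_calls': []}
```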


Summary for reviewer

This PR's model-detection fix is necessary and correct — Gemini 2.x should not use native response_schema + tools. But the SetModelResponseTool fallback path it activates has its own failure mode: flash models can still ignore set_model_response and loop indefinitely.

Findings 1-3 harden that fallback path with a layered defense:

| # | Change | Type | Scope |
|---|---|---|---|
| 1 | Type-aware instruction in _output_schema_processor.py | Probabilistic improvement | Reduces loop frequency for primitive schemas (str, int) |
| 2 | tool_choice enforcement on round N-1 | Deterministic guard | Forces valid structured output before cutoff |
| 3 | Hard cutoff on round N | Deterministic guard | Prevents runaway API costs in all cases |
| 4 | return after yield final_event in base_llm_flow.py:1093 | Bug fix | Skips unnecessary transfer_to_agent processing after success |

Without findings 2-3, the path this PR activates for Gemini 2.x remains vulnerable to the same infinite loop on flash models — just through SetModelResponseTool instead of native response_schema. The model-detection fix and the deterministic guard are complementary; ideally both ship together.

@vietnamesekid
Author

vietnamesekid commented Apr 1, 2026

Thanks @surfai for the thorough analysis. I agree findings 1-3 are valuable improvements. I'd prefer to keep this PR focused on the model detection fix and handle those as a separate PR. That said, if you and @rohityan prefer, I'm happy to include them here. What do you think?

@surfai

surfai commented Apr 1, 2026

Hope our discussion does not distract. Certainly. Thank you!!

@vietnamesekid
Author

I've opened #5091 as a follow-up to implement the findings from @surfai's analysis. It adds type-aware instructions, a deterministic tool_choice guard, a hard cutoff, and the early return fix. All tests passing locally (unit + integration on both GOOGLE_AI and Vertex AI).



Development

Successfully merging this pull request may close these issues.

[BUG]: output_schema=str combined with tools causes infinite call_llm/execute_tools loop
