Gemini fails when a tool returns a top-level JSON array, and the same result can keep breaking later turns via tape replay #109

@WZhongyun

## What happened

When using a Gemini model, if a tool returns valid JSON whose top-level value is an array, for example `[]`, Bub fails with:

```text
invalid_input: gemini:... 1 validation error for FunctionResponse
response
  Input should be a valid dictionary [type=dict_type, input_value=[], input_type=list]
```

## How we hit it

In our case:

1. Bub received a message from a channel.
2. Bub called a script through bash.
3. The script successfully returned `[]`.
4. The same turn then failed with the Gemini `FunctionResponse.response` validation error.

So the script/tool execution itself succeeded. The failure happened after that result was fed back into the model flow.

## Tape evidence

We checked the local tape and found the corresponding entries:

- the user message asking for the task list
- the bash tool call
- the tool result, recorded as `"payload": {"results": ["[]"]}`
- the Gemini validation error on the same turn

This confirms that the failure is triggered by a successful tool result whose content is a top-level JSON array.
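
To locate entries like this in an existing tape, a scan along the following lines may help. This is a minimal sketch, not Bub's code: it assumes the tape has already been loaded as a list of dicts and that each tool result entry carries `payload.results` as a list of strings, which is the only shape shown above. The function name `find_non_object_results` and the sample `tape` list are hypothetical.

```python
import json


def find_non_object_results(entries):
    """Return (entry_index, result_index, parsed_value) for every result
    that parses as JSON but whose top-level value is not an object."""
    hits = []
    for i, entry in enumerate(entries):
        payload = entry.get("payload") if isinstance(entry, dict) else None
        results = payload.get("results") if isinstance(payload, dict) else None
        if not isinstance(results, list):
            continue
        for j, raw in enumerate(results):
            if not isinstance(raw, str):
                continue
            try:
                value = json.loads(raw)
            except json.JSONDecodeError:
                continue  # plain text is not the problem case
            if not isinstance(value, dict):
                hits.append((i, j, value))
    return hits


# Hypothetical tape contents; only the first entry mirrors the issue.
tape = [
    {"payload": {"results": ["[]"]}},
    {"payload": {"results": ['{"ok": true}']}},
    {"payload": {"results": ["no tasks found"]}},
]
print(find_non_object_results(tape))  # [(0, 0, [])]
```

Scalars such as `3` or `null` are flagged as well, since they are also not dictionaries.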

## Why it kept happening later

After that result was written into tape, later turns in the same session could fail again, even for unrelated messages such as “hello”.

From our investigation, this happened because the historical `tool_result` was replayed from tape on each later turn, so the same invalid payload re-entered the model flow and triggered the same validation error.

## What we changed locally

We made two local changes:

1. We changed the relevant skill scripts so they no longer emit raw top-level array or scalar JSON to Bub. Instead, they emit labeled text such as:

       JSON list response (0 items):
       []

2. We also made tape replay render old non-object `tool_result` values as safe text, so existing tape data would not keep reproducing the same failure.
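
The two mitigations could look roughly like the following. This is a sketch under assumptions rather than our actual patch: the helper names `label_json_output` and `render_for_replay` are hypothetical, the label used for scalar values is invented to match the list example above, and it assumes results reach these helpers as strings.

```python
import json


def label_json_output(raw):
    """Wrap top-level array/scalar JSON in labeled text; leave JSON objects
    and non-JSON text unchanged."""
    try:
        value = json.loads(raw)
    except (TypeError, ValueError):
        return raw  # not JSON: already plain text
    if isinstance(value, dict):
        return raw  # objects already satisfy the dictionary check
    if isinstance(value, list):
        return f"JSON list response ({len(value)} items):\n{raw}"
    return f"JSON scalar response:\n{raw}"  # assumed label for scalars


def render_for_replay(results):
    """Apply the same labeling to historical results read back from tape."""
    return [label_json_output(r) if isinstance(r, str) else str(r) for r in results]


print(label_json_output("[]"))
print(render_for_replay(["[]", '{"ok": true}', "hello"]))
```

Applying the same function on the replay path means tape entries written before the script change are handled without rewriting the tape itself.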

After restarting Bub, the same session started working normally again.

## Result after local mitigation

After the local changes:

- Bub was able to answer the “empty task list” case correctly
- later outbound delivery succeeded as well
- the same tape no longer reproduced the error in our environment
