[BUG] ConcurrentToolExecutor collects tool_results in completion order, breaking prompt-cache stability #2112

@charles-dyfis-net

Description

Checks

  • I have updated to the latest minor and patch version of Strands
  • I have checked the documentation and this is not expected behavior
  • I have searched ./issues and there are no duplicates of my issue

Strands Version

1.35.0

Python Version

3.13.9

Operating System

macOS 26.4

Installation Method

other

Steps to Reproduce

import asyncio
import json
from collections.abc import AsyncGenerator
from typing import Any, override

from strands import Agent, tool
from strands.models import Model
from strands.types.content import Messages
from strands.types.streaming import StreamEvent
from strands.types.tools import ToolSpec


class TwoToolModel(Model):
    """Model that emits two parallel tool_use blocks on the first turn, then ends."""

    def __init__(self) -> None:
        self.turn = 0

    @override
    def update_config(self, **model_config: Any) -> None:
        pass

    @override
    def get_config(self) -> Any:
        return {}

    @override
    def structured_output(
        self, output_model: Any, prompt: Messages, system_prompt: str | None = None, **kwargs: Any
    ) -> AsyncGenerator[Any, None]:
        raise NotImplementedError

    @override
    async def stream(
        self,
        messages: Messages,
        tool_specs: list[ToolSpec] | None = None,
        system_prompt: str | None = None,
        **kwargs: Any,
    ) -> AsyncGenerator[StreamEvent, None]:
        self.turn += 1
        yield StreamEvent(messageStart={"role": "assistant"})
        if self.turn == 1:
            for tid, name in [("id-slow", "slow_tool"), ("id-fast", "fast_tool")]:
                yield StreamEvent(
                    contentBlockStart={"start": {"toolUse": {"name": name, "toolUseId": tid}}},
                )
                yield StreamEvent(contentBlockDelta={"delta": {"toolUse": {"input": json.dumps({})}}})
                yield StreamEvent(contentBlockStop={})
            yield StreamEvent(messageStop={"stopReason": "tool_use"})
        else:
            yield StreamEvent(contentBlockStart={"contentBlockIndex": 0, "start": {}})
            yield StreamEvent(contentBlockDelta={"contentBlockIndex": 0, "delta": {"text": "done"}})
            yield StreamEvent(contentBlockStop={"contentBlockIndex": 0})
            yield StreamEvent(messageStop={"stopReason": "end_turn"})


@tool(name="slow_tool", description="sleeps briefly and returns")
async def slow_tool() -> str:
    await asyncio.sleep(0.05)
    return "slow done"


@tool(name="fast_tool", description="returns immediately")
async def fast_tool() -> str:
    return "fast done"


async def main() -> None:
    agent = Agent(model=TwoToolModel(), tools=[slow_tool, fast_tool])
    _ = await agent.invoke_async("call both tools")

    # Find the user message with the tool_result blocks
    tool_result_message = next(
        m for m in agent.messages
        if m.get("role") == "user" and any("toolResult" in b for b in m.get("content", []))
    )
    ids = [b["toolResult"]["toolUseId"] for b in tool_result_message["content"] if "toolResult" in b]
    print(f"tool_result order in next-turn prompt: {ids}")
    # Expected: ['id-slow', 'id-fast']  — matches the assistant's toolUse emission order
    # Actual:   ['id-fast', 'id-slow']  — fast_tool finished first, so it was appended first


asyncio.run(main())

Expected Behavior

The tool_result blocks in the follow-up user message should appear in the same order as the toolUse blocks in the preceding assistant message. That order is deterministic (it comes from the model's output) and stable across runs, which is a prerequisite for byte-stable prompts.

Actual Behavior

tool_result blocks appear in tool-completion order. With the reproducer above this is deterministically inverted (fast_tool finishes before slow_tool), but in general the ordering is scheduler-dependent and varies run to run when the tools have similar completion times.

Additional Context

Byte-stable prompts are a load-bearing assumption for:

  • Anthropic's server-side prompt caching — cache entries are keyed on the exact prompt prefix. A reordering of tool_result blocks in a turn invalidates every cache entry that would otherwise have been reused for the rest of the conversation.
  • Client-side request/response caching — any workflow that hashes prompts to deduplicate LLM calls (replay caches used by CI, offline test runs, determinism harnesses) will miss on every run, because the scheduler coin-flip picks a different ordering.
  • Reproducible agent trajectories — when cached replays fall through to live LLM calls, the new responses differ, and the agent's decision path forks. We hit this in a test suite where a single concurrent tool_use at turn 10 caused two subsets of otherwise-identical tests to end up on entirely different agent trajectories (16 vs 18 turns, different tool sequences, different final verdicts).
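To make the cache-keying point concrete, here is a minimal, hypothetical sketch (`prompt_hash` is an illustrative helper, not a Strands or Anthropic API) showing that swapping two toolResult blocks within one message changes the serialized prompt, and therefore the hash a replay cache would key on:

```python
import hashlib
import json


def prompt_hash(messages: list[dict]) -> str:
    """Key a replay cache on the serialized prompt (illustrative helper)."""
    return hashlib.sha256(json.dumps(messages).encode()).hexdigest()


# Two runs with identical content, differing only in toolResult block order.
run_a = [{"role": "user", "content": [
    {"toolResult": {"toolUseId": "id-slow", "content": [{"text": "slow done"}]}},
    {"toolResult": {"toolUseId": "id-fast", "content": [{"text": "fast done"}]}},
]}]
run_b = [{"role": "user", "content": [
    {"toolResult": {"toolUseId": "id-fast", "content": [{"text": "fast done"}]}},
    {"toolResult": {"toolUseId": "id-slow", "content": [{"text": "slow done"}]}},
]}]

print(prompt_hash(run_a) == prompt_hash(run_b))  # False: every downstream cache entry misses
```

Server-side prefix caches behave the same way: everything from the reordered turn onward is a cache miss.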

In our case this manifested as "two back-to-back runs of the same test suite, with no code changes, produced different prompt hashes and a new live LLM request against what was supposed to be a fully-cached offline run."


The bug is in ConcurrentToolExecutor (src/strands/tools/executors/concurrent.py) combined with ToolExecutor._stream_with_trace (src/strands/tools/executors/_executor.py).

ConcurrentToolExecutor._execute launches one asyncio.Task per tool_use, passing the same shared tool_results: list[ToolResult] to every task:

for task_id, tool_use in enumerate(tool_uses):
    tasks.append(
        asyncio.create_task(
            self._task(
                agent,
                tool_use,
                tool_results,  # ← shared list
                ...
            )
        )
    )

Each task's _stream_with_trace appends to that shared list when its tool finishes:

yield ToolResultEvent(after_event.result)
tool_results.append(after_event.result)  # ← append order = scheduler completion order
return
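The mechanism can be reproduced in isolation with a stripped-down, hypothetical analogue of the task structure (the names below are illustrative, not Strands code): tasks launched in request order append to a shared list in completion order.

```python
import asyncio


async def fake_tool(tool_use_id: str, delay: float, results: list[str]) -> None:
    # Mirrors _stream_with_trace: each task appends to a shared list when it finishes.
    await asyncio.sleep(delay)
    results.append(tool_use_id)


async def run() -> list[str]:
    results: list[str] = []
    # Launched in request order: slow first, fast second (as in the reproducer).
    await asyncio.gather(
        fake_tool("id-slow", 0.05, results),
        fake_tool("id-fast", 0.0, results),
    )
    return results


order = asyncio.run(run())
print(order)  # ['id-fast', 'id-slow'] — completion order, not request order
```

With similar delays the ordering becomes a scheduler coin-flip rather than a deterministic inversion.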

Then event_loop.py serializes the list in whatever order the scheduler left it:

# src/strands/event_loop/event_loop.py
tool_result_message: Message = {
    "role": "user",
    "content": [{"toolResult": result} for result in tool_results],
}

SequentialToolExecutor does not have this problem — it iterates tool_uses in request order and each tool appends to tool_results serially, producing request-order output.
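For illustration only, a hypothetical post-processing step (not current Strands code; `reorder_results` is an invented name) shows that completion-order results can be sorted back into the deterministic request order of the toolUse blocks without serializing execution:

```python
def reorder_results(tool_uses: list[dict], tool_results: list[dict]) -> list[dict]:
    # Map each toolUseId to its position in the assistant's toolUse emission order.
    request_order = {tu["toolUseId"]: i for i, tu in enumerate(tool_uses)}
    # Sort completed results back into that deterministic order.
    return sorted(tool_results, key=lambda r: request_order[r["toolUseId"]])


tool_uses = [{"toolUseId": "id-slow"}, {"toolUseId": "id-fast"}]  # request order
completed = [{"toolUseId": "id-fast"}, {"toolUseId": "id-slow"}]  # completion order
print([r["toolUseId"] for r in reorder_results(tool_uses, completed)])  # ['id-slow', 'id-fast']
```

This would keep the concurrency win of ConcurrentToolExecutor while restoring the request-order output that SequentialToolExecutor produces.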

Possible Solution

No response

    Labels

    bug (Something isn't working)
