feat: built-in large tool output handling by yunus25jmi1 · Pull Request #2728 · docker/docker-agent

yunus25jmi1 · 2026-05-09T12:54:32Z

Summary

Add a new builtin hook that automatically handles large tool responses. When enabled, tool responses exceeding the threshold are saved to disk and replaced with a pointer that the agent can read back using shell tools.

This is a root-cause fix for issue #2722.

Problem

When an MCP tool returns a large response the agent breaks in one of two ways:

With default max_old_tool_call_tokens budget: A single large result eats the entire budget and all earlier tool results in the session get silently replaced with [content truncated].
With max_old_tool_call_tokens: -1: The full response stays in the session but overflows the model's context window. Auto-compaction can't help because the oversized result is in the most recent messages, so the agent gets stuck.

Worth noting: builtin tools (shell, openapi, api) each have a hardcoded 30,000-char per-result limit. MCP tools have no per-result limit at all.

Root Cause

MCP tools have no output limit, unlike builtin tools (shell, openapi, api) which all have a hardcoded 30,000-char limit. When MCP returns large responses, they flow into the session unmodified.

Solution

Added a new builtin hook handle_large_tool_output that:

Intercepts tool responses at EventToolResponseTransform
When response exceeds threshold, saves full output to disk
Replaces response with a pointer showing tool name, char count, file path, preview

Configuration

agents:
  root:
    handle_large_tool_output:
      enabled: true        # default: false
      threshold: 5000      # chars - default
      output_dir: /tmp     # default: os.TempDir()
      preview_size: 3000   # chars in preview - default

Files Changed

File	Change
`agent-schema.json`	Added `handle_large_tool_output` config
`pkg/config/latest/types.go`	Added `HandleLargeToolOutputConfig` struct
`pkg/agent/agent.go`	Added field and accessor
`pkg/agent/opts.go`	Added `WithHandleLargeToolOutput` option
`pkg/teamloader/teamloader.go`	Wired config to agent
`pkg/hooks/builtins/handle_large_tool_output.go`	New builtin hook
`pkg/hooks/builtins/builtins.go`	Registered builtin
`pkg/runtime/hooks.go`	Passed config to ApplyAgentDefaults

Testing

Build: ✅ Passes
Lint: ✅ 0 issues
Docker Image: ✅ Builds

Fixes #2722

Add a new builtin hook that automatically handles large tool responses. When enabled, tool responses exceeding the threshold are saved to disk and replaced with a pointer that the agent can read back using shell tools. This prevents large MCP tool responses from exhausting the model's context window while still allowing the agent to access the full data. The feature is configured at the agent level: handle_large_tool_output: enabled: true threshold: 5000 # characters (default) output_dir: /tmp # default: os.TempDir() preview_size: 3000 # chars in preview (default) This is a root-cause fix for docker#2722 - it addresses both the lack of MCP output limits and context window overflow issues.

aheritier · 2026-05-11T09:30:50Z

Triage note: This is a community PR (contributor) implementing feature request #2722. The approach — a new handle_large_tool_output builtin hook at EventToolResponseTransform — is architecturally consistent with the existing redact_secrets pattern.

A few design questions for maintainer review before merging:

Opt-in vs. opt-out: The PR requires explicit YAML config. Should there be a sensible default threshold (e.g. the existing 30,000-char limit already on builtins)?
Config location: handle_large_tool_output is a top-level agent field vs. a hooks entry — breaks the hooks-as-the-extension-point pattern. Would hooks.tool_response_transform: [{type: builtin, command: handle_large_tool_output}] with agent-level options be more consistent?
Tests: The PR notes build/lint pass but no unit tests for the new builtin are mentioned — same bar as redact_secrets / unload would require test coverage.

Flagging help-wanted — good candidate for an iterative review with the contributor.

- Add unit tests for handle_large_tool_output builtin covering: - Response under threshold passes through unchanged - Response over threshold saves to disk with pointer returned - Custom output_dir and preview_size configuration - Fallback to os.TempDir() when output_dir not set - No-op on non tool_response_transform events - No-op on nil/empty/non-string tool responses - Error propagation on WriteFile failure - ApplyAgentDefaults injection behavior (enabled/disabled/nil) - Builtin registration check - Fix file permissions: 0o640 -> 0o600 (tool output may contain sensitive data)

Updates the default threshold from 5000 to 30000 to match the existing 30,000-char limit on builtin tools (shell, openapi, api). This ensures consistent behavior across all tool types when the feature is enabled.

yunus25jmi1 · 2026-05-11T18:15:59Z

Addressed all three triage concerns:

1. Threshold aligned to 30,000 — default threshold bumped from 5,000 to 30,000 to match the existing builtin tool limit (shell, openapi, api all cap at 30k chars). This is the sensible default you mentioned.

2. Config location — same pattern as redact_secrets — handle_large_tool_output is a top-level AgentConfig field, auto-injected into hooks.tool_response_transform via ApplyAgentDefaults. This mirrors redact_secrets exactly:

AgentConfig.RedactSecrets (bool) → auto-injects 3 hooks
AgentConfig.HandleLargeToolOutput (struct) → auto-injects 1 hook

Same abstraction, same mechanism. If the team wants to move to hooks.tool_response_transform: [{type: builtin, command: handle_large_tool_output}] with agent-level options, I'm happy to take that as a v2 refactor — it's a non-trivial change to the injection model and not a blocker for this PR.

3. Unit tests added — 13 tests covering:

Under/over threshold behavior
File write + pointer format
Custom output_dir and preview_size
Fallback to os.TempDir()
No-op cases (nil input, non-string response, empty response, wrong event)
Error propagation on write failure
Builtin registration
ApplyAgentDefaults injection behavior (enabled/disabled/nil)

Permissions hardened: 0o640 → 0o600 (tool output may contain sensitive data).

Sanitize SessionID and ToolUseID before constructing file paths to prevent path traversal attacks. An attacker controlling an MCP tool server could inject '../' sequences to write files outside the configured output directory. Fix: replace '..' with '__' and '/' with '_' in both values before path construction. Added tests for path traversal blocking and sanitizeFilename edge cases. Resolves: Path Traversal in Large Tool Output File Handling (High)

yunus25jmi1 · 2026-05-11T22:02:45Z

Security fix pushed — Path traversal vulnerability resolved.

Added sanitizeFilename() to strip .. and / from SessionID and ToolUseID before path construction. An attacker controlling an MCP tool server could have injected ../ sequences to write files outside the configured output directory.

func sanitizeFilename(name string) string {
    name = strings.ReplaceAll(name, "..", "__")
    name = strings.ReplaceAll(name, "/", "_")
    name = strings.ReplaceAll(name, "\\", "_")
    return name
}

Added two tests:

TestHandleLargeToolOutputPathTraversalIsBlocked — verifies malicious paths stay within output directory
TestSanitizeFilename — covers edge cases (slashes, backslashes, path traversal sequences)

This brings the PR to 4 commits covering functionality, tests, threshold alignment, and security.

The function replaces '..' first (becoming '__'), then '/' becomes '_', so '../../../etc/passwd' → '_________etc_passwd' (3 dots → 3 underscores, then 3 slashes → 3 underscores). Aligning test expectations with actual function behavior.

aheritier · 2026-05-12T21:22:29Z

/review

docker-agent

Assessment: 🔴 CRITICAL

1 confirmed high-severity bug (user config silently ignored) and 2 likely issues (filename collision, shell metacharacters in suggested command). See inline comments for details.

docker-agent · 2026-05-12T21:26:09Z

+	}, nil
+}
+
+type toolOutputConfig struct {


[HIGH] output_dir and preview_size config values are silently ignored due to JSON key mismatch

HandleLargeToolOutputConfig (in pkg/config/latest/types.go) uses snake_case JSON tags:

Threshold int `json:"threshold,omitempty"` OutputDir string `json:"output_dir,omitempty"` PreviewSize int `json:"preview_size,omitempty"`

But toolOutputConfig (used to unmarshal the config at runtime) has no JSON tags, so Go uses the PascalCase field names as keys ("Threshold", "OutputDir", "PreviewSize"):

type toolOutputConfig struct { Threshold int OutputDir string PreviewSize int }

In ApplyAgentDefaults, the HandleLargeToolOutputConfig is marshalled to JSON and passed as a string arg — producing e.g. {"threshold":5000,"output_dir":"/my/dir","preview_size":2000}. When parseArgs unmarshals this into toolOutputConfig:

"threshold" → matches Threshold ✅ (Go's case-insensitive match works here)

"output_dir" → does NOT match OutputDir ❌ (underscore vs no-underscore; case-insensitivity can't bridge this)

"preview_size" → does NOT match PreviewSize ❌ (same reason)

Result: Any output_dir or preview_size a user configures is silently dropped, and the hook always uses the hardcoded defaults (os.TempDir() and 3000 chars).

Fix: Add json struct tags to toolOutputConfig matching those on HandleLargeToolOutputConfig:

type toolOutputConfig struct { Threshold int `json:"threshold"` OutputDir string `json:"output_dir"` PreviewSize int `json:"preview_size"` }

docker-agent · 2026-05-12T21:26:09Z

+	}
+
+	filename := fmt.Sprintf("%s_%s.txt",
+		sanitizeFilename(in.SessionID),


[MEDIUM] Filename collision risk when SessionID or ToolUseID is empty

The output filename is built as:

filename := fmt.Sprintf("%s_%s.txt", sanitizeFilename(in.SessionID), sanitizeFilename(in.ToolUseID))

If either SessionID or ToolUseID is empty (e.g. sanitizeFilename("") == ""), multiple tool calls can produce the same filename (e.g., _.txt or _<toolUseID>.txt). Each subsequent os.WriteFile call silently overwrites the previous content. When the agent later reads the file, it gets the last response only — the earlier ones are lost without any error.

Fix: Validate that both IDs are non-empty before constructing the filename, or incorporate a timestamp/random component to guarantee uniqueness:

if in.SessionID == "" || in.ToolUseID == "" { return nil, fmt.Errorf("handle_large_tool_output: missing session or tool-use ID") }

docker-agent · 2026-05-12T21:26:09Z

+		preview = response[:previewSize]
+	}
+
+	pointer := fmt.Sprintf("[%s response: %d chars, full output saved to %s]\n\nFirst %d chars:\n%s\n\n[Use shell tool to read: cat %s]",


[LOW] Shell metacharacters in SessionID/ToolUseID flow into the suggested cat command

The pointer message shown to the agent includes:

pointer := fmt.Sprintf("...[Use shell tool to read: cat %s]", path)

sanitizeFilename only strips .., /, and \ — it does not strip spaces, $, backticks, ;, |, &, or ()). If a SessionID or ToolUseID contains these characters, the generated cat <path> string will embed them literally. When the agent's shell tool executes the suggested command, those metacharacters are interpreted by the shell, potentially causing unintended execution.

Fix: Either quote the path in the suggested command, or expand sanitizeFilename to allowlist only safe characters:

// Option A: shell-quote the path in the message pointer := fmt.Sprintf("...[Use shell tool to read: cat '%s']", path) // Option B: allowlist in sanitizeFilename func sanitizeFilename(name string) string { var b strings.Builder for _, r := range name { if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || (r >= '0' && r <= '9') || r == '-' || r == '.' { b.WriteRune(r) } else { b.WriteRune('_') } } return b.String() }

yunus25jmi1 requested a review from a team as a code owner May 9, 2026 12:54

yunus25jmi1 force-pushed the feat/large-tool-output-handling branch from 19c03e4 to 1771f43 Compare May 9, 2026 12:57

yunus25jmi1 closed this May 9, 2026

yunus25jmi1 force-pushed the feat/large-tool-output-handling branch from 1771f43 to b657530 Compare May 9, 2026 13:00

yunus25jmi1 reopened this May 9, 2026

aheritier added area/tools For features/issues/fixes related to the usage of built-in and MCP tools priority:medium Normal priority, standard sprint work area/mcp MCP protocol, MCP tool servers, integration help-wanted Extra attention is needed, community contributions sought labels May 11, 2026

aheritier mentioned this pull request May 11, 2026

feat: built-in large tool output handling #2722

Open

yunus25jmi1 added 2 commits May 11, 2026 23:44

chore: align default threshold with builtin tool limit (30k)

1a44f61

Updates the default threshold from 5000 to 30000 to match the existing 30,000-char limit on builtin tools (shell, openapi, api). This ensures consistent behavior across all tool types when the feature is enabled.

aheritier added status/needs-design Requires architectural discussion or design review and removed priority:medium Normal priority, standard sprint work help-wanted Extra attention is needed, community contributions sought labels May 12, 2026

docker-agent reviewed May 12, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: built-in large tool output handling#2728

feat: built-in large tool output handling#2728
yunus25jmi1 wants to merge 5 commits into
docker:mainfrom
yunus25jmi1:feat/large-tool-output-handling

yunus25jmi1 commented May 9, 2026 •

edited

Loading

Uh oh!

aheritier commented May 11, 2026

Uh oh!

yunus25jmi1 commented May 11, 2026

Uh oh!

yunus25jmi1 commented May 11, 2026

Uh oh!

aheritier commented May 12, 2026

Uh oh!

docker-agent left a comment

Uh oh!

docker-agent May 12, 2026

Uh oh!

docker-agent May 12, 2026

Uh oh!

docker-agent May 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

yunus25jmi1 commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Root Cause

Solution

Configuration

Files Changed

Testing

Uh oh!

aheritier commented May 11, 2026

Uh oh!

yunus25jmi1 commented May 11, 2026

Uh oh!

yunus25jmi1 commented May 11, 2026

Uh oh!

aheritier commented May 12, 2026

Uh oh!

docker-agent left a comment

Choose a reason for hiding this comment

Assessment: 🔴 CRITICAL

Uh oh!

docker-agent May 12, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent May 12, 2026

Choose a reason for hiding this comment

Uh oh!

docker-agent May 12, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

yunus25jmi1 commented May 9, 2026 •

edited

Loading