feat: built-in large tool output handling#2728
Conversation
19c03e4 to
1771f43
Compare
1771f43 to
b657530
Compare
Add a new builtin hook that automatically handles large tool responses.
When enabled, tool responses exceeding the threshold are saved to disk
and replaced with a pointer that the agent can read back using shell
tools. This prevents large MCP tool responses from exhausting the
model's context window while still allowing the agent to access the
full data.
The feature is configured at the agent level:
handle_large_tool_output:
enabled: true
threshold: 5000 # characters (default)
output_dir: /tmp # default: os.TempDir()
preview_size: 3000 # chars in preview (default)
This is a root-cause fix for docker#2722 - it addresses both the lack of
MCP output limits and context window overflow issues.
|
Triage note: This is a community PR (contributor) implementing feature request #2722. The approach — a new A few design questions for maintainer review before merging:
Flagging |
- Add unit tests for handle_large_tool_output builtin covering: - Response under threshold passes through unchanged - Response over threshold saves to disk with pointer returned - Custom output_dir and preview_size configuration - Fallback to os.TempDir() when output_dir not set - No-op on non tool_response_transform events - No-op on nil/empty/non-string tool responses - Error propagation on WriteFile failure - ApplyAgentDefaults injection behavior (enabled/disabled/nil) - Builtin registration check - Fix file permissions: 0o640 -> 0o600 (tool output may contain sensitive data)
Updates the default threshold from 5000 to 30000 to match the existing 30,000-char limit on builtin tools (shell, openapi, api). This ensures consistent behavior across all tool types when the feature is enabled.
|
Addressed all three triage concerns: 1. Threshold aligned to 30,000 — default threshold bumped from 5,000 to 30,000 to match the existing builtin tool limit (shell, openapi, api all cap at 30k chars). This is the sensible default you mentioned. 2. Config location — same pattern as
Same abstraction, same mechanism. If the team wants to move to 3. Unit tests added — 13 tests covering:
Permissions hardened: |
Sanitize SessionID and ToolUseID before constructing file paths to prevent path traversal attacks. An attacker controlling an MCP tool server could inject '../' sequences to write files outside the configured output directory. Fix: replace '..' with '__' and '/' with '_' in both values before path construction. Added tests for path traversal blocking and sanitizeFilename edge cases. Resolves: Path Traversal in Large Tool Output File Handling (High)
|
Security fix pushed — Path traversal vulnerability resolved. Added func sanitizeFilename(name string) string {
name = strings.ReplaceAll(name, "..", "__")
name = strings.ReplaceAll(name, "/", "_")
name = strings.ReplaceAll(name, "\\", "_")
return name
}Added two tests:
This brings the PR to 4 commits covering functionality, tests, threshold alignment, and security. |
The function replaces '..' first (becoming '__'), then '/' becomes '_', so '../../../etc/passwd' → '_________etc_passwd' (3 dots → 3 underscores, then 3 slashes → 3 underscores). Aligning test expectations with actual function behavior.
|
/review |
docker-agent
left a comment
There was a problem hiding this comment.
Assessment: 🔴 CRITICAL
1 confirmed high-severity bug (user config silently ignored) and 2 likely issues (filename collision, shell metacharacters in suggested command). See inline comments for details.
| }, nil | ||
| } | ||
|
|
||
| type toolOutputConfig struct { |
There was a problem hiding this comment.
[HIGH] output_dir and preview_size config values are silently ignored due to JSON key mismatch
HandleLargeToolOutputConfig (in pkg/config/latest/types.go) uses snake_case JSON tags:
Threshold int `json:"threshold,omitempty"`
OutputDir string `json:"output_dir,omitempty"`
PreviewSize int `json:"preview_size,omitempty"`But toolOutputConfig (used to unmarshal the config at runtime) has no JSON tags, so Go uses the PascalCase field names as keys ("Threshold", "OutputDir", "PreviewSize"):
type toolOutputConfig struct {
Threshold int
OutputDir string
PreviewSize int
}In ApplyAgentDefaults, the HandleLargeToolOutputConfig is marshalled to JSON and passed as a string arg — producing e.g. {"threshold":5000,"output_dir":"/my/dir","preview_size":2000}. When parseArgs unmarshals this into toolOutputConfig:
"threshold"→ matchesThreshold✅ (Go's case-insensitive match works here)"output_dir"→ does NOT matchOutputDir❌ (underscore vs no-underscore; case-insensitivity can't bridge this)"preview_size"→ does NOT matchPreviewSize❌ (same reason)
Result: Any output_dir or preview_size a user configures is silently dropped, and the hook always uses the hardcoded defaults (os.TempDir() and 3000 chars).
Fix: Add json struct tags to toolOutputConfig matching those on HandleLargeToolOutputConfig:
type toolOutputConfig struct {
Threshold int `json:"threshold"`
OutputDir string `json:"output_dir"`
PreviewSize int `json:"preview_size"`
}| } | ||
|
|
||
| filename := fmt.Sprintf("%s_%s.txt", | ||
| sanitizeFilename(in.SessionID), |
There was a problem hiding this comment.
[MEDIUM] Filename collision risk when SessionID or ToolUseID is empty
The output filename is built as:
filename := fmt.Sprintf("%s_%s.txt",
sanitizeFilename(in.SessionID),
sanitizeFilename(in.ToolUseID))If either SessionID or ToolUseID is empty (e.g. sanitizeFilename("") == ""), multiple tool calls can produce the same filename (e.g., _.txt or _<toolUseID>.txt). Each subsequent os.WriteFile call silently overwrites the previous content. When the agent later reads the file, it gets the last response only — the earlier ones are lost without any error.
Fix: Validate that both IDs are non-empty before constructing the filename, or incorporate a timestamp/random component to guarantee uniqueness:
if in.SessionID == "" || in.ToolUseID == "" {
return nil, fmt.Errorf("handle_large_tool_output: missing session or tool-use ID")
}| preview = response[:previewSize] | ||
| } | ||
|
|
||
| pointer := fmt.Sprintf("[%s response: %d chars, full output saved to %s]\n\nFirst %d chars:\n%s\n\n[Use shell tool to read: cat %s]", |
There was a problem hiding this comment.
[LOW] Shell metacharacters in SessionID/ToolUseID flow into the suggested cat command
The pointer message shown to the agent includes:
pointer := fmt.Sprintf("...[Use shell tool to read: cat %s]", path)sanitizeFilename only strips .., /, and \ — it does not strip spaces, $, backticks, ;, |, &, or ()). If a SessionID or ToolUseID contains these characters, the generated cat <path> string will embed them literally. When the agent's shell tool executes the suggested command, those metacharacters are interpreted by the shell, potentially causing unintended execution.
Fix: Either quote the path in the suggested command, or expand sanitizeFilename to allowlist only safe characters:
// Option A: shell-quote the path in the message
pointer := fmt.Sprintf("...[Use shell tool to read: cat '%s']", path)
// Option B: allowlist in sanitizeFilename
func sanitizeFilename(name string) string {
var b strings.Builder
for _, r := range name {
if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') ||
(r >= '0' && r <= '9') || r == '-' || r == '.' {
b.WriteRune(r)
} else {
b.WriteRune('_')
}
}
return b.String()
}
Summary
Add a new builtin hook that automatically handles large tool responses. When enabled, tool responses exceeding the threshold are saved to disk and replaced with a pointer that the agent can read back using shell tools.
This is a root-cause fix for issue #2722.
Problem
When an MCP tool returns a large response the agent breaks in one of two ways:
With default
max_old_tool_call_tokensbudget: A single large result eats the entire budget and all earlier tool results in the session get silently replaced with[content truncated].With
max_old_tool_call_tokens: -1: The full response stays in the session but overflows the model's context window. Auto-compaction can't help because the oversized result is in the most recent messages, so the agent gets stuck.Worth noting: builtin tools (shell, openapi, api) each have a hardcoded 30,000-char per-result limit. MCP tools have no per-result limit at all.
Root Cause
MCP tools have no output limit, unlike builtin tools (shell, openapi, api) which all have a hardcoded 30,000-char limit. When MCP returns large responses, they flow into the session unmodified.
Solution
Added a new builtin hook
handle_large_tool_outputthat:EventToolResponseTransformConfiguration
Files Changed
agent-schema.jsonhandle_large_tool_outputconfigpkg/config/latest/types.goHandleLargeToolOutputConfigstructpkg/agent/agent.gopkg/agent/opts.goWithHandleLargeToolOutputoptionpkg/teamloader/teamloader.gopkg/hooks/builtins/handle_large_tool_output.gopkg/hooks/builtins/builtins.gopkg/runtime/hooks.goTesting
Fixes #2722