feat: add fuzzy text matching fallbacks for edit_file tool #2145
trungutt wants to merge 1 commit into docker:main
Conversation
Inspired by the resilient edit matching in OpenCode, add fuzzy text matching to edit_file. When an exact oldText match fails, try progressively looser normalization strategies before giving up:

1. Line-trimmed (strip trailing whitespace per line)
2. Indentation-flexible (strip leading whitespace per line)
3. Line-continuation normalization (collapse \ + newline + spaces)
4. Escape-normalized (\" ↔ ")
5. Whitespace-collapsed (collapse runs of whitespace to a single space)

These handle common LLM mistakes such as adding spurious indentation, collapsing shell line continuations into single lines, or unescaping quotes. The replacement is always applied to the original content so surrounding text is preserved. Index mapping between normalized and original strings uses binary search with rune-aware boundary snapping to preserve UTF-8 integrity. Both the builtin and ACP filesystem handlers now use the shared FindAndReplace function.
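The fallback chain described above can be sketched roughly as follows. All function names, regexes, and the return shape here are illustrative stand-ins under stated assumptions, not the PR's actual Go implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	wsRun    = regexp.MustCompile(`\s+`)       // runs of whitespace
	lineCont = regexp.MustCompile(`\\\n[ \t]*`) // backslash + newline + spaces
)

func trimTrailingPerLine(s string) string {
	lines := strings.Split(s, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimRight(l, " \t")
	}
	return strings.Join(lines, "\n")
}

func trimLeadingPerLine(s string) string {
	lines := strings.Split(s, "\n")
	for i, l := range lines {
		lines[i] = strings.TrimLeft(l, " \t")
	}
	return strings.Join(lines, "\n")
}

func collapseLineContinuations(s string) string { return lineCont.ReplaceAllString(s, " ") }
func normalizeEscapedQuotes(s string) string    { return strings.ReplaceAll(s, `\"`, `"`) }
func collapseWhitespace(s string) string        { return wsRun.ReplaceAllString(s, " ") }

// findFuzzy tries an exact match first, then each looser strategy in order,
// returning the match index in the *normalized* content plus the strategy used.
func findFuzzy(content, oldText string) (normIdx int, normalize func(string) string, ok bool) {
	identity := func(s string) string { return s }
	for _, n := range []func(string) string{
		identity,
		trimTrailingPerLine,
		trimLeadingPerLine,
		collapseLineContinuations,
		normalizeEscapedQuotes,
		collapseWhitespace,
	} {
		if i := strings.Index(n(content), n(oldText)); i >= 0 {
			return i, n, true
		}
	}
	return 0, nil, false
}

func main() {
	// Spurious indentation in oldText: the indentation-flexible strategy matches.
	idx, _, ok := findFuzzy("RUN apt-get update\n", "    RUN apt-get update\n")
	fmt.Println(idx, ok) // prints: 0 true
}
```

Note that a match found against normalized content still has to be mapped back to offsets in the original string before replacing — that mapping is exactly what the review below takes issue with.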
Review Summary
Assessment: 🔴 CRITICAL
This PR adds fuzzy text matching fallbacks to the edit_file tool with 6 normalization strategies. The implementation has critical bugs in the boundary mapping logic that can cause data corruption when fuzzy matching is triggered.
Critical Issues Found
2 HIGH severity bugs in the mapNormToOrigEnd function that handles boundary detection for character-consuming normalizations (whitespace collapsing, line continuations, etc.). These bugs can lead to incorrect replacement ranges and file corruption.
1 MEDIUM severity issue in the binary search logic that may cause off-by-one errors with collapsing normalizations.
Recommendation
The boundary mapping algorithm needs to be redesigned or thoroughly tested with edge cases involving:
- Multi-byte UTF-8 characters combined with character-consuming normalizations
- Whitespace collapsing at UTF-8 boundaries
- Line continuation normalization with non-ASCII content
The existing test suite does not cover these critical edge cases.
Detailed Findings
See inline comments below.
pos := mapNormToOrig(original, normIdx, normalize)
for pos < len(original) {
    _, size := utf8.DecodeRuneInString(original[pos:])
    if len(normalize(original[:pos+size])) != len(normalize(original[:pos])) {
🔴 HIGH: Incorrect termination condition for consumed character detection
The condition len(normalize(original[:pos+size])) != len(normalize(original[:pos])) is fundamentally flawed for character-consuming normalizations.
The Problem:
When normalization collapses multiple characters to fewer (e.g., collapseWhitespace, collapseLineContinuations), advancing by one rune will almost always change the normalized length UNLESS that rune is completely consumed. This logic will fail to correctly identify the end boundary.
Impact:
- Wrong replacement ranges
- Potential file corruption when fuzzy matching triggers
- Especially dangerous with multi-byte UTF-8 characters
Example:
Consider normalizing "a  b" (two spaces) to "a b" (one space):
- At the position after 'a', the normalized length is 1
- Advancing one rune (the first space), the normalized length becomes 2
- Advancing another rune (the second space), the normalized length is still 2 (the space is collapsed)
- The loop will incorrectly continue past the intended boundary
The test suite doesn't cover these edge cases with UTF-8 and character-consuming normalizations.
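The length behavior the example describes can be reproduced directly. Here collapseWhitespace is a hypothetical stand-in for the PR's normalization, not its actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

var ws = regexp.MustCompile(`[ \t]+`)

// collapseWhitespace: illustrative stand-in for a character-consuming normalization.
func collapseWhitespace(s string) string { return ws.ReplaceAllString(s, " ") }

func main() {
	original := "a  b" // two spaces between 'a' and 'b'
	for pos := 0; pos <= len(original); pos++ {
		prefix := original[:pos]
		norm := collapseWhitespace(prefix)
		fmt.Printf("prefix %q -> normalized %q (len %d)\n", prefix, norm, len(norm))
	}
}
```

Consuming the second space leaves the normalized prefix length at 2, the same as before it was consumed, so a pure length-change check cannot by itself decide where the matched region ends.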
// quote in \" → "). This ensures the entire original text corresponding
// to the normalized match is included. Advances by whole runes to
// preserve UTF-8 integrity.
func mapNormToOrigEnd(original string, normIdx int, normalize func(string) string) int {
🔴 HIGH: mapNormToOrigEnd boundary detection logic is incorrect
This function attempts to advance past trailing characters consumed by normalization, but the implementation is fundamentally incorrect for normalizations that map multiple characters to fewer.
Root Cause:
The function uses mapNormToOrig (binary search) to find the starting position, then advances by runes while checking if the normalized length changes. However, this approach doesn't correctly handle character-consuming normalizations like:
- collapseWhitespace (multiple spaces → one space)
- collapseLineContinuations (\ + newline → space)
- normalizeEscapedQuotes (\" → ")
Impact:
Incorrect end boundary detection leads to wrong replacement ranges and potentially corrupted file content.
Recommendation:
Consider a different approach:
- Track character-to-character mapping during normalization
- Use a forward scan with explicit state tracking
- Add comprehensive tests with UTF-8 edge cases
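One way to realize the first suggestion is to build the byte-offset map while normalizing, so no inverse search or rune probing is needed afterwards. The sketch below is illustrative only, covers just the whitespace-collapsing case, and is not the PR's code:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeWithMap collapses runs of spaces/tabs to a single space while
// recording, for each byte of the normalized output, the byte offset of the
// originating rune in the original string.
func normalizeWithMap(original string) (string, []int) {
	var out strings.Builder
	var origAt []int // origAt[i] = original byte offset producing normalized byte i
	inRun := false
	for i, r := range original {
		if r == ' ' || r == '\t' {
			if !inRun { // first character of a run: emit one space
				out.WriteByte(' ')
				origAt = append(origAt, i)
				inRun = true
			}
			continue // remaining characters of the run are consumed
		}
		inRun = false
		start := out.Len()
		out.WriteRune(r) // may emit multiple bytes for a multi-byte rune
		for j := start; j < out.Len(); j++ { // one map entry per output byte
			origAt = append(origAt, i)
		}
	}
	return out.String(), origAt
}

func main() {
	norm, origAt := normalizeWithMap("a  b")
	fmt.Println(norm, origAt) // prints: a b [0 1 3]
}
```

With such a map, a normalized match [lo, hi) converts to original offsets without re-normalizing prefixes: the start is origAt[lo], and the end is origAt[hi-1] plus the byte length of the rune at that offset. Whether trailing consumed run characters should also be swallowed then becomes an explicit policy decision rather than a heuristic.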
// mapNormToOrig finds the smallest rune-aligned position in the original
// string whose normalized prefix has at least normIdx characters.
func mapNormToOrig(original string, normIdx int, normalize func(string) string) int {
🟡 MEDIUM: Binary search may overshoot with collapsing normalizations
The binary search finds "the smallest position where the normalized prefix has at least normIdx characters", but this can cause off-by-one errors when normalization collapses characters.
Example:
Normalizing "a  b" (two spaces) to "a b" (one space):
- Looking for position 2 in the normalized string (just after "a ")
- The binary search might land at position 3 in the original (after "a  ")
- Because normalize("a  ") = "a " also has length 2
Why LIKELY not CONFIRMED:
The binary search invariant appears correct in principle, but edge cases with collapsing normalizations could expose boundary issues. The real problem is that this function feeds into mapNormToOrigEnd, which has confirmed bugs.
Recommendation:
Add test cases specifically for boundary detection with:
- Multiple consecutive collapsible characters
- UTF-8 multi-byte characters adjacent to collapsible sequences
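A table-driven sketch of those cases (again with a hypothetical collapseWhitespace stand-in; byte vs. rune counts are printed because the mapping functions index by byte while advancing by rune):

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

var ws = regexp.MustCompile(`[ \t]+`)

func collapseWhitespace(s string) string { return ws.ReplaceAllString(s, " ") }

func main() {
	cases := []string{
		"a  b",     // multiple consecutive collapsible characters
		"é  b",     // multi-byte rune directly before a collapsible run
		"界 \t 界", // multi-byte runes on both sides of a mixed run
	}
	for _, c := range cases {
		n := collapseWhitespace(c)
		fmt.Printf("%q -> %q (%d bytes / %d runes -> %d bytes)\n",
			c, n, len(c), utf8.RuneCountInString(c), len(n))
	}
}
```

Any byte-vs-rune confusion in the boundary code would surface on the last two cases, where byte and rune counts diverge.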
Closed, reopen when needed
Summary
Inspired by the resilient edit matching in OpenCode (which uses 8+ fallback strategies in its edit tool), this adds fuzzy text matching to edit_file.

When an exact oldText match fails, the tool now tries progressively looser normalization strategies before returning an error:

- Line-trimmed: strip trailing whitespace per line
- Indentation-flexible: strip leading whitespace per line
- Line-continuation: collapse \ + newline + spaces into a single space
- Escape-normalized: treat \" and " as equivalent
- Whitespace-collapsed: collapse runs of whitespace to a single space

These handle real-world LLM mistakes observed in session logs:

- Spurious indentation added to oldText (e.g., 4 leading spaces on a Dockerfile RUN line)
- Shell line continuations (\ + newline) collapsed into single lines
- \" unescaped to " when reproducing file content

When a fuzzy strategy matches, the replacement is applied to the original (un-normalized) content so surrounding text is preserved exactly. Index mapping between normalized and original strings uses binary search with rune-aware boundary snapping to preserve UTF-8 integrity.

Both the builtin and ACP filesystem handlers use the shared FindAndReplace function.

This PR is independent of #2144 (double-serialized edits fix).