fix(grep): stream ripgrep output to prevent memory exhaustion #5432
base: dev
Conversation
Pull request overview
This PR improves the memory efficiency of the grep tool by streaming ripgrep output instead of loading all results into memory before processing. Previously, the tool would wait for ripgrep to complete, load all output into memory, split it into lines (creating a second copy), and only then limit to 100 results. The new implementation streams output line-by-line and stops processing once 100 matches are collected, preventing memory exhaustion on large search spaces.
Key changes:
- Implements streaming processing of ripgrep stdout using ReadableStream API
- Adds early termination by killing the ripgrep process once MATCH_LIMIT (100) matches are collected
- Updates exit code handling to account for killed processes and partial results
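As a rough illustration of the streaming approach described above, here is a minimal sketch, not the PR's exact code. It assumes the ripgrep invocation (`rgArgs`) is built elsewhere to emit one `path|lineNumber|text` record per line, and uses the same `MATCH_LIMIT` constant as the PR.

```ts
const MATCH_LIMIT = 100

// Assumption: the ripgrep command line is built elsewhere and emits
// one "path|lineNumber|text" record per output line.
declare const rgArgs: string[]

const proc = Bun.spawn(rgArgs, { stdout: "pipe", stderr: "pipe" })

const matches: { path: string; lineNum: number; lineText: string }[] = []
let hitCollectLimit = false
let buffer = ""
const decoder = new TextDecoder()
const reader = proc.stdout.getReader()

try {
  outer: while (true) {
    const { done, value } = await reader.read()
    if (done) break
    buffer += decoder.decode(value, { stream: true })
    const lines = buffer.split("\n")
    buffer = lines.pop() ?? "" // keep the trailing partial line for the next chunk
    for (const line of lines) {
      if (!line) continue
      if (matches.length >= MATCH_LIMIT) {
        hitCollectLimit = true
        break outer
      }
      const [path, lineNumStr, ...rest] = line.split("|")
      if (!path || !lineNumStr || rest.length === 0) continue
      matches.push({ path, lineNum: parseInt(lineNumStr, 10), lineText: rest.join("|") })
    }
  }
} finally {
  reader.releaseLock()
  // Once enough matches are collected, stop ripgrep instead of draining it.
  if (hitCollectLimit) proc.kill()
}
```

The diff hunks under review, and Copilot's comments on them, follow.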
```ts
for (const line of lines) {
  if (!line) continue
  if (matches.length >= MATCH_LIMIT) {
    hitCollectLimit = true
    break
  }

  const [filePath, lineNumStr, ...lineTextParts] = line.split("|")
  if (!filePath || !lineNumStr || lineTextParts.length === 0) continue

  const lineNum = parseInt(lineNumStr, 10)
  const lineText = lineTextParts.join("|")

  const file = Bun.file(filePath)
  const stats = await file.stat().catch(() => null)
  if (!stats) continue

  matches.push({
    path: filePath,
    modTime: stats.mtime.getTime(),
    lineNum,
    lineText,
  })
}
```
Copilot AI · Dec 12, 2025
The streaming implementation processes matches asynchronously with file.stat() calls inside the read loop. If file.stat() is slow, this could cause backpressure issues where the stdout buffer fills up because the reader isn't consuming chunks fast enough. Consider processing file stats after collecting all line data, or implement a queue-based approach to handle I/O more efficiently.
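One way to keep the read loop free of filesystem I/O, sketched here under the assumption that the loop first collects raw records into an illustrative `rawMatches` array and only resolves mtimes after the stream is drained or the limit is hit:

```ts
type RawMatch = { path: string; lineNum: number; lineText: string }

// Assumption: filled synchronously inside the read loop; the only await there is reader.read().
declare const rawMatches: RawMatch[]

// After the loop: resolve modification times in one batch.
const matches = (
  await Promise.all(
    rawMatches.map(async (m) => {
      const stats = await Bun.file(m.path).stat().catch(() => null)
      return stats ? { ...m, modTime: stats.mtime.getTime() } : null
    }),
  )
).filter((m) => m !== null)
```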
packages/opencode/src/tool/grep.ts (Outdated)
```ts
const truncated = matches.length > MATCH_LIMIT
const finalMatches = truncated ? matches.slice(0, MATCH_LIMIT) : matches
```
Copilot AI · Dec 12, 2025
The variable truncated is computed based on whether matches.length exceeds MATCH_LIMIT, but since the collection loop stops at MATCH_LIMIT items, matches.length will never actually be greater than MATCH_LIMIT. This means truncated will always be false, making the variable redundant. The truncation logic should rely solely on hitCollectLimit to determine if results were truncated.
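A minimal sketch of that cleanup, assuming `hitCollectLimit` is set by the streaming loop as in the diff above:

```ts
// matches.length can never exceed MATCH_LIMIT, so the flag is the only signal needed.
const truncated = hitCollectLimit
const finalMatches = matches
```

The next reviewed hunk follows.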
```ts
  reader.releaseLock()
}

const errorOutput = await new Response(proc.stderr).text()
```
Copilot AI · Dec 12, 2025
When the collection limit is hit and the process is killed, the stderr might not be fully captured yet since proc.stderr is being read after the kill. This could lead to incomplete error messages. Consider reading stderr before killing the process, or at minimum be aware that error output may be incomplete when the limit is hit.
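A hedged sketch of the "read stderr early" option: start consuming stderr right after spawning, so a later kill cannot lose output that ripgrep already wrote. `rgArgs` is an assumed placeholder for the command line built elsewhere.

```ts
// Assumption: the ripgrep invocation is built elsewhere.
declare const rgArgs: string[]

const proc = Bun.spawn(rgArgs, { stdout: "pipe", stderr: "pipe" })

// Begin draining stderr immediately; do not await it yet.
const stderrPromise = new Response(proc.stderr).text()

// ...stream stdout here; proc.kill() may happen when the match limit is hit...

// Whatever stderr ripgrep produced before the kill has already been captured.
const errorOutput = await stderrPromise
```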
packages/opencode/src/tool/grep.ts (Outdated)
```ts
if (!hitCollectLimit && buffer) {
  const [filePath, lineNumStr, ...lineTextParts] = buffer.split("|")
  if (filePath && lineNumStr && lineTextParts.length > 0) {
    const lineNum = parseInt(lineNumStr, 10)
    const lineText = lineTextParts.join("|")
    const file = Bun.file(filePath)
    const stats = await file.stat().catch(() => null)
    if (stats) {
      matches.push({
        path: filePath,
        modTime: stats.mtime.getTime(),
        lineNum,
        lineText,
      })
    }
  }
}
```
Copilot AI · Dec 12, 2025
The handling of the last buffered line (lines 85-101) has duplication with the main loop logic (lines 64-79). This duplicated parsing and file stat logic reduces maintainability. Consider extracting the line processing into a separate function to avoid code duplication.
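A sketch of the suggested extraction; `processLine` and the `Match` shape are illustrative names that mirror the fields used in the diff.

```ts
type Match = { path: string; modTime: number; lineNum: number; lineText: string }

// Shared by the main read loop and the trailing-buffer handling.
async function processLine(line: string): Promise<Match | null> {
  const [filePath, lineNumStr, ...lineTextParts] = line.split("|")
  if (!filePath || !lineNumStr || lineTextParts.length === 0) return null
  const stats = await Bun.file(filePath).stat().catch(() => null)
  if (!stats) return null
  return {
    path: filePath,
    modTime: stats.mtime.getTime(),
    lineNum: parseInt(lineNumStr, 10),
    lineText: lineTextParts.join("|"),
  }
}

// Usage in either place:
// const match = await processLine(line)
// if (match) matches.push(match)
```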
packages/opencode/src/tool/grep.ts (Outdated)
```ts
const file = Bun.file(filePath)
const stats = await file.stat().catch(() => null)
if (!stats) continue
```
Copilot AI · Dec 12, 2025
Calling file.stat() for every single match result can be a performance bottleneck, especially when processing many results. Each stat call is an I/O operation that waits for file system access. Since ripgrep already provides the file path, consider whether the modification time is truly necessary for sorting, or if there's a way to batch or optimize these stat calls.
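One possible mitigation, sketched under the assumption that parsed matches are available as an illustrative `rawMatches` array: stat each distinct file once rather than once per match, since many matches typically come from the same files.

```ts
type RawMatch = { path: string; lineNum: number; lineText: string }
declare const rawMatches: RawMatch[] // assumed to be collected by the streaming loop

// One stat per distinct file path instead of one per match.
const uniquePaths = [...new Set(rawMatches.map((m) => m.path))]
const entries = await Promise.all(
  uniquePaths.map(async (p) => {
    const stats = await Bun.file(p).stat().catch(() => null)
    return [p, stats ? stats.mtime.getTime() : null] as const
  }),
)
const modTimeByPath = new Map(entries)

const matches = rawMatches.flatMap((m) => {
  const modTime = modTimeByPath.get(m.path)
  return modTime == null ? [] : [{ ...m, modTime }]
})
```

The next reviewed hunk follows.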
```diff
  const exitCode = await proc.exited

- if (exitCode === 1) {
+ if (exitCode === 1 && matches.length === 0) {
```
Copilot AI · Dec 12, 2025
When the process is killed due to hitting the collection limit (line 103), the exit code will likely be non-zero (e.g., killed by signal). The error handling on line 118 accounts for this with !hitCollectLimit, but there's a potential edge case: if ripgrep legitimately fails with exit code 1 but we already collected some matches, we'll now return those matches instead of reporting the error. The condition on line 110 should verify that exit code 1 specifically means "no matches found" rather than another error condition.
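For reference, ripgrep documents exit code 0 as "match found", 1 as "no match found", and 2 as an error. A tightened check along those lines might look like the sketch below; the surrounding variables are assumed to exist as in the diff.

```ts
declare const exitCode: number
declare const matches: unknown[]
declare const hitCollectLimit: boolean
declare const errorOutput: string

// ripgrep exit codes: 0 = match found, 1 = no match found, 2 = error.
const noMatches = exitCode === 1 && matches.length === 0
const realError = exitCode !== 0 && exitCode !== 1 && !hitCollectLimit

if (realError) {
  throw new Error(`ripgrep failed: ${errorOutput}`)
}
```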
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.
```ts
matches.push({
  path: filePath,
  lineNum: parseInt(lineNumStr, 10),
  lineText: lineTextParts.join("|"),
})
```
Copilot AI · Dec 12, 2025
The final buffer processing logic doesn't check if the match count has already reached the MATCH_LIMIT. If exactly MATCH_LIMIT matches were found in the streaming loop, this code could push a 101st match, exceeding the intended limit.
Suggested change:
```diff
- matches.push({
-   path: filePath,
-   lineNum: parseInt(lineNumStr, 10),
-   lineText: lineTextParts.join("|"),
- })
+ if (matches.length < MATCH_LIMIT) {
+   matches.push({
+     path: filePath,
+     lineNum: parseInt(lineNumStr, 10),
+     lineText: lineTextParts.join("|"),
+   })
+ } else {
+   truncated = true
+ }
```
```diff
  }

- if (exitCode !== 0) {
+ if (exitCode !== 0 && exitCode !== 1 && !truncated) {
```
Copilot AI · Dec 12, 2025
When the process is killed due to truncation, the exit code will likely be non-zero (typically 143 for SIGTERM or 137 for SIGKILL). The current condition checks if exitCode is not 0, not 1, and not truncated, but this doesn't account for the case where the process was killed and exits with a signal-related code. This could cause the error handler to throw incorrectly after a successful truncation. Consider checking if the process was killed before evaluating the exit code, or adjust the condition to handle signal-related exit codes.
Suggested change:
```diff
- if (exitCode !== 0 && exitCode !== 1 && !truncated) {
+ // Allow exit codes 137 (SIGKILL) and 143 (SIGTERM) if process was intentionally killed due to truncation
+ if (
+   exitCode !== 0 &&
+   exitCode !== 1 &&
+   !(truncated && (exitCode === 137 || exitCode === 143))
+ ) {
```
```ts
    }
  }
} finally {
  if (truncated) proc.kill()
```
Copilot AI · Dec 12, 2025
When the process is killed due to truncation, stderr and the exit code are accessed immediately after without waiting for the process to fully terminate. The proc.kill() call is asynchronous, and accessing stderr before the process has fully exited could result in incomplete error output. Consider awaiting proc.exited in the finally block after killing the process to ensure proper cleanup.
Suggested change:
```diff
- if (truncated) proc.kill()
+ if (truncated) {
+   proc.kill();
+   await proc.exited;
+ }
```
When rg runs on some really common string over a large search space, e.g. `cd / && rg /`, it waits for ALL the results to finish.
It used to load all the results (imagine 1 GB of text) into memory.
Then it would string-split them (allocating another 1+ GB of memory).
It would also do stat calls for every result.
Then, finally, it would keep only the first 100.
Now such a sinful rg query returns instantly and saves your memory/disk for better things.
Note: there is a behavior regression here: it used to load all results, and when truncating to 100 the results were sorted by most recently modified.
We could do a hybrid approach here: stream in up to 2000(?) results and return the most recent 100 of those (killing rg at the cap) to bound the worst case, while still providing a good experience for small result sets. A rough sketch of that idea follows.
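This sketch uses illustrative constants (`STREAM_CAP` for the 2000-ish cap, `MATCH_LIMIT` for the existing 100) and assumes each collected match already carries a `modTime`; it is an idea, not the PR's code.

```ts
const STREAM_CAP = 2000 // stop streaming (and kill rg) after this many raw matches
const MATCH_LIMIT = 100 // number of results actually returned

type Match = { path: string; modTime: number; lineNum: number; lineText: string }
declare const matches: Match[] // assumed: collected by the streaming loop, capped at STREAM_CAP
declare const hitCollectLimit: boolean // assumed: true if STREAM_CAP was hit and rg was killed

// Keep the most recently modified matches, as the old all-in-memory path effectively did.
matches.sort((a, b) => b.modTime - a.modTime)
const finalMatches = matches.slice(0, MATCH_LIMIT)
const truncated = hitCollectLimit || matches.length > MATCH_LIMIT
```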