
Conversation

Contributor

@Hona Hona commented Dec 12, 2025

When rg runs on a very common string over a large search space (e.g. cd / && rg /), the tool waits for ALL the results to finish.
It used to load the entire output (imagine 1 GB of text) into memory.
Then string-split it (allocating another 1+ GB of memory).
It would also issue stat calls for every result.
Then, finally, it fetched only the first 100.

Now such a pathological rg query returns instantly and saves your memory/disk for better things.

Note: there is a behavior regression here. Previously all results were loaded, and the truncation to 100 kept the most recently modified matches; streaming simply keeps the first 100 in rg's output order.
We could do a hybrid approach here: stream in up to 2000(?) results, kill rg, and return the most recent 100 of those. That bounds the worst case while still providing a good experience for small search results.
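A minimal sketch of that hybrid selection step. The names (Match, STREAM_CAP, MATCH_LIMIT) and the 2000 cap are illustrative assumptions, not the actual implementation:

```typescript
// Sketch of the proposed hybrid: stream at most STREAM_CAP matches (killing
// rg once the cap is hit), then return the 100 most recently modified.
interface Match {
  path: string
  modTime: number // mtime in ms, from a per-match stat call
  lineNum: number
  lineText: string
}

const STREAM_CAP = 2000 // assumed cap on streamed results
const MATCH_LIMIT = 100

function selectMostRecent(streamed: Match[]): Match[] {
  // Only the first STREAM_CAP streamed results are considered, bounding
  // worst-case memory; small searches are unaffected.
  return streamed
    .slice(0, STREAM_CAP)
    .sort((a, b) => b.modTime - a.modTime)
    .slice(0, MATCH_LIMIT)
}
```

The streaming loop would stop (and kill rg) once STREAM_CAP lines are collected, so the sort only ever touches a bounded number of items.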


Copilot AI review requested due to automatic review settings December 12, 2025 09:59
Contributor

Copilot AI left a comment


Pull request overview

This PR improves the memory efficiency of the grep tool by streaming ripgrep output instead of loading all results into memory before processing. Previously, the tool would wait for ripgrep to complete, load all output into memory, split it into lines (creating a second copy), and only then limit to 100 results. The new implementation streams output line-by-line and stops processing once 100 matches are collected, preventing memory exhaustion on large search spaces.

Key changes:

  • Implements streaming processing of ripgrep stdout using ReadableStream API
  • Adds early termination by killing the ripgrep process once MATCH_LIMIT (100) matches are collected
  • Updates exit code handling to account for killed processes and partial results
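The heart of the streaming change is splitting arbitrary stdout chunks into lines while carrying a trailing partial line forward. A reduced sketch (synchronous and generic for illustration; the PR itself reads Bun's subprocess stdout as a ReadableStream):

```typescript
// Chunks arrive at arbitrary boundaries, so the trailing partial line is
// buffered until the next chunk (or end of stream) completes it.
function* splitLines(chunks: Iterable<string>): Generator<string> {
  let buffer = ""
  for (const chunk of chunks) {
    buffer += chunk
    const lines = buffer.split("\n")
    buffer = lines.pop() ?? "" // last element may be an incomplete line
    yield* lines
  }
  if (buffer) yield buffer // flush the final unterminated line
}
```

Because lines are yielded as they complete, the consumer can stop (and kill rg) as soon as MATCH_LIMIT matches are collected, without ever holding the full output in memory.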


Comment on lines 57 to 80
for (const line of lines) {
  if (!line) continue
  if (matches.length >= MATCH_LIMIT) {
    hitCollectLimit = true
    break
  }

  const [filePath, lineNumStr, ...lineTextParts] = line.split("|")
  if (!filePath || !lineNumStr || lineTextParts.length === 0) continue

  const lineNum = parseInt(lineNumStr, 10)
  const lineText = lineTextParts.join("|")

  const file = Bun.file(filePath)
  const stats = await file.stat().catch(() => null)
  if (!stats) continue

  matches.push({
    path: filePath,
    modTime: stats.mtime.getTime(),
    lineNum,
    lineText,
  })
}

Copilot AI Dec 12, 2025


The streaming implementation processes matches asynchronously with file.stat() calls inside the read loop. If file.stat() is slow, this could cause backpressure issues where the stdout buffer fills up because the reader isn't consuming chunks fast enough. Consider processing file stats after collecting all line data, or implement a queue-based approach to handle I/O more efficiently.
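One way to act on this suggestion: collect parsed lines synchronously inside the read loop, then resolve stats in a second pass so stat latency never stalls stream consumption. A hedged sketch, with statFn standing in for Bun.file(path).stat():

```typescript
interface RawMatch { path: string; lineNum: number; lineText: string }

// Resolve stats after the read loop; matches whose files vanished are
// dropped, mirroring the `if (!stats) continue` in the original loop.
async function resolveStats(
  raw: RawMatch[],
  statFn: (path: string) => Promise<{ mtime: Date } | null>,
): Promise<Array<RawMatch & { modTime: number }>> {
  const stats = await Promise.all(raw.map((m) => statFn(m.path).catch(() => null)))
  return raw.flatMap((m, i) => {
    const s = stats[i]
    return s ? [{ ...m, modTime: s.mtime.getTime() }] : []
  })
}
```

With at most MATCH_LIMIT entries collected, the Promise.all fan-out stays small and the reader loop never awaits filesystem I/O.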

Comment on lines 124 to 125
const truncated = matches.length > MATCH_LIMIT
const finalMatches = truncated ? matches.slice(0, MATCH_LIMIT) : matches

Copilot AI Dec 12, 2025


The variable truncated is computed based on whether matches.length exceeds MATCH_LIMIT, but since the collection loop stops at MATCH_LIMIT items, matches.length will never actually be greater than MATCH_LIMIT. This means truncated will always be false, making the variable redundant. The truncation logic should rely solely on hitCollectLimit to determine if results were truncated.

reader.releaseLock()
}

const errorOutput = await new Response(proc.stderr).text()

Copilot AI Dec 12, 2025


When the collection limit is hit and the process is killed, the stderr might not be fully captured yet since proc.stderr is being read after the kill. This could lead to incomplete error messages. Consider reading stderr before killing the process, or at minimum be aware that error output may be incomplete when the limit is hit.
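The pattern the reviewer suggests is to begin draining stderr before any kill can happen, so whatever rg wrote is captured concurrently. A sketch using the web-standard ReadableStream/Response (available in Bun; the usage lines are illustrative, not the PR's code):

```typescript
// Start reading stderr as soon as the process is spawned; awaiting the
// promise later returns everything written before the stream closed.
async function drainText(stream: ReadableStream<Uint8Array>): Promise<string> {
  return await new Response(stream).text()
}

// Assumed usage pattern:
//   const stderrPromise = drainText(proc.stderr)
//   ... stream stdout; proc.kill() if the limit is hit ...
//   const errorOutput = await stderrPromise
```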

Comment on lines 85 to 101
if (!hitCollectLimit && buffer) {
  const [filePath, lineNumStr, ...lineTextParts] = buffer.split("|")
  if (filePath && lineNumStr && lineTextParts.length > 0) {
    const lineNum = parseInt(lineNumStr, 10)
    const lineText = lineTextParts.join("|")
    const file = Bun.file(filePath)
    const stats = await file.stat().catch(() => null)
    if (stats) {
      matches.push({
        path: filePath,
        modTime: stats.mtime.getTime(),
        lineNum,
        lineText,
      })
    }
  }
}

Copilot AI Dec 12, 2025


The handling of the last buffered line (lines 85-101) has duplication with the main loop logic (lines 64-79). This duplicated parsing and file stat logic reduces maintainability. Consider extracting the line processing into a separate function to avoid code duplication.
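The duplicated parsing could be pulled into one helper along these lines (the name parseRgLine is illustrative). Note the rejoin on "|": the matched text may itself contain the separator, so everything after the second field is glued back together:

```typescript
// Parse one "path|lineNum|text" line from rg's formatted output.
// Returns null for blank or malformed lines, matching the loop's `continue`.
function parseRgLine(
  line: string,
): { path: string; lineNum: number; lineText: string } | null {
  const [filePath, lineNumStr, ...lineTextParts] = line.split("|")
  if (!filePath || !lineNumStr || lineTextParts.length === 0) return null
  return {
    path: filePath,
    lineNum: parseInt(lineNumStr, 10),
    lineText: lineTextParts.join("|"),
  }
}
```

Both the main loop and the final-buffer branch could then call this once each, keeping the parse rules in a single place.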

Comment on lines 70 to 72
const file = Bun.file(filePath)
const stats = await file.stat().catch(() => null)
if (!stats) continue

Copilot AI Dec 12, 2025


Calling file.stat() for every single match result can be a performance bottleneck, especially when processing many results. Each stat call is an I/O operation that waits for file system access. Since ripgrep already provides the file path, consider whether the modification time is truly necessary for sorting, or if there's a way to batch or optimize these stat calls.
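If per-match stat() does prove to be the bottleneck, one option is to run the calls in bounded-concurrency batches rather than strictly one at a time. A generic sketch (the helper name and batch size are illustrative):

```typescript
// Apply an async fn to items in fixed-size batches: within a batch the calls
// run concurrently via Promise.all, but no more than batchSize are in flight.
async function mapBatched<T, R>(
  items: T[],
  batchSize: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const out: R[] = []
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize)
    out.push(...(await Promise.all(batch.map(fn))))
  }
  return out
}
```

With MATCH_LIMIT capped at 100, even a modest batch size keeps total stat latency close to a handful of filesystem round-trips instead of 100 sequential ones.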

const exitCode = await proc.exited

- if (exitCode === 1) {
+ if (exitCode === 1 && matches.length === 0) {

Copilot AI Dec 12, 2025


When the process is killed due to hitting the collection limit (line 103), the exit code will likely be non-zero (e.g., killed by signal). The error handling on line 118 accounts for this with !hitCollectLimit, but there's a potential edge case: if ripgrep legitimately fails with exit code 1 but we already collected some matches, we'll now return those matches instead of reporting the error. The condition on line 110 should verify that exit code 1 specifically means "no matches found" rather than another error condition.
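ripgrep documents exit code 0 as "matches found", 1 as "no matches", and 2 as an error, while a signal-killed process surfaces differently (e.g. 143 for SIGTERM in shells). One policy for untangling these cases, sketched with an illustrative helper (treating "exit 1 but matches were parsed" as an error is one possible choice, not the PR's):

```typescript
// Classify an rg exit, given how many matches we parsed and whether we
// killed the process ourselves after hitting the collection limit.
function interpretExit(
  exitCode: number,
  matchCount: number,
  killedForLimit: boolean,
): "ok" | "no-matches" | "error" {
  if (killedForLimit) return "ok" // we stopped rg; partial results are valid
  if (exitCode === 0) return "ok"
  if (exitCode === 1 && matchCount === 0) return "no-matches"
  return "error" // exit 2, signal codes, or an inconsistent exit-1-with-matches
}
```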

Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 3 comments.



Comment on lines +79 to +83
matches.push({
  path: filePath,
  lineNum: parseInt(lineNumStr, 10),
  lineText: lineTextParts.join("|"),
})

Copilot AI Dec 12, 2025


The final buffer processing logic doesn't check if the match count has already reached the MATCH_LIMIT. If exactly MATCH_LIMIT matches were found in the streaming loop, this code could push a 101st match, exceeding the intended limit.

Suggested change
- matches.push({
-   path: filePath,
-   lineNum: parseInt(lineNumStr, 10),
-   lineText: lineTextParts.join("|"),
- })
+ if (matches.length < MATCH_LIMIT) {
+   matches.push({
+     path: filePath,
+     lineNum: parseInt(lineNumStr, 10),
+     lineText: lineTextParts.join("|"),
+   })
+ } else {
+   truncated = true
+ }

}

- if (exitCode !== 0) {
+ if (exitCode !== 0 && exitCode !== 1 && !truncated) {

Copilot AI Dec 12, 2025


When the process is killed due to truncation, the exit code will likely be non-zero (typically 143 for SIGTERM or 137 for SIGKILL). The current condition checks if exitCode is not 0, not 1, and not truncated, but this doesn't account for the case where the process was killed and exits with a signal-related code. This could cause the error handler to throw incorrectly after a successful truncation. Consider checking if the process was killed before evaluating the exit code, or adjust the condition to handle signal-related exit codes.

Suggested change
- if (exitCode !== 0 && exitCode !== 1 && !truncated) {
+ // Allow exit codes 137 (SIGKILL) and 143 (SIGTERM) if process was intentionally killed due to truncation
+ if (
+   exitCode !== 0 &&
+   exitCode !== 1 &&
+   !(truncated && (exitCode === 137 || exitCode === 143))
+ ) {

}
}
} finally {
if (truncated) proc.kill()

Copilot AI Dec 12, 2025


When the process is killed due to truncation, stderr and the exit code are accessed immediately after without waiting for the process to fully terminate. The proc.kill() call is asynchronous, and accessing stderr before the process has fully exited could result in incomplete error output. Consider awaiting proc.exited in the finally block after killing the process to ensure proper cleanup.

Suggested change
- if (truncated) proc.kill()
+ if (truncated) {
+   proc.kill()
+   await proc.exited
+ }
