3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## [Unreleased]

+### Fixed
+- Fixed `filterByRepos` and `filterByFilepaths` returning no results for repository names containing dots, dashes, or slashes by replacing `escape-string-regexp` (which produces `\xNN` hex escapes incompatible with Zoekt's RE2 engine) with a RE2-compatible escaper. [#1004](https://github.com/sourcebot-dev/sourcebot/pull/1004)
+
### Added
- Added AGENTS.md with Cursor Cloud development environment instructions. [#1001](https://github.com/sourcebot-dev/sourcebot/pull/1001)
- Added support for configuring SMTP via individual environment variables (SMTP_HOST, SMTP_PORT, SMTP_USERNAME, SMTP_PASSWORD) as an alternative to SMTP_CONNECTION_URL. [#1002](https://github.com/sourcebot-dev/sourcebot/pull/1002)
8 changes: 5 additions & 3 deletions packages/mcp/src/index.ts
@@ -4,7 +4,9 @@
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { StdioServerTransport } from '@modelcontextprotocol/sdk/server/stdio.js';
import _dedent from "dedent";
-import escapeStringRegexp from 'escape-string-regexp';
+// Escapes special RE2 regex characters using backslash (compatible with Zoekt/Go RE2).
+// escape-string-regexp v5 uses \xNN hex escapes which RE2 does not support.
+const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
Comment on lines +7 to +9

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n packages/mcp/src/index.ts | head -20

Repository: sourcebot-dev/sourcebot

Length of output: 1496


🏁 Script executed:

# Check if the file exists and examine the context around lines 7-9
wc -l packages/mcp/src/index.ts

Repository: sourcebot-dev/sourcebot

Length of output: 96


🏁 Script executed:

# Search for all occurrences of escapeRE2 in the codebase
rg "escapeRE2" -n --context 2

Repository: sourcebot-dev/sourcebot

Length of output: 4830


🏁 Script executed:

# Also check if there are similar regex patterns elsewhere
rg "replace\(/\[" --context 3 -t ts -t tsx -t js -t jsx

Repository: sourcebot-dev/sourcebot

Length of output: 94


🏁 Script executed:

# Search for any comments or documentation about RE2 special characters in the codebase
rg "RE2|special.*character" -i --context 3 -t ts -t tsx -t js

Repository: sourcebot-dev/sourcebot

Length of output: 94


🌐 Web query:

RE2 regular expression special characters that need escaping

💡 Result:

In RE2, you escape a metacharacter by prefixing it with a backslash (e.g., \+ to match a literal +). [1]

Metacharacters you typically need to escape (to match literally)

  • . ^ $ * + ? ( ) [ { \ | (and, in general, other punctuation when used as operators) [2]

RE2 also supports escaping “any punctuation character” with \ (e.g., \*), and you can escape a whole literal chunk with \Q...\E. [3]

Inside a character class ([...])

You generally only need to escape (or place carefully):

  • \ ] - ^ (when ^ would otherwise negate the class) [2]

Sources: [1] [2] [3]


`escapeRE2` does not escape the closing parenthesis.

The character class on line 9 includes `(` but not `)`, and both are grouping metacharacters in RE2. A repo ID or filepath containing `)` therefore reaches the query unescaped, which can make the filter fail or parse incorrectly.

Proposed fix
-const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
+const escapeRE2 = (s: string) => s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');
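
To make the difference concrete, here is a minimal sketch that runs both patterns from the diff above on the same input. The names `escapeCurrent`, `escapeFixed`, and the repo name are hypothetical and chosen only for illustration.

```typescript
// Pattern as it appears in the diff: `(` is in the character class, `)` is not.
const escapeCurrent = (s: string): string => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');

// Pattern from the proposed fix: both parentheses are in the character class.
const escapeFixed = (s: string): string => s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');

// Hypothetical repo name containing parentheses.
const repoName = 'github.com/acme/widgets(legacy)';

console.log(escapeCurrent(repoName)); // github\.com/acme/widgets\(legacy)
console.log(escapeFixed(repoName));   // github\.com/acme/widgets\(legacy\)
```

With the current pattern the trailing `)` is passed through as-is, so the resulting filter contains an unbalanced group.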
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/mcp/src/index.ts` around lines 7 - 9, The escapeRE2 function's
character class is missing the closing-parenthesis character, so update the
regex in escapeRE2 (const escapeRE2 = (s: string) =>
s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');) to also include ')' so closing
parentheses are escaped; modify the character class to add ')' (ensuring proper
escaping/ordering inside the [...]) so the function correctly backslash-escapes
')' for RE2-compatible patterns.

import { z } from 'zod';
import { askCodebase, getFileSource, listCommits, listLanguageModels, listRepos, listTree, search } from './client.js';
import { env, numberSchema } from './env.js';
@@ -81,15 +83,15 @@ server.tool(
useRegex = false,
}) => {
if (repos.length > 0) {
-query += ` (repo:${repos.map(id => escapeStringRegexp(id)).join(' or repo:')})`;
+query += ` (repo:${repos.map(id => escapeRE2(id)).join(' or repo:')})`;
}

if (languages.length > 0) {
query += ` (lang:${languages.join(' or lang:')})`;
}

if (filepaths.length > 0) {
-query += ` (file:${filepaths.map(filepath => escapeStringRegexp(filepath)).join(' or file:')})`;
+query += ` (file:${filepaths.map(filepath => escapeRE2(filepath)).join(' or file:')})`;
}

if (ref) {
6 changes: 4 additions & 2 deletions packages/web/src/app/[domain]/browse/layoutClient.tsx
@@ -10,7 +10,9 @@ import { useBrowseParams } from "./hooks/useBrowseParams";
import { FileSearchCommandDialog } from "./components/fileSearchCommandDialog";
import { useDomain } from "@/hooks/useDomain";
import { SearchBar } from "../components/searchBar";
-import escapeStringRegexp from "escape-string-regexp";
+// Escapes special RE2 regex characters using backslash (compatible with Zoekt/Go RE2).
+// escape-string-regexp v5 uses \xNN hex escapes which RE2 does not support.
+const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
Comment on lines +13 to +15

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

file="packages/web/src/app/[domain]/browse/layoutClient.tsx"
rg -n "const escapeRE2" "$file"

python - <<'PY'
from pathlib import Path
import sys

file = Path("packages/web/src/app/[domain]/browse/layoutClient.tsx")
line = next((l for l in file.read_text().splitlines() if "const escapeRE2" in l), None)
if not line:
    print("escapeRE2 declaration not found")
    sys.exit(1)

# Pull content inside /[ ... ]/g for quick static validation
try:
    char_class = line.split("/[", 1)[1].split("]/g", 1)[0]
except Exception:
    print("Could not parse regex char class from:", line)
    sys.exit(1)

print("escapeRE2 char class:", char_class)
required = [".", "+", "*", "?", "^", "$", "{", "}", "[", "]", "|", "(", ")", "\\\\"]
missing = [c for c in required if c not in char_class]
print("missing:", missing)
sys.exit(1 if missing else 0)
PY

Repository: sourcebot-dev/sourcebot

Length of output: 196


🏁 Script executed:

# Check the file structure and verify the patch suggestion
cat -n packages/web/src/app/[domain]/browse/layoutClient.tsx | head -50

# Also check if escapeRE2 is used anywhere else in the codebase
rg "escapeRE2" --type ts --type tsx

Repository: sourcebot-dev/sourcebot

Length of output: 2431


🏁 Script executed:

# Search for escapeRE2 usage with a simpler approach
rg "escapeRE2"

# Verify the suggested patch - check if all RE2 metacharacters are covered
python - <<'PY'
import re

# Current regex pattern from line 15
current = r"[.+*?^${}[\]|(\\]"
# Suggested patch
suggested = r"[\\.^$|?*+()[\]{}]"

# All RE2 regex metacharacters that need escaping
re2_metacharacters = {'.', '+', '*', '?', '^', '$', '{', '}', '[', ']', '|', '(', ')', '\\'}

print("=== Current Pattern Analysis ===")
print(f"Pattern: {current}")
current_chars = set()
# Parse the character class manually since it's complex
for c in '.+*?^${}[|(\\':
    if c in current:
        current_chars.add(c)
current_chars.add(']')  # ] is in the class as \]
print(f"Escapes: {sorted(current_chars)}")
current_missing = re2_metacharacters - current_chars
print(f"Missing: {current_missing}")

print("\n=== Suggested Pattern Analysis ===")
print(f"Pattern: {suggested}")
suggested_chars = set()
for c in '.^$|?*+()[]{\\':
    if c in suggested:
        suggested_chars.add(c)
print(f"Escapes: {sorted(suggested_chars)}")
suggested_missing = re2_metacharacters - suggested_chars
print(f"Missing: {suggested_missing}")

print(f"\n=== Verdict ===")
print(f"Current pattern missing: {current_missing}")
print(f"Suggested patch fixes issue: {')' in suggested_chars}")
PY

Repository: sourcebot-dev/sourcebot

Length of output: 1838


The escape set is incomplete: `escapeRE2` does not escape `)`, and the same function appears in 4 files.

The character class on line 15 escapes `(` but not `)`, which breaks literal matching for any repo name or file path containing `)`. The same incomplete function is duplicated across the codebase:

  • packages/web/src/app/[domain]/browse/layoutClient.tsx:15 (repo escaping)
  • packages/web/src/features/mcp/server.ts (repo and file escaping)
  • packages/web/src/features/chat/tools.ts (repo and file escaping)
  • packages/mcp/src/index.ts (repo and file escaping)

All instances must be fixed consistently. Apply the corrected pattern to all locations:

Suggested patch
-const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
+const escapeRE2 = (s: string) => s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');

Consider extracting this function to a shared utility module to prevent duplication and ensure consistency across the codebase.
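
A shared utility along these lines might look like the following. This is only a sketch: the helper `buildFilterClause` and its signature are hypothetical, and the module's location in the repository is not specified in this PR. The query shape mirrors the ` (repo:a or repo:b)` construction seen in the diff.

```typescript
// Hypothetical shared module. The single definition uses the corrected character class,
// so every caller escapes both `(` and `)`.
export const escapeRE2 = (s: string): string =>
  s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');

// Hypothetical helper: builds a clause such as ` (repo:a or repo:b)`.
// Returns an empty string when there is nothing to filter on.
export const buildFilterClause = (prefix: 'repo' | 'file', values: string[]): string =>
  values.length > 0
    ? ` (${prefix}:${values.map(v => escapeRE2(v)).join(` or ${prefix}:`)})`
    : '';

console.log(buildFilterClause('repo', ['github.com/a/b', 'x(y)']));
// prints " (repo:github\.com/a/b or repo:x\(y\))"
```

Call sites would then append `buildFilterClause('repo', repos)` and `buildFilterClause('file', filepaths)` to the query instead of repeating the template string in each file.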

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/app/`[domain]/browse/layoutClient.tsx around lines 13 - 15,
The regex escape function escapeRE2 is missing the closing parenthesis in its
character class, so update the pattern in every duplicated implementation
(escapeRE2) to include ')' among the escaped characters (i.e., add ')' into the
character class) so literal ')' in repo/file names are correctly escaped; after
patching each occurrence (in the functions named escapeRE2 across the codebase)
consolidate these duplicates by moving the corrected escapeRE2 into a single
shared utility module and import/use that utility where the function was
previously duplicated.

import { Session } from "next-auth";

interface LayoutProps {
@@ -37,7 +39,7 @@ export function LayoutClient({
<SearchBar
size="sm"
defaults={{
-query: `repo:^${escapeStringRegexp(repoName)}$${revisionName ? ` rev:${revisionName}` : ''} `,
+query: `repo:^${escapeRE2(repoName)}$${revisionName ? ` rev:${revisionName}` : ''} `,
}}
className="w-full"
isSearchAssistSupported={isSearchAssistSupported}
8 changes: 5 additions & 3 deletions packages/web/src/features/chat/tools.ts
@@ -9,7 +9,9 @@ import { toolNames } from "./constants";
import { listReposQueryParamsSchema } from "@/lib/schemas";
import { ListReposQueryParams } from "@/lib/types";
import { listRepos } from "@/app/api/(server)/repos/listReposApi";
-import escapeStringRegexp from "escape-string-regexp";
+// Escapes special RE2 regex characters using backslash (compatible with Zoekt/Go RE2).
+// escape-string-regexp v5 uses \xNN hex escapes which RE2 does not support.
+const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For RE2-compatible regex escaping (used from JavaScript), which metacharacters must be escaped for literal matching outside character classes, and does this include both '(' and ')'?

💡 Result:

Outside character classes, RE2 treats these as metacharacters (i.e., not literal unless escaped): \ (escape), . (any char), ^ / $ (anchors), * + ? (quantifiers), | (alternation), ( ) (grouping), [ (starts a character class), and { (starts a repetition like {n,m}). To match any of these literally, prefix with \ (and in a JavaScript string literal you’d write that as \\). [1][2]

Yes—this includes both ( and ); they are grouping metacharacters, so use \( and \) (JS: "\\(" and "\\)"). [1]

Sources: [1] RE2 syntax (google/re2 wiki). [2] RE2 syntax reference (CRAN mirror).


`escapeRE2` does not escape `)`, which breaks repos and files whose names contain a closing parenthesis.

The regex character class escapes `(` but not `)`. Both are metacharacters in RE2, so a value containing `)` produces an invalid or unintended pattern. The fix is to include both grouping characters.

Suggested fix
-const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
+const escapeRE2 = (s: string) => s.replace(/[.+*?^${}()|[\]\\]/g, '\\$&');
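
The character class suggested here is ordered differently from the one suggested for the other files in this review. A quick check, sketched below, indicates that both variants cover the same set and that only the original misses `)`. The metacharacter list is the one used by the analysis script earlier in this review; the names `variantA`, `variantB`, and `unescapedBy` are hypothetical.

```typescript
// Metacharacters the review treats as needing a backslash outside a character class.
const metacharacters = ['\\', '.', '^', '$', '|', '?', '*', '+', '(', ')', '[', ']', '{', '}'];

const original = /[.+*?^${}[\]|(\\]/g;  // pattern currently in the diff
const variantA = /[\\.^$|?*+()[\]{}]/g; // fix suggested for the other files
const variantB = /[.+*?^${}()|[\]\\]/g; // fix suggested in this comment

// Returns the metacharacters that a pattern leaves untouched.
const unescapedBy = (pattern: RegExp): string[] =>
  metacharacters.filter(ch => ch.replace(pattern, '\\$&') === ch);

console.log(unescapedBy(original)); // [ ')' ]
console.log(unescapedBy(variantA)); // []
console.log(unescapedBy(variantB)); // []
```

Since the two fixes behave the same on this list, either can be used as long as every copy of the function ends up with the same pattern.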
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/chat/tools.ts` at line 14, escapeRE2's character
class currently omits the closing parenthesis, so strings containing ')' aren't
properly escaped; update the escapeRE2 function's regex (the char class in the
replace call inside escapeRE2) to include ')' alongside '(' so both grouping
metacharacters are escaped, ensuring the replace call escapes all RE2
metacharacters correctly.


// @NOTE: When adding a new tool, follow these steps:
// 1. Add the tool to the `toolNames` constant in `constants.ts`.
@@ -198,15 +200,15 @@ export const createCodeSearchTool = (selectedRepos: string[]) => tool({
}

if (repos.length > 0) {
-query += ` (repo:${repos.map(id => escapeStringRegexp(id)).join(' or repo:')})`;
+query += ` (repo:${repos.map(id => escapeRE2(id)).join(' or repo:')})`;
}

if (languages.length > 0) {
query += ` (lang:${languages.join(' or lang:')})`;
}

if (filepaths.length > 0) {
-query += ` (file:${filepaths.map(filepath => escapeStringRegexp(filepath)).join(' or file:')})`;
+query += ` (file:${filepaths.map(filepath => escapeRE2(filepath)).join(' or file:')})`;
}

if (ref) {
8 changes: 5 additions & 3 deletions packages/web/src/features/mcp/server.ts
@@ -11,7 +11,9 @@ import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { ChatVisibility } from '@sourcebot/db';
import { SOURCEBOT_VERSION } from '@sourcebot/shared';
import _dedent from 'dedent';
-import escapeStringRegexp from 'escape-string-regexp';
+// Escapes special RE2 regex characters using backslash (compatible with Zoekt/Go RE2).
+// escape-string-regexp v5 uses \xNN hex escapes which RE2 does not support.
+const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
Comment on lines +14 to +16

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n packages/web/src/features/mcp/server.ts | head -30

Repository: sourcebot-dev/sourcebot

Length of output: 1532


🏁 Script executed:

cat -n packages/web/src/features/mcp/server.ts | sed -n '140,160p'

Repository: sourcebot-dev/sourcebot

Length of output: 958


🏁 Script executed:

rg "escapeRE2" packages/web/src/features/mcp/server.ts -B 2 -A 2

Repository: sourcebot-dev/sourcebot

Length of output: 722


🌐 Web query:

RE2 regex metacharacters parentheses escaping Go

💡 Result:

In Go, the regexp package uses RE2 syntax, where parentheses ( and ) are metacharacters for grouping/capturing. To match literal parentheses, escape them as \( and \). [2]

Because Go string literals also treat \ as an escape (in double-quoted strings), you typically need double escaping:

regexp.MustCompile("\\(\\d+,\\d+\\)") // interpreted string literal

Or avoid double escaping by using a raw string literal (backticks):

regexp.MustCompile(`\(\d+,\d+\)`) // raw string literal

If you need to safely insert arbitrary text (including parentheses) into a regex, use regexp.QuoteMeta, which escapes all regex metacharacters for you. [1]

Sources: Go regexp docs (QuoteMeta). [1] RE2 syntax reference. [2] Go escaping-parentheses discussion/examples. [3]

[1] (pkg.go.dev)
[2] (github.com)
[3] (stackoverflow.com)


🏁 Script executed:

rg "escapeRE2|filterByRepos|filterByFilepaths" packages/web/src/features/mcp/ -B 1 -A 1

Repository: sourcebot-dev/sourcebot

Length of output: 2149


🏁 Script executed:

find packages -name "*.test.ts" -o -name "*.test.tsx" -o -name "*.spec.ts" | xargs grep -l "escapeRE2\|server.ts" 2>/dev/null | head -5

Repository: sourcebot-dev/sourcebot

Length of output: 49


Escape `)` in `escapeRE2` as well.

The current pattern at line 16 escapes `(` but not `)`. Both are RE2 metacharacters used for grouping and capturing, so an unescaped `)` in a repo name or filepath will break pattern matching in the Zoekt queries built at lines 146 and 152.

Proposed fix
-const escapeRE2 = (s: string) => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
+const escapeRE2 = (s: string) => s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');
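
One way to see the failure mode is a round-trip check: escape a literal, compile the result, and confirm it still matches the literal. The sketch below uses JavaScript's built-in `RegExp` as a stand-in, which is an assumption made for convenience; it is not Zoekt's RE2 engine, and how Zoekt reports the error is not shown here. The names `matchesLiterally` and `filepath` are hypothetical.

```typescript
const escapeCurrent = (s: string): string => s.replace(/[.+*?^${}[\]|(\\]/g, '\\$&');
const escapeFixed = (s: string): string => s.replace(/[\\.^$|?*+()[\]{}]/g, '\\$&');

// Returns true when the escaped input compiles and matches the original text literally.
const matchesLiterally = (escape: (s: string) => string, input: string): boolean => {
  try {
    return new RegExp(`^${escape(input)}$`).test(input);
  } catch {
    return false; // the escaped text was not a valid pattern
  }
};

// Hypothetical filepath containing parentheses.
const filepath = 'docs/notes (draft).md';

console.log(matchesLiterally(escapeCurrent, filepath)); // false
console.log(matchesLiterally(escapeFixed, filepath));   // true
```

With the current pattern the escaped text contains `\(` followed by a bare `)`, which JavaScript rejects as an unmatched group, so the check returns `false`.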
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/web/src/features/mcp/server.ts` around lines 14 - 16, The regex in
escapeRE2 currently escapes "(" but omits ")", so add the closing parenthesis to
the character class used in the replace so that escapeRE2(s: string) properly
escapes both parentheses; update the character class in the escapeRE2 function
to include ')' so Zoekt queries that use the escaped repo/file patterns (used
where escapeRE2 is invoked around the Zoekt query construction) won't break at
the locations where those patterns are used.

import { z } from 'zod';
import {
ListTreeEntry,
@@ -141,13 +143,13 @@ export function createMcpServer(): McpServer {
maxTokens?: number;
}) => {
if (repos.length > 0) {
-query += ` (repo:${repos.map(id => escapeStringRegexp(id)).join(' or repo:')})`;
+query += ` (repo:${repos.map(id => escapeRE2(id)).join(' or repo:')})`;
}
if (languages.length > 0) {
query += ` (lang:${languages.join(' or lang:')})`;
}
if (filepaths.length > 0) {
-query += ` (file:${filepaths.map(fp => escapeStringRegexp(fp)).join(' or file:')})`;
+query += ` (file:${filepaths.map(fp => escapeRE2(fp)).join(' or file:')})`;
}
if (ref) {
query += ` ( rev:${ref} )`;