Fix: detect mutating keywords inside CTEs in read-only mode#275
Fix: detect mutating keywords inside CTEs in read-only mode#275
Conversation
The read-only SQL check only examined the first keyword, so queries like `WITH updated AS (UPDATE ...) SELECT ...` bypassed the restriction. Now scans the full statement (after stripping comments and strings) for DML/DDL keywords. Closes #271 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Improves the isReadOnlySQL guard to better detect write operations embedded in otherwise “allowed” statements (notably WITH/CTEs), strengthening DBHub’s read-only query enforcement.
Changes:
- Add a deny-list regex scan for mutating (DML/DDL) keywords across the full (comment/string-stripped) SQL, in addition to the first-keyword allow-list check.
- Add unit tests covering CTEs containing mutating statements and ensuring strings/comments don’t trigger false positives.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/utils/allowed-keywords.ts | Adds mutating keyword detection across the entire statement to reject hidden write operations. |
| src/utils/tests/allowed-keywords.test.ts | Adds test coverage for mutating keywords inside CTEs and for avoiding false positives in comments/strings. |
You can also share your feedback on Copilot code review. Take the survey.
REPLACE followed by ( is a string function call (e.g. SELECT REPLACE(...)), not a mutating REPLACE INTO statement. Use negative lookahead to avoid false positives. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
src/utils/allowed-keywords.ts:59
- Readonly bypass: statements like
SELECT ... INTO new_table ...(Postgres/SQL Server) orSELECT ... INTO OUTFILE ...(MySQL) start with an allowed keyword (select) but perform writes/DDL. The current mutating-keyword scan won’t catch these because it only looks for DML/DDL verbs likeinsert/update/create. If the goal is to prevent any write side-effects in readonly mode, add explicit detection forselect ... into(dialect-specific) and reject those too.
export function isReadOnlySQL(sql: string, connectorType: ConnectorType | string): boolean {
// Strip comments and strings before analyzing
const cleanedSQL = stripCommentsAndStrings(sql, connectorType as ConnectorType).trim().toLowerCase();
You can also share your feedback on Copilot code review. Take the survey.
- Only scan for mutating keywords when first keyword is WITH (fixes false rejection of SHOW CREATE TABLE in MySQL/MariaDB) - Add SELECT ... INTO detection for Postgres/MySQL/SQL Server - EXPLAIN with DML is allowed since it doesn't execute the statement - Add 6 more tests for SHOW CREATE, EXPLAIN, and SELECT INTO Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
src/utils/allowed-keywords.ts
Outdated
| // Scan the full statement for mutating keywords. | ||
| if (firstWord === "with" && mutatingPattern.test(cleanedSQL)) { | ||
| return false; |
There was a problem hiding this comment.
The WITH-branch rejects any statement containing a mutating keyword anywhere in the SQL. This can incorrectly flag read-only CTE queries when a keyword appears as an identifier/alias/function name (e.g., selecting an unquoted column named replace, or ... AS replace, etc.). Consider tightening detection to only match DML/DDL when it appears in a CTE subquery position (e.g., after AS ( / , boundaries) rather than doing a whole-statement keyword scan.
| // Scan the full statement for mutating keywords. | |
| if (firstWord === "with" && mutatingPattern.test(cleanedSQL)) { | |
| return false; | |
| // Only scan the CTE subquery bodies (the parts inside AS (...)) for mutating keywords. | |
| if (firstWord === "with") { | |
| const cteBodyPattern = /\bas\s*\(([\s\S]*?)\)/gi; | |
| let match: RegExpExecArray | null; | |
| while ((match = cteBodyPattern.exec(cleanedSQL)) !== null) { | |
| const cteBody = match[1]; | |
| if (mutatingPattern.test(cteBody)) { | |
| return false; | |
| } | |
| } |
| const sql = "SELECT REPLACE(name, 'a', 'b') FROM users"; | ||
| expect(isReadOnlySQL(sql, "postgres")).toBe(true); | ||
| }); | ||
|
|
There was a problem hiding this comment.
The REPLACE() special-casing is only exercised when the statement starts with WITH (since mutatingPattern is only applied in that branch), but this test uses a plain SELECT so it doesn't cover the new behavior. Add a test like WITH cte AS (SELECT REPLACE(...) ...) SELECT ... to ensure the WITH scan doesn't false-positive on REPLACE() calls.
| it("should allow REPLACE() in a WITH CTE as a string function", () => { | |
| const sql = "WITH cte AS (SELECT REPLACE(name, 'a', 'b') AS name FROM users) SELECT * FROM cte"; | |
| expect(isReadOnlySQL(sql, "postgres")).toBe(true); | |
| }); |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
You can also share your feedback on Copilot code review. Take the survey.
| // WITH statements can embed DML in CTEs (e.g. WITH cte AS (UPDATE ...)) | ||
| // Scan the full statement for mutating keywords. | ||
| if (firstWord === "with" && mutatingPattern.test(cleanedSQL)) { | ||
| return false; | ||
| } | ||
|
|
||
| // SELECT ... INTO writes data (creates tables or writes to files) | ||
| if (firstWord === "select" && selectIntoPattern.test(cleanedSQL)) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
SELECT ... INTO is only rejected when the first keyword is select. A statement starting with WITH can still perform SELECT ... INTO in the final query (e.g. WITH t AS (...) SELECT ... INTO new_table ...), which writes data but would currently return true because the selectIntoPattern check is skipped. Consider applying the SELECT ... INTO detection to WITH statements as well (or generally whenever the cleaned SQL contains a SELECT ... INTO).
| if (!keywordList.includes(firstWord)) { | ||
| return false; | ||
| } | ||
|
|
||
| // WITH statements can embed DML in CTEs (e.g. WITH cte AS (UPDATE ...)) | ||
| // Scan the full statement for mutating keywords. | ||
| if (firstWord === "with" && mutatingPattern.test(cleanedSQL)) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
EXPLAIN is treated as read-only unconditionally, but in Postgres EXPLAIN ANALYZE (and EXPLAIN (ANALYZE ...)) executes the underlying statement. As written, EXPLAIN ANALYZE DELETE ... would be allowed in read-only mode. Consider detecting and rejecting EXPLAIN statements that enable ANALYZE (at least for Postgres), or conservatively treating any EXPLAIN containing an ANALYZE option as non-read-only.
| describe("SHOW CREATE and metadata queries", () => { | ||
| it("should allow SHOW CREATE TABLE in MySQL", () => { | ||
| expect(isReadOnlySQL("SHOW CREATE TABLE users", "mysql")).toBe(true); | ||
| }); | ||
|
|
||
| it("should allow SHOW CREATE PROCEDURE in MariaDB", () => { | ||
| expect(isReadOnlySQL("SHOW CREATE PROCEDURE my_proc", "mariadb")).toBe(true); | ||
| }); | ||
|
|
||
| it("should allow EXPLAIN with mutating statement", () => { | ||
| // EXPLAIN doesn't execute the statement, just shows the plan | ||
| expect(isReadOnlySQL("EXPLAIN DELETE FROM users", "postgres")).toBe(true); | ||
| }); | ||
| }); | ||
|
|
||
| describe("SELECT INTO", () => { | ||
| it("should reject SELECT INTO (Postgres table creation)", () => { | ||
| expect(isReadOnlySQL("SELECT * INTO new_table FROM users", "postgres")).toBe(false); | ||
| }); | ||
|
|
||
| it("should reject SELECT INTO OUTFILE (MySQL)", () => { | ||
| expect(isReadOnlySQL("SELECT * INTO OUTFILE '/tmp/data.csv' FROM users", "mysql")).toBe(false); | ||
| }); | ||
|
|
||
| it("should reject SELECT INTO with WHERE clause", () => { | ||
| expect(isReadOnlySQL("SELECT id, name INTO backup_table FROM users WHERE active = true", "sqlserver")).toBe(false); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
New read-only detection logic adds CTE scanning and SELECT ... INTO blocking, but the tests don't cover two important write-paths: (1) WITH ... SELECT ... INTO ... (first keyword with) and (2) EXPLAIN ANALYZE ... / EXPLAIN (ANALYZE ...) which executes the underlying statement in Postgres. Adding these cases would prevent regressions in readonly enforcement.
…n CTEs - WITH statements now also check for SELECT ... INTO - EXPLAIN ANALYZE with DML is rejected (Postgres executes the statement) - EXPLAIN without ANALYZE remains allowed with DML (plan only) - Fixed regex word boundary for parenthesized EXPLAIN (ANALYZE) form - Added tests for REPLACE() inside CTEs, WITH...SELECT INTO, EXPLAIN ANALYZE DELETE, EXPLAIN (ANALYZE) DELETE Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
src/utils/allowed-keywords.ts
Outdated
| // Reject if it contains DML after the ANALYZE keyword | ||
| if (firstWord === "explain" && explainAnalyzePattern.test(cleanedSQL)) { | ||
| // Extract the part after EXPLAIN [ANALYZE|(...)] and check for DML | ||
| const afterExplain = cleanedSQL.replace(explainAnalyzePattern, "").trim(); | ||
| if (afterExplain && mutatingPattern.test(afterExplain)) { |
There was a problem hiding this comment.
The EXPLAIN ANALYZE path only checks mutatingPattern on the statement after EXPLAIN, but it does not reject other write-capable forms like SELECT ... INTO .... As a result, something like EXPLAIN ANALYZE SELECT * INTO new_table FROM users would be treated as read-only even though Postgres executes it and it writes/creates a table. Consider validating afterExplain using the same read-only logic as a normal statement (e.g., run isReadOnlySQL(afterExplain, connectorType) or at least apply the selectIntoPattern check as well) and add a regression test for EXPLAIN ANALYZE + SELECT INTO.
| // Reject if it contains DML after the ANALYZE keyword | |
| if (firstWord === "explain" && explainAnalyzePattern.test(cleanedSQL)) { | |
| // Extract the part after EXPLAIN [ANALYZE|(...)] and check for DML | |
| const afterExplain = cleanedSQL.replace(explainAnalyzePattern, "").trim(); | |
| if (afterExplain && mutatingPattern.test(afterExplain)) { | |
| // Reject if the statement after ANALYZE is not read-only | |
| if (firstWord === "explain" && explainAnalyzePattern.test(cleanedSQL)) { | |
| // Extract the part after EXPLAIN [ANALYZE|(...)] and validate it as a normal statement | |
| const afterExplain = cleanedSQL.replace(explainAnalyzePattern, "").trim(); | |
| if (afterExplain && !isReadOnlySQL(afterExplain, connectorType)) { |
| // WITH statements can embed DML in CTEs (e.g. WITH cte AS (UPDATE ...)) | ||
| // or use SELECT ... INTO in the final query. | ||
| if (firstWord === "with") { | ||
| if (mutatingPattern.test(cleanedSQL)) { | ||
| return false; | ||
| } | ||
| if (selectIntoPattern.test(cleanedSQL)) { | ||
| return false; | ||
| } |
There was a problem hiding this comment.
For WITH statements, mutatingPattern.test(cleanedSQL) scans the entire query text for words like drop, create, etc. This will also match those words when they appear as identifiers/aliases/function names in an otherwise read-only query (e.g., WITH cte AS (SELECT 1 AS drop) SELECT * FROM cte), causing false rejections. To avoid blocking valid read-only queries, consider narrowing the scan to statement-start positions inside CTE sub-statements / the main statement (token-based scan using the existing sql-parser), rather than a whole-text keyword regex.
Reuse isReadOnlySQL recursively on the statement after EXPLAIN ANALYZE, so SELECT INTO and other write-capable forms are also caught. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
You can also share your feedback on Copilot code review. Take the survey.
| // EXPLAIN ANALYZE actually executes the statement (Postgres) | ||
| // Validate the inner statement using the same read-only logic | ||
| if (firstWord === "explain" && explainAnalyzePattern.test(cleanedSQL)) { | ||
| const afterExplain = cleanedSQL.replace(explainAnalyzePattern, "").trim(); | ||
| if (afterExplain && !isReadOnlySQL(afterExplain, connectorType)) { | ||
| return false; | ||
| } | ||
| } |
There was a problem hiding this comment.
The EXPLAIN ANALYZE branch only checks afterExplain for mutatingPattern, so statements like EXPLAIN ANALYZE SELECT ... INTO new_table ... (or EXPLAIN (ANALYZE) WITH ... SELECT ... INTO ...) will currently be treated as read-only even though they execute and can write (table creation / outfile). Consider running the same read-only validation on afterExplain (including the SELECT...INTO check), or explicitly checking selectIntoPattern on afterExplain as well.
| it("should reject EXPLAIN ANALYZE with DML (Postgres executes the statement)", () => { | ||
| expect(isReadOnlySQL("EXPLAIN ANALYZE DELETE FROM users", "postgres")).toBe(false); | ||
| }); | ||
|
|
||
| it("should reject EXPLAIN (ANALYZE) with DML", () => { | ||
| expect(isReadOnlySQL("EXPLAIN (ANALYZE) DELETE FROM users", "postgres")).toBe(false); | ||
| }); | ||
|
|
||
| it("should allow EXPLAIN ANALYZE with SELECT", () => { | ||
| expect(isReadOnlySQL("EXPLAIN ANALYZE SELECT * FROM users", "postgres")).toBe(true); | ||
| }); |
There was a problem hiding this comment.
There isn't currently a regression test covering EXPLAIN ANALYZE SELECT ... INTO ... (or EXPLAIN (ANALYZE, ...) WITH ... SELECT ... INTO ...), which is a write-capable form that can slip through if only DML keywords are checked. Adding a test case here would help prevent reintroducing that readonly bypass.
| }); | ||
|
|
||
| it("should reject DROP inside a CTE-like construct", () => { | ||
| const sql = "WITH x AS (SELECT 1) DROP TABLE users"; |
There was a problem hiding this comment.
WITH x AS (SELECT 1) DROP TABLE users is not valid syntax in Postgres/MySQL (a top-level WITH clause can't precede DROP). As a unit test example this is a bit misleading; consider replacing it with a syntactically valid statement that still demonstrates the same risk (e.g., DDL/DML inside a CTE, or a multi-statement input handled by splitSQLStatements).
| const sql = "WITH x AS (SELECT 1) DROP TABLE users"; | |
| const sql = "WITH x AS (SELECT 1) SELECT * FROM x; DROP TABLE users"; |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
Strip optional VERBOSE keyword after ANALYZE so the recursive isReadOnlySQL check receives the actual inner statement. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
REPLACE INTO is a MySQL/MariaDB-specific statement. In Postgres/SQLite/ SQL Server, REPLACE is only a function, so it should not trigger the mutating keyword check. This avoids false rejections when 'replace' appears as a CTE name or alias in non-MySQL dialects. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
You can also share your feedback on Copilot code review. Take the survey.
src/utils/allowed-keywords.ts
Outdated
| * Matches both `EXPLAIN ANALYZE ...` and `EXPLAIN (ANALYZE) ...` / `EXPLAIN (ANALYZE, ...) ...` */ | ||
| const explainAnalyzePattern = /^explain\s+(?:\([^)]*\banalyze\b[^)]*\)|\banalyze\b(?:\s+verbose\b)?)/i; |
There was a problem hiding this comment.
explainAnalyzePattern treats any EXPLAIN (...) that contains the word ANALYZE as executing, but in Postgres EXPLAIN (ANALYZE false|off) does not execute the inner statement. With the current regex, EXPLAIN (ANALYZE false) DELETE ... will be rejected even though it’s equivalent (from a safety perspective) to plain EXPLAIN DELETE .... Consider tightening the detection to only match ANALYZE when it’s enabled (e.g., no explicit false/off), and add a test case for the disabled form to prevent regressions.
| * Matches both `EXPLAIN ANALYZE ...` and `EXPLAIN (ANALYZE) ...` / `EXPLAIN (ANALYZE, ...) ...` */ | |
| const explainAnalyzePattern = /^explain\s+(?:\([^)]*\banalyze\b[^)]*\)|\banalyze\b(?:\s+verbose\b)?)/i; | |
| * Matches both `EXPLAIN ANALYZE ...` and `EXPLAIN (ANALYZE) ...` / `EXPLAIN (ANALYZE, ...) ...`, | |
| * but excludes explicitly disabled forms like `ANALYZE false|off|0`. */ | |
| const explainAnalyzePattern = | |
| /^explain\s+(?:\([^)]*\banalyze\b(?!\s*(?:=\s*)?(?:false|off|0)\b)[^)]*\)|\banalyze\b(?!\s*(?:=\s*)?(?:false|off|0)\b)(?:\s+verbose\b)?)/i; |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
| * 1. Strips comments and string literals before analyzing. | ||
| * 2. Verifies the first keyword is in the allow-list. | ||
| * 3. For WITH statements, scans for mutating keywords and SELECT INTO. | ||
| * 4. For SELECT statements, checks for SELECT ... INTO. | ||
| * 5. For EXPLAIN statements, rejects EXPLAIN ANALYZE with DML. |
There was a problem hiding this comment.
isReadOnlySQL still treats a standalone ANALYZE ... statement as read-only because analyze remains in the allow-list and there are no additional checks for it. ANALYZE updates statistics (writes) on Postgres/SQLite/MySQL, so this undermines the guarantees of readonly mode. Consider removing analyze from allowedKeywords (and/or adding it to the mutating keyword detection) so it is rejected in readonly mode, while still handling EXPLAIN ANALYZE via the explicit explainAnalyzePattern logic.
| it("should allow REPLACE() as a string function in SELECT", () => { | ||
| const sql = "SELECT REPLACE(name, 'a', 'b') FROM users"; | ||
| expect(isReadOnlySQL(sql, "postgres")).toBe(true); | ||
| }); |
There was a problem hiding this comment.
The MySQL/MariaDB-specific REPLACE detection uses a negative lookahead to avoid flagging REPLACE(...) function calls, but the tests only cover REPLACE() usage under the postgres connector. Add explicit test cases for mysql/mariadb that use REPLACE() inside SELECT/WITH to ensure the new regex doesn’t introduce false positives for those connectors.
- Remove ANALYZE from allowed first keywords since it updates statistics (writes to system catalogs) in Postgres/MySQL/SQLite - EXPLAIN (ANALYZE false/off/0) does not execute, so allow it with DML - Add MySQL/MariaDB REPLACE() function tests to ensure no false positives Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
You can also share your feedback on Copilot code review. Take the survey.
| const mutatingPatternMySQL = new RegExp( | ||
| `\\b(?:${mutatingKeywords.join("|")}|replace(?!\\s*\\())\\b`, | ||
| "i", | ||
| ); |
There was a problem hiding this comment.
mutatingPatternMySQL treats any standalone word replace as mutating unless it’s followed by (. That will incorrectly reject read-only queries where an identifier (e.g., CTE/table/alias) is named replace in MySQL/MariaDB (e.g., WITH replace AS (SELECT 1) SELECT * FROM replace). Consider narrowing the MySQL/MariaDB rule to match only the REPLACE statement form (e.g., REPLACE\s+(?:LOW_PRIORITY\s+|DELAYED\s+)?INTO\b) rather than any occurrence of the word.
| const sql = "WITH replace AS (SELECT 1) SELECT * FROM replace"; | ||
| expect(isReadOnlySQL(sql, "postgres")).toBe(true); | ||
| }); | ||
|
|
There was a problem hiding this comment.
The tests cover WITH replace AS ... only for Postgres, but the current MySQL/MariaDB REPLACE detection will also match identifiers named replace and reject such queries. Add a MySQL/MariaDB test case (e.g., WITH replace AS (SELECT 1) SELECT * FROM replace) to prevent this false-positive regression.
| it("should allow a CTE named 'replace' in MySQL", () => { | |
| const sql = "WITH replace AS (SELECT 1) SELECT * FROM replace"; | |
| expect(isReadOnlySQL(sql, "mysql")).toBe(true); | |
| }); |
Summary
WITH cte AS (UPDATE/INSERT/DELETE ...) SELECT ...to bypass restrictionsINSERT,UPDATE,DELETE,DROP,ALTER,CREATE,TRUNCATE,REPLACE,MERGE,GRANT,REVOKE,RENAME)Changes
src/utils/allowed-keywords.ts: Added mutating keyword detection after the first-keyword checksrc/utils/__tests__/allowed-keywords.test.ts: Added 7 test cases for CTE bypass scenariosCloses #271
Test plan
🤖 Generated with Claude Code