Skip to content

fix: address escape literal issue #21516#21599

Draft
SamAya21 wants to merge 1 commit intoapache:mainfrom
SamAya21:fix-escape-literals
Draft

fix: address escape literal issue #21516#21599
SamAya21 wants to merge 1 commit intoapache:mainfrom
SamAya21:fix-escape-literals

Conversation

@SamAya21
Copy link
Copy Markdown

Summary
This PR fixes Spark-compatible handling of escape sequences in SQL string literals #21516 .

The issue showed up in datafusion-spark string function behavior, but the root cause was not in soundex itself. The actual problem was that quoted SQL string literals were being converted into DataFusion literal expressions without unescaping sequences such as \t, \n, \, ', and octal escapes.

As a result, literals like '\t hello' were treated as the two characters \ and t instead of a tab character followed by hello.

What changed
This change updates SQL value handling in datafusion/sql/src/expr/value.rs so that:

regular quoted string literals are unescaped before being converted to Expr::Literal
escaped string literals follow the same unescape path
common escape sequences are supported:
\0
\b
\n
\r
\t
\Z
\
'
"
%
_
octal escapes of up to 3 digits are supported, such as \101

Why this belongs here
Although the failing behavior was observed in Spark string functions, the underlying bug was earlier in the SQL literal pipeline. parse_value(...) in value.rs was converting normal quoted strings directly with lit(s), preserving backslash escape text instead of producing the intended string value.

Fixing the issue at the value-conversion layer ensures all string functions receive the correct literal content.

Tests
Added unit tests covering:

tab, newline, and carriage return escapes
escaped quotes and backslashes
octal escapes
unknown escapes
trailing backslash behavior

Notes
While working on validation, I also ran into projection-name conflicts when selecting multiple literals that now resolve to the same final value like / and //. For SQL-level tests, this is avoided by aliasing projected literals, and updated test case with cargo insta review.

@github-actions github-actions bot added sql SQL Planner core Core DataFusion crate labels Apr 13, 2026
@kumarUjjawal
Copy link
Copy Markdown
Contributor

Few points before the actual code review:

  1. Always make sure you run the linter locally with all the tests passing
  2. Follow the PT template
  3. Please read the ai assisted pr guidelines https://datafusion.apache.org/contributor-guide/index.html#ai-assisted-contributions

@SamAya21
Copy link
Copy Markdown
Author

Sorry about that, I hadn't seen I should've ran sqllogictests. How exactly would I run the linter locally or what webpage would I find that as I see that I had forgot to add the header to a test file I had created.

@SamAya21 SamAya21 marked this pull request as draft April 15, 2026 05:16
@kumarUjjawal
Copy link
Copy Markdown
Contributor

Sorry about that, I hadn't seen I should've ran sqllogictests. How exactly would I run the linter locally or what webpage would I find that as I see that I had forgot to add the header to a test file I had created.

  1. run cargo fmt: cargo fmt --all
  2. run clippy: cargo clippy --workspace --all-features --all-targets -- --deny=warnings
  3. run sqllogictest: make -C testing/runner test-sqlite

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Core DataFusion crate sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants