fix: address escape literal issue #21516#21599
Draft
SamAya21 wants to merge 1 commit intoapache:mainfrom
Draft
Conversation
Contributor
|
Few points before the actual code review:
|
Author
|
Sorry about that, I hadn't seen I should've ran sqllogictests. How exactly would I run the linter locally or what webpage would I find that as I see that I had forgot to add the header to a test file I had created. |
Contributor
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR fixes Spark-compatible handling of escape sequences in SQL string literals #21516 .
The issue showed up in datafusion-spark string function behavior, but the root cause was not in soundex itself. The actual problem was that quoted SQL string literals were being converted into DataFusion literal expressions without unescaping sequences such as \t, \n, \, ', and octal escapes.
As a result, literals like '\t hello' were treated as the two characters \ and t instead of a tab character followed by hello.
What changed
This change updates SQL value handling in datafusion/sql/src/expr/value.rs so that:
regular quoted string literals are unescaped before being converted to Expr::Literal
escaped string literals follow the same unescape path
common escape sequences are supported:
\0
\b
\n
\r
\t
\Z
\
'
"
%
_
octal escapes of up to 3 digits are supported, such as \101
Why this belongs here
Although the failing behavior was observed in Spark string functions, the underlying bug was earlier in the SQL literal pipeline. parse_value(...) in value.rs was converting normal quoted strings directly with lit(s), preserving backslash escape text instead of producing the intended string value.
Fixing the issue at the value-conversion layer ensures all string functions receive the correct literal content.
Tests
Added unit tests covering:
tab, newline, and carriage return escapes
escaped quotes and backslashes
octal escapes
unknown escapes
trailing backslash behavior
Notes
While working on validation, I also ran into projection-name conflicts when selecting multiple literals that now resolve to the same final value like / and //. For SQL-level tests, this is avoided by aliasing projected literals, and updated test case with cargo insta review.