fix(db): ASCII case fold in upper/lower/ilike#1574
Conversation
…gth-changing folds
📝 WalkthroughWalkthroughThis PR updates TanStack DB's ChangesASCII-only case folding for string functions
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
packages/db/src/query/compiler/evaluators.ts (1)
17-45: ⚡ Quick winDeduplicate the ASCII fold loop to prevent divergence.
asciiToUpperandasciiToLowerare near-identical; extracting one shared helper will reduce maintenance drift risk.♻️ Proposed refactor
+function asciiFold( + s: string, + minCode: number, + maxCode: number, + delta: number, +): string { + let out = `` + for (let i = 0; i < s.length; i++) { + const code = s.charCodeAt(i) + out += + code >= minCode && code <= maxCode + ? String.fromCharCode(code + delta) + : s.charAt(i) + } + return out +} + function asciiToUpper(s: string): string { - let out = `` - for (let i = 0; i < s.length; i++) { - const c = s.charCodeAt(i) - out += c >= 0x61 && c <= 0x7a ? String.fromCharCode(c - 32) : s.charAt(i) - } - return out + return asciiFold(s, 0x61, 0x7a, -32) } @@ function asciiToLower(s: string): string { - let out = `` - for (let i = 0; i < s.length; i++) { - const c = s.charCodeAt(i) - out += c >= 0x41 && c <= 0x5a ? String.fromCharCode(c + 32) : s.charAt(i) - } - return out + return asciiFold(s, 0x41, 0x5a, 32) }As per coding guidelines, “Extract common logic into utility functions when identical or near-identical code blocks appear in multiple places.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/db/src/query/compiler/evaluators.ts` around lines 17 - 45, asciiToUpper and asciiToLower duplicate the same per-character loop; extract a single helper (e.g., asciiFold or asciiCaseFold) that iterates the string and applies a transform based on per-character checks and an offset or mapping, then reimplement asciiToUpper and asciiToLower to call that helper with the appropriate parameters (upper offset -32, lower offset +32 or a predicate/mapper). Update calls to use the new helper and keep original function names and signatures so external callers are unaffected, and ensure behavior for non-ASCII characters remains unchanged by returning the original character when outside the A-Z/a-z ranges.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@packages/db/src/query/compiler/evaluators.ts`:
- Around line 17-45: asciiToUpper and asciiToLower duplicate the same
per-character loop; extract a single helper (e.g., asciiFold or asciiCaseFold)
that iterates the string and applies a transform based on per-character checks
and an offset or mapping, then reimplement asciiToUpper and asciiToLower to call
that helper with the appropriate parameters (upper offset -32, lower offset +32
or a predicate/mapper). Update calls to use the new helper and keep original
function names and signatures so external callers are unaffected, and ensure
behavior for non-ASCII characters remains unchanged by returning the original
character when outside the A-Z/a-z ranges.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: cf6731c0-df7b-498e-b7f7-5a1df560cc1a
📒 Files selected for processing (3)
.changeset/ascii-case-fold-upper-lower-ilike.mdpackages/db/src/query/compiler/evaluators.tspackages/db/tests/query/compiler/evaluators.test.ts
Summary
upper(),lower(), andilikecurrently callString.prototype.toUpperCase()/toLowerCase()directly. Those methods are locale-aware and length-changing:.toUpperCase().toLowerCase()straßeSTRASSE(6 -> 7)straßeSTRA+ßESTRASSEstraßeİİi̇(1 -> 2, adds U+0307)fileFILE(3 -> 4)fileI(tr-TR locale)IıWhen a TanStack DB collection is fed by a server-side filter (e.g. a query collection backed by PostgreSQL or a sync engine over SQLite), the client side and server side disagree on the case-fold of the bucket key any time the data contains
ß, ligatures, Turkish dotted/dotless i, or runs under a Turkish locale. The row silently drops out of the matching set - no error, no log line.This PR uses ASCII-only case folding for
upper(),lower(), andilike, matching SQLites default behavior and making the result deterministic across JS locales and length-preserving.What changed
packages/db/src/query/compiler/evaluators.ts: addedasciiToUpper/asciiToLowerhelpers (32-codepoint shift for a-z / A-Z, no-op for everything else). Used by theupper,lowercases and byevaluateLikewhencaseInsensitive=true.packages/db/tests/query/compiler/evaluators.test.ts: added regression tests coveringß, thefiligature, Turkish dotted-i forupper/lower/ilike.Out of scope
This PR does not change
length()orconcat().length()still returnsString.prototype.length(UTF-16 code units) which diverges from SQLites code-point count for non-BMP characters - if maintainers want, Im happy to ship a follow-up. The case-fold issue has the highest visible impact because it affects the common case (Western-European data withß/fi, Turkishİ/I) where users do not expect length changes.Context
This is the same fix class behind powersync-ja/powersync-service#663 (open) and the four merged sync-rules fixes #644-#647. Reference open-source checker for this bug shape across JS DB query layers: https://github.com/sravan27/silentdrop
All 99 tests in
evaluators.test.tspass locally.Summary by CodeRabbit
Bug Fixes
upper(),lower(), andilikefunctions now use ASCII-only case folding instead of locale-aware case folding, aligning with SQLite behavior and resolving inconsistencies that previously caused silent row drops when non-ASCII characters were involved.Tests