fix(db): ASCII case fold in upper/lower/ilike by sravan27 · Pull Request #1574 · TanStack/db

sravan27 · 2026-06-07T08:51:51Z

Summary

upper(), lower(), and ilike currently call String.prototype.toUpperCase() / toLowerCase() directly. Those methods are locale-aware and length-changing:

input	`.toUpperCase()`	`.toLowerCase()`
`straße`	`STRASSE` (6 -> 7)	`straße`
`STRA+ßE`	`STRASSE`	`straße`
`İ`	`İ`	`i̇` (1 -> 2, adds U+0307)
`ﬁle`	`FILE` (3 -> 4)	`ﬁle`
`I` (tr-TR locale)	`I`	`ı`

When a TanStack DB collection is fed by a server-side filter (e.g. a query collection backed by PostgreSQL or a sync engine over SQLite), the client side and server side disagree on the case-fold of the bucket key any time the data contains ß, ligatures, Turkish dotted/dotless i, or runs under a Turkish locale. The row silently drops out of the matching set - no error, no log line.

This PR uses ASCII-only case folding for upper(), lower(), and ilike, matching SQLites default behavior and making the result deterministic across JS locales and length-preserving.

What changed

packages/db/src/query/compiler/evaluators.ts: added asciiToUpper / asciiToLower helpers (32-codepoint shift for a-z / A-Z, no-op for everything else). Used by the upper, lower cases and by evaluateLike when caseInsensitive=true.
packages/db/tests/query/compiler/evaluators.test.ts: added regression tests covering ß, the ﬁ ligature, Turkish dotted-i for upper/lower/ilike.
changeset added (patch).

Out of scope

This PR does not change length() or concat(). length() still returns String.prototype.length (UTF-16 code units) which diverges from SQLites code-point count for non-BMP characters - if maintainers want, Im happy to ship a follow-up. The case-fold issue has the highest visible impact because it affects the common case (Western-European data with ß / ﬁ, Turkish İ/I) where users do not expect length changes.

Context

This is the same fix class behind powersync-ja/powersync-service#663 (open) and the four merged sync-rules fixes #644-#647. Reference open-source checker for this bug shape across JS DB query layers: https://github.com/sravan27/silentdrop

All 99 tests in evaluators.test.ts pass locally.

Summary by CodeRabbit

Bug Fixes
- upper(), lower(), and ilike functions now use ASCII-only case folding instead of locale-aware case folding, aligning with SQLite behavior and resolving inconsistencies that previously caused silent row drops when non-ASCII characters were involved.
Tests
- Added comprehensive test coverage for ASCII-only case-folding behavior with various non-ASCII character scenarios.

…gth-changing folds

coderabbitai · 2026-06-07T08:52:03Z

📝 Walkthrough

Walkthrough

This PR updates TanStack DB's upper(), lower(), and ilike() string functions to use ASCII-only case folding instead of JavaScript's locale-aware variants, ensuring results match SQLite's behavior and preventing silent data loss when non-ASCII characters differ in how they fold across JS and SQL.

Changes

ASCII-only case folding for string functions

Layer / File(s)	Summary
ASCII case folding helper functions `packages/db/src/query/compiler/evaluators.ts`	New `asciiToUpper` and `asciiToLower` internal helpers provide ASCII-only, length-preserving case folding as a foundation for all updated string functions.
Upper and lower function evaluators `packages/db/src/query/compiler/evaluators.ts`	The compiled `upper()` and `lower()` evaluators now use `asciiToUpper`/`asciiToLower` instead of `toUpperCase()`/`toLowerCase()` for string inputs.
Case-insensitive LIKE/ILIKE pattern matching `packages/db/src/query/compiler/evaluators.ts`	The LIKE/ILIKE case-insensitive path converts both search values and patterns using `asciiToLower` for deterministic, locale-independent matching.
ASCII case folding test coverage `packages/db/tests/query/compiler/evaluators.test.ts`	New test cases validate ASCII-only, non-length-changing folding for `upper()`, `lower()`, and `ilike()` with characters like German sharp s, Turkish dotted i, and fi ligature.
Changelog entry `.changeset/ascii-case-fold-upper-lower-ilike.md`	Documents the behavior change, examples of the new ASCII-only folding, and notes that users can opt into Unicode-aware folding via JavaScript string methods before queries enter the evaluator.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 In queries where letters must fold,
ASCII reigns, steady and bold,
No ß grows long, no dots disappear,
Matching's now clear, consistent and dear!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Description check	❓ Inconclusive	The description comprehensively covers the problem, solution, and changes, but does not complete the required template checklist items.	Complete the checklist by confirming local testing with 'pnpm test' and indicating whether a changeset has been generated.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: implementing ASCII case folding for upper/lower/ilike functions.
Docstring Coverage	✅ Passed	Docstring coverage is 83.33% which is sufficient. The required threshold is 80.00%.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

packages/db/src/query/compiler/evaluators.ts (1)

17-45: ⚡ Quick win

Deduplicate the ASCII fold loop to prevent divergence.

asciiToUpper and asciiToLower are near-identical; extracting one shared helper will reduce maintenance drift risk.

♻️ Proposed refactor

+function asciiFold(
+  s: string,
+  minCode: number,
+  maxCode: number,
+  delta: number,
+): string {
+  let out = ``
+  for (let i = 0; i < s.length; i++) {
+    const code = s.charCodeAt(i)
+    out +=
+      code >= minCode && code <= maxCode
+        ? String.fromCharCode(code + delta)
+        : s.charAt(i)
+  }
+  return out
+}
+
 function asciiToUpper(s: string): string {
-  let out = ``
-  for (let i = 0; i < s.length; i++) {
-    const c = s.charCodeAt(i)
-    out += c >= 0x61 && c <= 0x7a ? String.fromCharCode(c - 32) : s.charAt(i)
-  }
-  return out
+  return asciiFold(s, 0x61, 0x7a, -32)
 }
@@
 function asciiToLower(s: string): string {
-  let out = ``
-  for (let i = 0; i < s.length; i++) {
-    const c = s.charCodeAt(i)
-    out += c >= 0x41 && c <= 0x5a ? String.fromCharCode(c + 32) : s.charAt(i)
-  }
-  return out
+  return asciiFold(s, 0x41, 0x5a, 32)
 }

As per coding guidelines, “Extract common logic into utility functions when identical or near-identical code blocks appear in multiple places.”

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/db/src/query/compiler/evaluators.ts` around lines 17 - 45,
asciiToUpper and asciiToLower duplicate the same per-character loop; extract a
single helper (e.g., asciiFold or asciiCaseFold) that iterates the string and
applies a transform based on per-character checks and an offset or mapping, then
reimplement asciiToUpper and asciiToLower to call that helper with the
appropriate parameters (upper offset -32, lower offset +32 or a
predicate/mapper). Update calls to use the new helper and keep original function
names and signatures so external callers are unaffected, and ensure behavior for
non-ASCII characters remains unchanged by returning the original character when
outside the A-Z/a-z ranges.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@packages/db/src/query/compiler/evaluators.ts`:
- Around line 17-45: asciiToUpper and asciiToLower duplicate the same
per-character loop; extract a single helper (e.g., asciiFold or asciiCaseFold)
that iterates the string and applies a transform based on per-character checks
and an offset or mapping, then reimplement asciiToUpper and asciiToLower to call
that helper with the appropriate parameters (upper offset -32, lower offset +32
or a predicate/mapper). Update calls to use the new helper and keep original
function names and signatures so external callers are unaffected, and ensure
behavior for non-ASCII characters remains unchanged by returning the original
character when outside the A-Z/a-z ranges.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cf6731c0-df7b-498e-b7f7-5a1df560cc1a

📥 Commits

Reviewing files that changed from the base of the PR and between 66210a5 and d9a2753.

📒 Files selected for processing (3)

.changeset/ascii-case-fold-upper-lower-ilike.md
packages/db/src/query/compiler/evaluators.ts
packages/db/tests/query/compiler/evaluators.test.ts

Use ASCII case fold in upper/lower/ilike to avoid JS locale-aware len…

d9a2753

…gth-changing folds

coderabbitai Bot reviewed Jun 7, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(db): ASCII case fold in upper/lower/ilike#1574

fix(db): ASCII case fold in upper/lower/ilike#1574
sravan27 wants to merge 1 commit into
TanStack:mainfrom
sravan27:ascii-case-fold-for-upper-lower-ilike

sravan27 commented Jun 7, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sravan27 commented Jun 7, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

Out of scope

Context

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

❌ Failed checks (1 inconclusive)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sravan27 commented Jun 7, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 7, 2026 •

edited

Loading