-
Notifications
You must be signed in to change notification settings - Fork 55
Handle non utf chars #773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle non utf chars #773
Conversation
|
Warning Rate limit exceeded@fogelito has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 25 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (8)
WalkthroughThis PR introduces non-UTF-8 character handling across the database layer. It adds an abstract method to the base Adapter class, implements capability checks in all concrete adapter classes, creates a new Character exception type, and maps specific MySQL/MariaDB character encoding errors to this exception. A sanitization utility is added to the Database class, along with corresponding test coverage. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes
Areas for attention:
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (4)
src/Database/Adapter/Pool.php (1)
646-649: Consider renaming to match thegetSupportFor*convention.The method name
getSupportNonUtcCharacters()breaks the established naming pattern. All other capability checks in this class usegetSupportFor*(e.g.,getSupportForSchemas(),getSupportForAttributes(),getSupportForTimeouts()).For consistency, consider renaming to
getSupportForNonUtcCharacters()across the entire codebase (including the abstract definition inAdapter.phpand all concrete implementations).🔎 View suggested naming pattern
- public function getSupportNonUtcCharacters(): bool + public function getSupportForNonUtcCharacters(): bool { return $this->delegate(__FUNCTION__, \func_get_args()); }Note: This change would also require updating the abstract method definition in
src/Database/Adapter.phpand all concrete adapter implementations (MariaDB, SQLite, Mongo, Postgres).src/Database/Adapter/MySQL.php (1)
8-8: MySQL: new CharacterException mapping for error 1366 is sensible but broadThe new import and early branch in
processException()cleanly surface character/encoding issues as a dedicatedCharacterException, which fits the PR goal.Be aware though that MySQL error
1366can also be raised for non-encoding problems (e.g. “Incorrect integer value”), so this check will classify all such cases asCharacterException. If you ever need to narrow this to “incorrect string value” only, you might additionally inspecterrorInfo[2](the vendor message) before mapping.Also applies to: 151-153
src/Database/Adapter.php (1)
1468-1473: Naming nit:getSupportNonUtcCharacterslooks like a typo for UTFThe docblock talks about “non utf characters”, but the method is spelled
getSupportNonUtcCharacters(“Utc” as in time). Since this is a newly introduced abstract, you may want to fix the spelling now (e.g.getSupportNonUtfCharactersorgetSupportForNonUtfCharacters) across adapters before it becomes baked into the public surface.tests/e2e/Adapter/Scopes/DocumentTests.php (1)
40-47: UseassertInstanceOfinstead ofinstanceof+assertTruefor the CharacterException check.Functionally fine, but PHPUnit’s dedicated assertion is clearer and more idiomatic.
Suggested diff
- try { - $database->createDocument(__FUNCTION__, new Document([ - 'title' => $nonUtfString, - ])); - $this->fail('Failed to throw exception'); - } catch (Throwable $e) { - $this->assertTrue($e instanceof CharacterException); - } + try { + $database->createDocument(__FUNCTION__, new Document([ + 'title' => $nonUtfString, + ])); + $this->fail('Failed to throw exception'); + } catch (Throwable $e) { + $this->assertInstanceOf(CharacterException::class, $e); + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
src/Database/Adapter.php(1 hunks)src/Database/Adapter/MariaDB.php(3 hunks)src/Database/Adapter/Mongo.php(1 hunks)src/Database/Adapter/MySQL.php(2 hunks)src/Database/Adapter/Pool.php(1 hunks)src/Database/Adapter/Postgres.php(1 hunks)src/Database/Adapter/SQLite.php(1 hunks)src/Database/Database.php(1 hunks)src/Database/Exception/Character.php(1 hunks)tests/e2e/Adapter/Base.php(1 hunks)tests/e2e/Adapter/Scopes/DocumentTests.php(2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-16T09:37:33.531Z
Learnt from: fogelito
Repo: utopia-php/database PR: 733
File: src/Database/Adapter/MariaDB.php:1801-1806
Timestamp: 2025-10-16T09:37:33.531Z
Learning: In the MariaDB adapter (src/Database/Adapter/MariaDB.php), only duplicate `_uid` violations should throw `DuplicateException`. All other unique constraint violations, including `PRIMARY` key collisions on the internal `_id` field, should throw `UniqueException`. This is the intended design to distinguish between user-facing document duplicates and internal/user-defined unique constraint violations.
Applied to files:
src/Database/Adapter/MySQL.phpsrc/Database/Exception/Character.phpsrc/Database/Adapter/MariaDB.phptests/e2e/Adapter/Scopes/DocumentTests.php
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
Repo: utopia-php/database PR: 721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.
Applied to files:
tests/e2e/Adapter/Scopes/DocumentTests.phptests/e2e/Adapter/Base.php
📚 Learning: 2025-10-16T08:48:36.715Z
Learnt from: fogelito
Repo: utopia-php/database PR: 739
File: src/Database/Adapter/Postgres.php:154-158
Timestamp: 2025-10-16T08:48:36.715Z
Learning: For the utopia-php/database repository, no migration scripts are needed for the collation change from utf8_ci to utf8_ci_ai in the Postgres adapter because there is no existing production data.
Applied to files:
src/Database/Adapter/Postgres.php
📚 Learning: 2025-10-29T12:27:57.071Z
Learnt from: ArnabChatterjee20k
Repo: utopia-php/database PR: 747
File: src/Database/Adapter/Mongo.php:1449-1453
Timestamp: 2025-10-29T12:27:57.071Z
Learning: In src/Database/Adapter/Mongo.php, when getSupportForAttributes() returns false (schemaless mode), the updateDocument method intentionally uses a raw document without $set operator for replacement-style updates, as confirmed by the repository maintainer ArnabChatterjee20k.
Applied to files:
src/Database/Adapter/Mongo.php
🧬 Code graph analysis (7)
src/Database/Adapter/MySQL.php (2)
src/Database/Exception.php (1)
Exception(7-21)src/Database/Exception/Character.php (1)
Character(7-9)
src/Database/Adapter/SQLite.php (5)
src/Database/Adapter.php (1)
getSupportNonUtcCharacters(1473-1473)src/Database/Adapter/MariaDB.php (1)
getSupportNonUtcCharacters(2239-2242)src/Database/Adapter/Mongo.php (1)
getSupportNonUtcCharacters(3225-3228)src/Database/Adapter/Pool.php (1)
getSupportNonUtcCharacters(646-649)src/Database/Adapter/Postgres.php (1)
getSupportNonUtcCharacters(2742-2745)
src/Database/Exception/Character.php (1)
src/Database/Exception.php (1)
Exception(7-21)
src/Database/Adapter/Postgres.php (5)
src/Database/Adapter.php (1)
getSupportNonUtcCharacters(1473-1473)src/Database/Adapter/SQLite.php (1)
getSupportNonUtcCharacters(1880-1883)src/Database/Adapter/MariaDB.php (1)
getSupportNonUtcCharacters(2239-2242)src/Database/Adapter/Mongo.php (1)
getSupportNonUtcCharacters(3225-3228)src/Database/Adapter/Pool.php (1)
getSupportNonUtcCharacters(646-649)
src/Database/Adapter/Mongo.php (5)
src/Database/Adapter.php (1)
getSupportNonUtcCharacters(1473-1473)src/Database/Adapter/SQLite.php (1)
getSupportNonUtcCharacters(1880-1883)src/Database/Adapter/MariaDB.php (1)
getSupportNonUtcCharacters(2239-2242)src/Database/Adapter/Pool.php (1)
getSupportNonUtcCharacters(646-649)src/Database/Adapter/Postgres.php (1)
getSupportNonUtcCharacters(2742-2745)
src/Database/Adapter.php (5)
src/Database/Adapter/SQLite.php (1)
getSupportNonUtcCharacters(1880-1883)src/Database/Adapter/MariaDB.php (1)
getSupportNonUtcCharacters(2239-2242)src/Database/Adapter/Mongo.php (1)
getSupportNonUtcCharacters(3225-3228)src/Database/Adapter/Pool.php (1)
getSupportNonUtcCharacters(646-649)src/Database/Adapter/Postgres.php (1)
getSupportNonUtcCharacters(2742-2745)
src/Database/Adapter/Pool.php (6)
src/Database/Adapter.php (1)
getSupportNonUtcCharacters(1473-1473)src/Database/Adapter/SQLite.php (1)
getSupportNonUtcCharacters(1880-1883)src/Database/Adapter/MariaDB.php (1)
getSupportNonUtcCharacters(2239-2242)src/Database/Adapter/Mongo.php (1)
getSupportNonUtcCharacters(3225-3228)src/Database/Adapter/Postgres.php (1)
getSupportNonUtcCharacters(2742-2745)src/Database/Mirror.php (1)
delegate(88-103)
🪛 GitHub Actions: CodeQL
tests/e2e/Adapter/Scopes/DocumentTests.php
[error] 50-50: PHPStan: Parameter #3 $subject of function str_replace expects array|string, string|false given.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Setup & Build Docker Image
🔇 Additional comments (5)
tests/e2e/Adapter/Base.php (1)
26-27: The review comment is based on incorrect code assumptions. Lines 26-27 of tests/e2e/Adapter/Base.php showCollectionTestsandCustomDocumentTypeTestsare currently enabled, not commented out. If these traits were previously disabled and have been re-enabled, verify the reason for their reactivation and ensure all tests pass. If the comment refers to an earlier state, update it to reflect the current code.Likely an incorrect or invalid review comment.
src/Database/Adapter/Mongo.php (1)
3224-3227: Mongo: non-UTF character support flag wired correctlyReturning
falsehere is consistent with the new abstract inAdapterand with the other non-SQL adapters; no issues from this implementation.src/Database/Exception/Character.php (1)
1-9: CharacterException type is well-scoped and consistentThis minimal
Characterexception cleanly reuses the existing baseExceptionbehavior and gives adapters a dedicated type for character-related failures; no changes needed.src/Database/Adapter/MariaDB.php (2)
10-10: MariaDB: CharacterException mapping for 1366 integrates cleanlyImporting
CharacterExceptionand handling22007/1366first inprocessException()gives a clear, adapter-specific signal for invalid character/value errors without disturbing the existing timeout/duplicate/not-found mappings. This looks consistent with how the MySQL adapter is treated.Also applies to: 1842-1845
2239-2242: MariaDB: non-UTF character support flag matches adapter behaviorReturning
truefromgetSupportNonUtcCharacters()is consistent with the new abstract and with the fact that MariaDB now exposes dedicatedCharacterExceptionhandling; no issues here.
Summary by CodeRabbit
New Features
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.