bound the trailing hex read in my_mb_wc_filename by jmestwa-coder · Pull Request #5187 · MariaDB/server

jmestwa-coder · 2026-06-05T14:31:24Z

my_mb_wc_filename decodes the 5-byte '@hhhh' escape, but its length guard is off by one:

- the s + 4 > e check only covers 4 bytes (s[0..3]), while the hex branch also reads s[4]
- a truncated '@hhh' whose buffer ends at s + 4 therefore over-reads one byte past e; this is reachable through well_formed_length/charpos on a my_charset_filename string with tight bounds
- the existing s[3] ? guard only stops at a NUL terminator, not at the end pointer

Fix: bound the s[4] read against e. Also added a strings-t case that returns ILSEQ instead of over-reading (ASAN flags a 1-byte read past a 4-byte buffer before the fix).
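The off-by-one can be reproduced outside the server. The block below is a minimal sketch, not MariaDB's my_mb_wc_filename: it models only the hex branch of the escape (the 3-byte table form is left out), and the names hexval, decode_hex_escape_v1 and the DEC_* return codes are hypothetical stand-ins rather than the server's MY_CS_* values. It follows the idea of this first revision: the length check covers s[0..3], and s[4] is read only when it lies before e.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch; not MariaDB's MY_CS_* values. */
enum { DEC_ILSEQ = 0, DEC_TOOSMALL = -1, DEC_TOOSMALL4 = -4 };

/* Value of one ASCII hex digit, or -1 if c is not a hex digit. */
int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/*
  Decode one '@hhhh' escape from the byte range [s, e).
  Returns 5 on success, DEC_ILSEQ for a malformed escape, or a
  DEC_TOOSMALL* code when the range is too short.
  The length guard only covers s[0..3]; s[4] gets its own bound.
*/
int decode_hex_escape_v1(unsigned long *pwc, const unsigned char *s,
                         const unsigned char *e)
{
  assert(pwc != NULL && s != NULL && s <= e);
  size_t avail= (size_t) (e - s);        /* bytes we are allowed to read */

  if (avail < 1)
    return DEC_TOOSMALL;
  if (s[0] != '@')
    return DEC_ILSEQ;
  if (avail < 4)                          /* guard covers s[0..3] only */
    return DEC_TOOSMALL4;

  int h1= hexval(s[1]);
  int h2= hexval(s[2]);
  int h3= hexval(s[3]);
  /* Bounded read: a missing fourth digit is treated as "not a hex digit". */
  int h4= avail > 4 ? hexval(s[4]) : -1;

  if (h1 < 0 || h2 < 0 || h3 < 0 || h4 < 0)
    return DEC_ILSEQ;
  *pwc= ((unsigned long) h1 << 12) | ((unsigned long) h2 << 8) |
        ((unsigned long) h3 << 4) | (unsigned long) h4;
  return 5;
}
```

With buf = {'@', '0', '0', '0', '1'} and end = buf + 4, this returns DEC_ILSEQ without touching buf[4]; with end = buf + 5 it returns 5 and sets the code point to 0x1.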

gemini-code-assist

Code Review

This pull request fixes an out-of-bounds read vulnerability in my_mb_wc_filename by ensuring that the end pointer is checked before reading the 4th hex digit of a truncated escape sequence. It also adds a unit test to verify this behavior. The review comments suggest improving the unit test by returning the actual return value of my_ci_mb_wc and asserting the expected error code (MY_CS_ILSEQ) directly, which makes the test more idiomatic and direct.

gemini-code-assist · 2026-06-05T14:32:30Z

+static int test_mb_wc_filename_truncated()
+{
+  CHARSET_INFO *cs= &my_charset_filename;
+  uchar buf[5]= {'@', '0', '0', '0', '1'};
+  my_wc_t wc= 0;
+  int rc= my_ci_mb_wc(cs, &wc, buf, buf + 4);
+  return rc != MY_CS_ILSEQ;
+}


Returning the actual return value of my_ci_mb_wc rather than a boolean mapping makes the test more direct and easier to understand. We can then assert the expected error code (MY_CS_ILSEQ) directly in the test runner.

static int test_mb_wc_filename_truncated()
{
  CHARSET_INFO *cs= &my_charset_filename;
  uchar buf[5]= {'@', '0', '0', '0', '1'};
  my_wc_t wc= 0;
  return my_ci_mb_wc(cs, &wc, buf, buf + 4);
}

gemini-code-assist · 2026-06-05T14:32:30Z

+  ok(test_mb_wc_filename_truncated() == 0,
+     "filename decoder does not read past the end pointer");


Asserting the expected error code (MY_CS_ILSEQ) directly is more idiomatic and provides clearer test output if the assertion fails.

ok(test_mb_wc_filename_truncated() == MY_CS_ILSEQ,
   "filename decoder does not read past the end pointer");

gkodinov

Thank you for your contribution! This is a preliminary review.

Thank you for working out a test this time!
FWIW, the whole thing is highly theoretical IMHO, since right now most of the file name buffers are of fixed size. I'd be VERY surprised if you find an integral test with this. Have you thought about a possible one?

Otherwise, on the formal side, please add a commit message to your commit.

gkodinov · 2026-06-08T09:13:25Z

  {
    int byte3= hexlo(s[3]);
-    int byte4= hexlo(s[3] ? s[4] : 0);
+    int byte4= hexlo((s[3] && s + 4 < e) ? s[4] : 0);


FWIW, @NNN (3 symbols) is not a valid escape! You could just change the check above to say s + 5 and be done with it (and remove all the ifs and buts).

Yep, @hhh isn't a valid escape. Switched the check above to s + 5 > e (TOOSMALL5) and dropped the per-byte guards, so s[3]/s[4] are plain hexlo() reads now.

The my_charset_filename '@hhhh' escape is 5 bytes, but the length guard only required 4 (s[0..3]) before the hex branch read s[4]. A truncated '@hhh' whose buffer ends exactly at s+4 therefore read one byte past the end pointer, reachable through well_formed_length/charpos on a string with tight bounds. A 4-byte escape is not valid anyway, so require all 5 bytes up front and return MY_CS_TOOSMALL5 when they are not there. That makes the per-byte end-pointer checks on s[3]/s[4] unnecessary, so drop them. Adds a strings-t case that feeds a truncated escape with end= buf+4 and checks for MY_CS_TOOSMALL5; before the fix it read the byte past the end pointer (ASAN: heap-buffer-overflow read of size 1).
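The final design can be sketched in the same standalone way: require all five bytes before looking at any hex digit, so the digit reads need no individual guards. This is again a hypothetical model of the hex branch only; decode_hex_escape, decode_exact_heap, hexval and the DEC_* codes are not MariaDB names. One point worth keeping in mind when relying on ASAN: a sanitizer can only report the stray read if the byte at the end pointer really lies outside the allocation. With a stack array such as uchar buf[5] and end = buf + 4, buf[4] is still inside the array, so there the return code is what exposes the bug. The helper below copies the input into a heap block of exactly the given length, so that any read at the end pointer would be reported under -fsanitize=address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical return codes for this sketch; not MariaDB's MY_CS_* values. */
enum { DEC_ILSEQ = 0, DEC_TOOSMALL = -1, DEC_TOOSMALL5 = -5 };

/* Value of one ASCII hex digit, or -1 if c is not a hex digit. */
int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/*
  Decode one '@hhhh' escape from [s, e).  All five bytes are required
  up front, so s[1..4] are read without per-byte end checks.
*/
int decode_hex_escape(unsigned long *pwc, const unsigned char *s,
                      const unsigned char *e)
{
  assert(pwc != NULL && s != NULL && s <= e);
  size_t avail= (size_t) (e - s);

  if (avail < 1)
    return DEC_TOOSMALL;
  if (s[0] != '@')
    return DEC_ILSEQ;
  if (avail < 5)                  /* '@hhh' and shorter are never complete */
    return DEC_TOOSMALL5;

  int h1= hexval(s[1]), h2= hexval(s[2]);
  int h3= hexval(s[3]), h4= hexval(s[4]);
  if (h1 < 0 || h2 < 0 || h3 < 0 || h4 < 0)
    return DEC_ILSEQ;
  *pwc= ((unsigned long) h1 << 12) | ((unsigned long) h2 << 8) |
        ((unsigned long) h3 << 4) | (unsigned long) h4;
  return 5;
}

/*
  Test helper: decode from a heap copy of exactly len bytes, so that
  under -fsanitize=address a read at the end pointer is reported as a
  heap-buffer-overflow instead of landing on a neighbouring byte.
*/
int decode_exact_heap(unsigned long *pwc, const char *bytes, size_t len)
{
  unsigned char *copy= malloc(len ? len : 1);
  assert(copy != NULL);
  memcpy(copy, bytes, len);
  int rc= decode_hex_escape(pwc, copy, copy + len);
  free(copy);
  return rc;
}
```

For example, decode_exact_heap(&wc, "@000", 4) returns DEC_TOOSMALL5, while "@0041" with length 5 decodes to 0x41. Comparing the remaining length e - s against 5, rather than forming s + 5, also avoids computing a pointer more than one past the end of the buffer, which standard C leaves undefined.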

jmestwa-coder · 2026-06-08T17:23:51Z

Pushed both points:

- the guard is now s + 5 > e and the per-byte ifs are gone, per your note
- added a commit message body

On a real-world trigger: agreed, it's theoretical. The filename buffers on the server paths are all fixed-size and NUL-terminated, so I couldn't get a SQL-level case to reach s[4] past the end either. The only route that does is well_formed_length/charpos with a tight, non-terminated end pointer, which is what the strings-t case feeds. I don't have an integral test to offer beyond that one, so happy to drop it if you'd rather not carry the unit test.
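The route described here, a caller walking a string up to a tight, non-terminated end pointer, can be illustrated with a charpos-like loop. This is a hypothetical sketch loosely modelled on the kind of walk well_formed_length/charpos perform, not their actual code: count_chars, decode_char, hexval and the DEC_* codes are illustrative names, and every byte other than '@' is treated as a one-byte character (the real filename charset also has a safe-character table and 3-byte escapes). Because each call is bounded by e rather than by a NUL, the last call sees exactly the truncated tail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes for this sketch; not MariaDB's MY_CS_* values. */
enum { DEC_ILSEQ = 0, DEC_TOOSMALL = -1, DEC_TOOSMALL5 = -5 };

/* Value of one ASCII hex digit, or -1 if c is not a hex digit. */
int hexval(int c)
{
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Simplified decoder: '@hhhh' escapes; any other byte is one character. */
int decode_char(unsigned long *pwc, const unsigned char *s,
                const unsigned char *e)
{
  size_t avail= (size_t) (e - s);
  if (avail < 1)
    return DEC_TOOSMALL;
  if (s[0] != '@')
  {
    *pwc= s[0];
    return 1;
  }
  if (avail < 5)                     /* truncated escape at the tail */
    return DEC_TOOSMALL5;
  int h1= hexval(s[1]), h2= hexval(s[2]);
  int h3= hexval(s[3]), h4= hexval(s[4]);
  if (h1 < 0 || h2 < 0 || h3 < 0 || h4 < 0)
    return DEC_ILSEQ;
  *pwc= ((unsigned long) h1 << 12) | ((unsigned long) h2 << 8) |
        ((unsigned long) h3 << 4) | (unsigned long) h4;
  return 5;
}

/*
  charpos-like walk: count complete characters in [b, e) and report how
  many bytes they cover.  Stops at the first error or truncated sequence.
*/
size_t count_chars(const unsigned char *b, const unsigned char *e,
                   size_t *bytes_used)
{
  assert(b != NULL && b <= e && bytes_used != NULL);
  size_t n= 0;
  const unsigned char *p= b;
  unsigned long wc;
  while (p < e)
  {
    int len= decode_char(&wc, p, e);   /* e, not a NUL, bounds every call */
    if (len <= 0)
      break;
    p+= len;
    n++;
  }
  *bytes_used= (size_t) (p - b);
  return n;
}
```

On the 11 bytes "ab@0041@000" with no terminator, the walk counts 3 characters covering 7 bytes and stops at the trailing '@000', which the decoder rejects as too short instead of reading past e.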

gemini-code-assist Bot reviewed Jun 5, 2026

gkodinov added the External Contribution label (All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements) Jun 8, 2026

gkodinov requested changes Jun 8, 2026

gkodinov self-assigned this Jun 8, 2026

jmestwa-coder force-pushed the filename-mbwc-overread branch from a62f68b to b1bf9b3 June 8, 2026 17:23

jmestwa-coder wants to merge 1 commit into MariaDB:main from jmestwa-coder:filename-mbwc-overread