Skip to content

GH-49470: [C++][Gandiva] Fix crashes in substring_index and truncate with extreme integer values#49471

Open
dmitry-chirkov-dremio wants to merge 1 commit intoapache:mainfrom
dmitry-chirkov-dremio:fix-gandiva-extreme-int-crashes
Open

GH-49470: [C++][Gandiva] Fix crashes in substring_index and truncate with extreme integer values#49471
dmitry-chirkov-dremio wants to merge 1 commit intoapache:mainfrom
dmitry-chirkov-dremio:fix-gandiva-extreme-int-crashes

Conversation

@dmitry-chirkov-dremio
Copy link
Contributor

@dmitry-chirkov-dremio dmitry-chirkov-dremio commented Mar 9, 2026

Rationale for this change

Two Gandiva functions crash when called with extreme integer parameter values:

  1. substring_index(VARCHAR, VARCHAR, INT) crashes with SIGBUS when count is INT_MIN
  2. truncate(BIGINT, INT) crashes with SIGSEGV when scale is INT_MAX or INT_MIN

What changes are included in this PR?

substring_index fix (gdv_string_function_stubs.cc):

  • Replace abs(cnt) with safe int64_t computation to avoid undefined behavior when cnt == INT_MIN

truncate fix (precompiled/extended_math_ops.cc):

  • Return input unchanged for positive scales (no-op for integers)
  • Return 0 for scales < -38 to prevent out-of-bounds access in GetScaleMultiplier

Are these changes tested?

Yes. Added coverage for INT_MAX/INT_MIN values in gdv_function_stubs_test.cc and extended_math_ops_test.cc.

Are there any user-facing changes?

No.

This PR contains a "Critical Fix". These changes fix crashes caused by:

  • abs(INT_MIN) triggering undefined behavior (integer overflow) in substring_index
  • Out-of-bounds array access in GetScaleMultiplier when truncate receives extreme scale values

@github-actions
Copy link

github-actions bot commented Mar 9, 2026

⚠️ GitHub issue #49470 has been automatically assigned in GitHub to PR creator.


if (static_cast<int32_t>(abs(cnt)) <= static_cast<int32_t>(occ.size()) && cnt > 0) {
// Use int64_t to avoid undefined behavior with abs(INT_MIN)
int64_t abs_cnt = (cnt < 0) ? -static_cast<int64_t>(cnt) : static_cast<int64_t>(cnt);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to check the function parameter for the bad size and exit early like in the truncation fix? That seems simpler.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd leave as is:

  1. The int64_t fix is more robust - it handles ALL negative values correctly, not just INT_MIN
  2. It's the same number of lines - early exit adds 5 lines, current fix adds 3 lines
  3. The current fix also simplifies the existing code - removes redundant static_cast<int32_t> casts

The early-exit approach only guards against the specific crash values, while the int64_t approach fixes the underlying type-safety issue.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Mar 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants