
[GLUTEN-11980][CORE] For decimal-key joins, if one side falls back to Spark, force fallback the other side#12000

Open
beliefer wants to merge 3 commits into apache:main from beliefer:11980

Conversation

Contributor

@beliefer beliefer commented Apr 28, 2026

What changes are proposed in this pull request?

This PR fixes a bug: the join operator loses data on a decimal join key when one side is a vanilla Spark scan and the other side is a native scan.

For a given join, one side's scan (FileSourceScanExec or HiveTableScanExec) may fail to offload to the native engine while the other side's scan offloads successfully. If the join key contains a decimal type, rows may fail to match because decimal precision handling in Spark differs from Velox.

As a result, many rows fail to match, and in some cases zero rows match.

Why do Spark and Gluten process decimal precision differently?
Spark prioritizes consistent accuracy and is willing to sacrifice some performance for it; Gluten pursues maximum performance, which may introduce subtle accuracy differences in certain calculations.
There are many related issues, for example #4652.
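To make the failure mode concrete, here is a toy Scala illustration (not Gluten or Velox code): the same numeric value can carry different unscaled representations depending on its scale, so an engine that hashes or compares raw unscaled integers will not match keys that another engine, comparing at a normalized precision, considers equal.

```scala
// Toy illustration, not Gluten code: numerically equal decimals can have
// different raw (unscaled) representations when their scales differ. An
// engine comparing raw unscaled values would fail to match such join keys.
object DecimalKeyMismatch {
  def main(args: Array[String]): Unit = {
    val a = new java.math.BigDecimal("1.50") // scale 2, unscaled value 150
    val b = new java.math.BigDecimal("1.5")  // scale 1, unscaled value 15
    assert(a.compareTo(b) == 0)                // numerically equal
    assert(a.unscaledValue != b.unscaledValue) // raw representations differ
    println("same value, different raw representation")
  }
}
```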

Fixes #11980

How was this patch tested?

Manual tests in our production environment.
Unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.
I only used AI to generate the comments.

@github-actions bot added the CORE (works for Gluten Core) and VELOX labels Apr 28, 2026
@github-actions

Run Gluten Clickhouse CI on x86

@beliefer beliefer marked this pull request as draft April 28, 2026 09:37
@beliefer beliefer marked this pull request as ready for review April 30, 2026 08:49

* AdaptiveSparkPlanExec is handled by descending into its `initialPlan`; all other non-join nodes
* are handled recursively through their children.
*/
private def validateJoin(plan: SparkPlan): Unit =
Contributor

Could we move this to a separate rule?

Contributor Author
@beliefer beliefer May 1, 2026

I created another rule, AddFallbackTagsForJoin, modeled on AddFallbackTags, and changed the code of WithRewrites like:

case class WithRewrites(
    validator: Validator,
    rewriteRules: Seq[RewriteSingleNode],
    offloadRules: Seq[OffloadSingleNode])
  extends Rule[SparkPlan] {
  private val validate = AddFallbackTags(validator)
  private val validateJoin = AddFallbackTagsForJoin(validator)
  private val rewrite = RewriteSparkPlanRulesManager(validate, validateJoin, rewriteRules)
  private val offload = LegacyOffload(offloadRules)

  override def apply(plan: SparkPlan): SparkPlan = {
    Seq(rewrite, validate, validateJoin, offload).foldLeft(plan) {
      case (plan, stage) => stage(plan)
    }
  }
}

Then we would have to change the constructor of RewriteSparkPlanRulesManager to accept AddFallbackTagsForJoin and change the logic of RewriteSparkPlanRulesManager.
I don't think it's worth doing this, so I merged the code into AddFallbackTags.

}
}

testGluten(
Contributor

Why do we need this test here? It looks like FallbackSuite already covers it. Also, please update the PR description: this problem occurs because one join side is native while the other falls back, so we need to fall back both sides; it is not related to decimal, right?

Contributor Author
@beliefer beliefer May 1, 2026

No, it is related to the decimal join key. We do not need to fall back the other side when one side falls back if the join key type has exact equality semantics (e.g. string, int).

FallbackSuite cannot cover all the code paths, so we need to add these test cases.

Contributor
@jinchengchenghh jinchengchenghh May 1, 2026

Could you describe in more detail in the PR description why decimal is different and needs to be handled differently?

Contributor Author

Updated.

@beliefer beliefer requested a review from jinchengchenghh May 1, 2026 04:43
*
* When the join key is a decimal type, a native (Velox) scan and a vanilla Spark scan
* ([[FileSourceScanExec]] or `HiveTableScanExec`) may produce different representations of the
* same decimal value: the native reader may surface raw uncoerced int128_t values while the
Contributor

Can we update the native side to support this case?

Contributor

If one side falls back, that side should insert ColumnarToRow, so why does this representation issue still occur?

Contributor Author


The underlying decimal implementations determine their different accuracies, and it is a difficult problem for me to solve right now.

Contributor


We should find the root cause and fix the result-mismatch issue on the native side. Someone else in the community may fix it, so please keep the issue open for now.

* the right subtree of the join
*/
private def setFallbackTagForOtherSide(leftChild: SparkPlan, rightChild: SparkPlan): Unit = {
val leftHasFallbackScan = hasFallbackScan(leftChild)
Contributor

It's not only a scan fallback that causes this issue; could it also occur when a filter falls back?

Contributor Author

Either way, we should ensure the two sides are either both offloaded or both fallen back.
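The invariant discussed here (both join sides offloaded together, or both fallen back) can be sketched with a toy plan model. `Plan`, `Scan`, `Node`, and `alignJoinSides` below are hypothetical names for illustration, not Gluten's actual SparkPlan API:

```scala
// Hypothetical sketch of the tagging idea (toy model, not Gluten's API):
// if exactly one join subtree contains a fallen-back scan, tag every scan
// on the other subtree so both sides run in the same engine.
sealed trait Plan {
  var fallback: Boolean = false
  def children: Seq[Plan]
}
case class Scan(table: String) extends Plan { def children = Nil }
case class Node(children: Plan*) extends Plan

object TagOtherSide {
  def hasFallbackScan(p: Plan): Boolean = p match {
    case s: Scan => s.fallback
    case _       => p.children.exists(hasFallbackScan)
  }

  def tagScans(p: Plan): Unit = p match {
    case s: Scan => s.fallback = true
    case _       => p.children.foreach(tagScans)
  }

  // Keep the two join sides aligned: if only one side contains a
  // fallen-back scan, force the scans on the other side to fall back too.
  def alignJoinSides(left: Plan, right: Plan): Unit = {
    val (l, r) = (hasFallbackScan(left), hasFallbackScan(right))
    if (l && !r) tagScans(right)
    if (r && !l) tagScans(left)
  }
}
```

The real rule additionally has to recognize FileSourceScanExec/HiveTableScanExec and Gluten's fallback tags, but the alignment logic has this shape.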

@beliefer beliefer requested a review from jinchengchenghh May 4, 2026 13:12
beliefer added 3 commits May 6, 2026 12:39
… Spark, force fallback the other side

# Conflicts:
#	backends-velox/src/test/scala/org/apache/gluten/execution/FallbackSuite.scala
@github-actions

github-actions Bot commented May 6, 2026

Run Gluten Clickhouse CI on x86

