Skip to content

Conversation

@EeshanBembi
Copy link
Contributor

Which issue does this PR close?

Closes #20194

Rationale for this change

A query with ROW_NUMBER() OVER (... ORDER BY CASE WHEN col='0' THEN 1 ELSE 0 END) combined with a filter nvl(t2.value_2_3,'0')='0' fails with a SanityCheckPlan error. This worked in 50.3.0 but broke in 52.1.0.

What changes are included in this PR?

Root cause: collect_columns_from_predicate_inner was extracting equality pairs where neither side was a Column (e.g. nvl(col, '0') = '0'), creating equivalence classes between complex expressions and literals. normalize_expr's deep traversal would then replace the literal '0' inside unrelated sort/window CASE WHEN expressions with the complex NVL expression, corrupting the sort ordering and causing a mismatch between SortExec's reported output ordering and BoundedWindowAggExec's expected ordering.

Fix (two changes in filter.rs):

  1. collect_columns_from_predicate_inner: Only extract equality pairs where at least one side is a Column reference. This matches the function's documented intent ("Column-Pairs") and prevents complex-expression-to-literal equivalence classes from being created.
  2. extend_constants: Recognize Literal expressions as inherently constant (previously only checked is_expr_constant on the input's equivalence properties, which doesn't know about literals). This ensures constant propagation still works for complex_expr = literal predicates — e.g. nvl(col, '0') is properly marked as constant after the filter.

How was this tested?

  • Unit test test_collect_columns_skips_non_column_pairs verifying the filtering logic
  • Sqllogictest reproducing the exact query from the issue
  • Full test suites: equivalence tests (51 passed), physical-plan tests (1255 passed), physical-optimizer tests (20 passed)
  • Manual verification with datafusion-cli running the reproduction query

Test plan

@github-actions github-actions bot added sqllogictest SQL Logic Tests (.slt) physical-plan Changes to the physical-plan crate labels Feb 9, 2026
…he#20194)

`collect_columns_from_predicate_inner` was extracting equality pairs
where neither side was a Column (e.g. `nvl(col, '0') = '0'`), creating
equivalence classes between complex expressions and literals.
`normalize_expr`'s deep traversal would then replace the literal inside
unrelated sort/window expressions with the complex expression,
corrupting the sort ordering and triggering SanityCheckPlan failures.

Fix by restricting `collect_columns_from_predicate_inner` to only
extract pairs where at least one side is a Column reference, matching
the function's documented intent. Also update `extend_constants` to
recognize Literal expressions as inherently constant, so constant
propagation still works for `complex_expr = literal` predicates.

Closes apache#20194
@EeshanBembi EeshanBembi force-pushed the fix/sanity-check-nvl-equivalence-20194 branch from 5543f1f to 244be39 Compare February 9, 2026 10:57
@EeshanBembi EeshanBembi marked this pull request as ready for review February 9, 2026 14:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

physical-plan Changes to the physical-plan crate sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SanityCheckPlan caused by Error during planning:

1 participant