Fix correlated project pushdown for SQL federation IN subqueries #38405

Open
ym0506 wants to merge 3 commits into apache:master from ym0506:issue_37439

ym0506 (Contributor) commented Mar 12, 2026

Summary

This PR fixes correlated IN subqueries that fail when a non-join subquery projects an outer reference.

Root Cause

PushProjectIntoScanRule pushed project expressions into LogicalScan without checking whether the projection contained correlated references.

For queries like:

SELECT o.order_id
FROM t_order o
WHERE o.user_id IN (
    SELECT o.user_id
    FROM t_order_item i
)

the projected expression inside the subquery contains an outer reference. After it was pushed into LogicalScan, the scan conversion path failed because the pushed-down scan tree still contained correlated expressions.

The more complex join case succeeded because it did not go through the same simple project-to-scan pushdown path.

Fix

This change makes PushProjectIntoScanRule skip pushdown when a projected expression contains correlated references.

Specifically:

  • detect RexCorrelVariable
  • detect RexFieldAccess that references a correlated expression
  • recursively inspect nested RexCall operands
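
The three checks above can be sketched as a small recursive walk. This is only an illustration under stated assumptions: the node types below (`InputRef`, `CorrelVariable`, `FieldAccess`, `Call`) are simplified, hypothetical stand-ins for the Calcite classes named in the list, and `containsCorrelation` and `canPushDown` are hypothetical helper names, not the methods actually added to PushProjectIntoScanRule.

```java
import java.util.Arrays;
import java.util.List;

public class CorrelationCheckSketch {

    // Simplified stand-ins for expression nodes; these are NOT Calcite's classes.
    interface Node { }

    // Stand-in for a plain column reference of the scanned table.
    static final class InputRef implements Node {
        final int index;
        InputRef(int index) { this.index = index; }
    }

    // Stand-in for RexCorrelVariable: a handle to the outer query's row.
    static final class CorrelVariable implements Node {
        final String name;
        CorrelVariable(String name) { this.name = name; }
    }

    // Stand-in for RexFieldAccess: a field lookup on another expression.
    static final class FieldAccess implements Node {
        final Node reference;
        final String field;
        FieldAccess(Node reference, String field) {
            this.reference = reference;
            this.field = field;
        }
    }

    // Stand-in for RexCall: an operator applied to operand expressions.
    static final class Call implements Node {
        final String operator;
        final List<Node> operands;
        Call(String operator, Node... operands) {
            this.operator = operator;
            this.operands = Arrays.asList(operands);
        }
    }

    // Returns true if the expression refers to the outer query anywhere in its tree.
    static boolean containsCorrelation(Node node) {
        if (node instanceof CorrelVariable) {
            return true;
        }
        if (node instanceof FieldAccess) {
            // A field access is correlated when the expression it reads from is.
            return containsCorrelation(((FieldAccess) node).reference);
        }
        if (node instanceof Call) {
            // Recurse into every operand of a nested call.
            for (Node operand : ((Call) node).operands) {
                if (containsCorrelation(operand)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Pushdown is allowed only when no projected expression is correlated.
    static boolean canPushDown(List<Node> projects) {
        for (Node project : projects) {
            if (containsCorrelation(project)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Node localColumn = new InputRef(0);
        // Models the projected o.user_id inside the subquery (an outer reference).
        Node outerUserId = new FieldAccess(new CorrelVariable("$cor0"), "user_id");
        Node nested = new Call("+", localColumn, new Call("ABS", outerUserId));

        System.out.println(canPushDown(Arrays.asList(localColumn)));  // true
        System.out.println(canPushDown(Arrays.asList(outerUserId)));  // false
        System.out.println(canPushDown(Arrays.asList(nested)));       // false
    }
}
```

In this sketch a single correlated expression blocks pushdown for the whole projection, which matches the "skip pushdown" behavior described above; whether the real rule is that coarse is an assumption.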

Tests

Added regression coverage in:

  • PushProjectIntoScanRuleTest
  • SQLStatementCompilerIT#assertCompileWhenCorrelatedInSubqueryProjectsOuterColumn

Fixes #37439

strongduanmu (Member) commented

Hi @ym0506, this PR looks great. Can you add an e2e SQL test in the db_tbl_sql_federation scenario?

ym0506 (Contributor, Author) commented Mar 16, 2026

> Hi @ym0506, this PR looks great. Can you add an e2e SQL test in the db_tbl_sql_federation scenario?

Thank you. I added an e2e SQL regression case in the db_tbl_sql_federation scenario.

The new case covers the correlated non-join IN subquery path that triggered this issue:
SELECT o.order_id FROM t_order o WHERE o.order_id < ? AND o.user_id IN (SELECT o.user_id FROM t_order_item i) ORDER BY order_id

I limited it with order_id < 1010 so the expected result stays small and deterministic.

ym0506 (Contributor, Author) commented Mar 16, 2026

I adjusted the e2e regression case to keep the outer side selective while preserving the correlated non-join IN-subquery path.

The new SQL still covers the original issue pattern, but wraps the outer table in a filtered derived table so the outer scan can keep order_id = ? in the pushed-down SQL.


Development

Successfully merging this pull request may close these issues.

[SQL Federation] Correlated IN-subquery fails in a non-join subquery but succeed in a more complex subquery with join.
