
SQL API Extensions: Expose planning APIs and make classes public#38951

Draft
damccorm wants to merge 3 commits into apache:master from damccorm:feature/sql-api-extensions-only

@damccorm commented Jun 12, 2026

Contributor

Description

This PR is split from #38866. It focuses on exposing Beam SQL's planning and optimization infrastructure as an extensible API.

Previously, Beam SQL's planning stages (via Calcite) were mostly internal and tightly coupled to executing a full SQL string end-to-end. This PR refactors and exposes these planning stages to allow external orchestration of Beam SQL.

Key Changes

  • Exposed Planning APIs:
    • Added BeamSqlEnv.parseLogicalPlan(String query) and QueryPlanner.parseToRel(...), which parse a SQL query string into a Calcite logical plan (RelNode) without immediately optimizing or executing it.
    • Added convertToBeamRel(RelNode logicalPlan), which takes an externally constructed or manipulated Calcite logical plan (RelNode) and converts it into a Beam physical plan (BeamRelNode / PCollection pipeline).
  • Extensibility Improvements:
    • Made BeamCalciteTable constructor public to allow external planners to instantiate it.
    • Made TextTableProvider.RowToCsv class public to allow external integration with text table serialization.
  • Testing:
    • Added a new comprehensive unit test testParseAndConvertHelpers in CalciteQueryPlannerTest.java that specifically exercises these new APIs end-to-end.
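
The parse, rewrite, convert flow described in the list above can be sketched roughly as follows. This is a minimal model, not Beam code: SplitPlanningSketch, Env, LogicalPlan and PhysicalPlan are hypothetical stand-ins for BeamSqlEnv, RelNode and BeamRelNode, and only the two method names are taken from this PR.

```java
import java.util.function.UnaryOperator;

/** Hypothetical stand-ins that model the split planning flow; none of these are Beam or Calcite types. */
final class SplitPlanningSketch {

  /** Stand-in for a logical plan node (RelNode in the PR). */
  static final class LogicalPlan {
    final String description;

    LogicalPlan(String description) {
      this.description = description;
    }
  }

  /** Stand-in for a physical plan node (BeamRelNode in the PR). */
  static final class PhysicalPlan {
    final String description;

    PhysicalPlan(String description) {
      this.description = description;
    }
  }

  /** Stand-in for the environment that owns the planner. */
  static final class Env {
    // Phase 1: parse only; nothing is optimized or executed here.
    LogicalPlan parseLogicalPlan(String query) {
      return new LogicalPlan("Logical(" + query.trim() + ")");
    }

    // Phase 2: convert a (possibly externally rewritten) logical plan.
    PhysicalPlan convertToBeamRel(LogicalPlan plan) {
      return new PhysicalPlan("Physical[" + plan.description + "]");
    }
  }

  /** Runs parse -> external rewrite -> convert. */
  static PhysicalPlan plan(Env env, String query, UnaryOperator<LogicalPlan> externalRewrite) {
    LogicalPlan logical = env.parseLogicalPlan(query);
    LogicalPlan rewritten = externalRewrite.apply(logical);
    return env.convertToBeamRel(rewritten);
  }

  public static void main(String[] args) {
    PhysicalPlan result =
        plan(new Env(), "SELECT 1", p -> new LogicalPlan("Rewritten(" + p.description + ")"));
    System.out.println(result.description); // prints Physical[Rewritten(Logical(SELECT 1))]
  }
}
```

The external orchestrator only supplies the rewrite step; both ends of the pipeline stay inside the environment, which is the separation the PR description asks for.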

Why this is needed

This is a crucial feature for external query engines or orchestrators (such as Spark Connect or custom SQL platforms). They can now use Beam's SQL parser to get a logical plan, perform their own optimizations or integrations, and then hand it back to Beam to generate the final executable pipeline.

@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors Beam SQL's planning layer to make it more extensible for external orchestrators like Spark Connect. By exposing core planning APIs and allowing for manual logical plan manipulation, it enables more flexible SQL parsing and physical plan conversion workflows. Additionally, it addresses limitations in parameter handling and Calcite configuration, ensuring better compatibility with external SQL dialects.

Highlights

  • API Exposure: Exposed Beam SQL's planning infrastructure by making key classes public and adding new methods to BeamSqlEnv and QueryPlanner to facilitate external orchestrator integration.
  • Planner Enhancements: Added support for positional query parameters and improved Calcite configuration, including a bridge for 'conformance' properties to better support Spark-SQL syntax.
  • Debugging and Logging: Added informative logging throughout the planning and configuration lifecycle to aid in debugging query optimization and plan generation.

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces helper methods to BeamSqlEnv, QueryPlanner, and CalciteQueryPlanner to allow parsing SQL queries into logical RelNode plans and subsequently converting those logical plans into physical BeamRelNode plans. It also exposes several classes and constructors as public and adds logging. The review feedback highlights a critical issue where calling planner.close() in the finally block of parseToRel prematurely invalidates the returned RelNode's state. Additionally, the feedback points out unused and unsafe fields captured from a temporary planner, a regression where query collation is discarded, and a misleading parameter name in QueryPlanner.
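
The planner.close() concern can be illustrated with a small lifetime sketch. All names here (PlannerLifetimeSketch, Planner, Plan) are hypothetical stand-ins, and the caller-owned alternative is only one possible shape, not necessarily the fix adopted in this PR.

```java
/** Hypothetical stand-ins illustrating the lifetime issue; not Beam or Calcite types. */
final class PlannerLifetimeSketch {

  /** Stand-in for a planner whose state backs the plans it produces. */
  static final class Planner implements AutoCloseable {
    private boolean open = true;

    Plan parse(String query) {
      return new Plan(this, query);
    }

    boolean isOpen() {
      return open;
    }

    @Override
    public void close() {
      open = false;
    }
  }

  /** Stand-in for a plan that still needs its planner when it is used later. */
  static final class Plan {
    private final Planner owner;
    private final String query;

    Plan(Planner owner, String query) {
      this.owner = owner;
      this.query = query;
    }

    String explain() {
      if (!owner.isOpen()) {
        throw new IllegalStateException("planner already closed");
      }
      return "Plan(" + query + ")";
    }
  }

  // Problematic shape: finally runs before the caller ever touches the returned plan.
  static Plan parseThenClose(String query) {
    Planner planner = new Planner();
    try {
      return planner.parse(query);
    } finally {
      planner.close();
    }
  }

  // One alternative: the caller owns the planner and closes it after using the plan.
  static String parseAndExplain(Planner planner, String query) {
    return planner.parse(query).explain();
  }

  public static void main(String[] args) {
    try (Planner planner = new Planner()) {
      System.out.println(parseAndExplain(planner, "SELECT 1")); // prints Plan(SELECT 1)
    }
    try {
      parseThenClose("SELECT 1").explain();
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage()); // prints planner already closed
    }
  }
}
```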

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

@damccorm marked this pull request as draft June 12, 2026 19:47
…lose, preserve collation, and rename parameter
@damccorm

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request enhances the Beam SQL query planning capabilities by exposing methods to parse SQL into logical plans (RelNode) and convert logical plans into physical Beam plans (BeamRelNode) separately. It also adds support for resolving parser conformance from pipeline options. The review feedback highlights critical compilation errors due to undeclared checked exceptions (SqlConversionException) in the newly introduced methods, and suggests defensive checks to prevent potential runtime exceptions during plan optimization.

@damccorm

Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces APIs in BeamSqlEnv and QueryPlanner to parse SQL queries into logical plans (RelNode) and convert logical plans to physical plans (BeamRelNode), while also adding support for resolving SQL parser conformance from pipeline options. The review feedback highlights a compilation error in BeamSqlEnv.parseLogicalPlan due to an unhandled checked exception, points out that CalciteQueryPlanner.convertToBeamRel ignores the QueryParameters argument, and suggests exposing an overloaded convertToBeamRel method in BeamSqlEnv to allow passing query parameters.

Comment on lines +137 to +139
public RelNode parseLogicalPlan(String query) throws ParseException {
  return planner.parseToRel(query, QueryParameters.ofNone());
}

high

The planner.parseToRel method throws SqlConversionException, which is a checked exception. Since parseLogicalPlan does not catch or declare SqlConversionException in its throws clause, this will cause a compilation error.

Please update the method signature to declare throws SqlConversionException.

Suggested change
- public RelNode parseLogicalPlan(String query) throws ParseException {
-   return planner.parseToRel(query, QueryParameters.ofNone());
- }
+ public RelNode parseLogicalPlan(String query) throws ParseException, SqlConversionException {
+   return planner.parseToRel(query, QueryParameters.ofNone());
+ }
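
The rule behind this comment can be shown in isolation. The sketch below takes the comment's premise at face value, namely that the conversion exception is checked; whether that holds for the real SqlConversionException should be confirmed against its class declaration. ParseFailure and ConversionFailure are hypothetical stand-ins, not Beam classes.

```java
/** Hypothetical stand-ins showing how checked exceptions propagate through a wrapper method. */
final class CheckedExceptionSketch {

  /** Stand-in for a checked parse failure. */
  static final class ParseFailure extends Exception {
    ParseFailure(String message) {
      super(message);
    }
  }

  /** Stand-in for a checked conversion failure (an assumption; see the lead-in). */
  static final class ConversionFailure extends Exception {
    ConversionFailure(String message) {
      super(message);
    }
  }

  // The inner call declares two checked exceptions.
  static String parseToRel(String query) throws ParseFailure, ConversionFailure {
    if (query.isEmpty()) {
      throw new ParseFailure("empty query");
    }
    if (query.contains("UNSUPPORTED")) {
      throw new ConversionFailure("cannot convert: " + query);
    }
    return "Rel(" + query + ")";
  }

  // The wrapper must declare (or catch) both. Dropping ConversionFailure from this
  // throws clause makes javac reject the method with an "unreported exception" error.
  static String parseLogicalPlan(String query) throws ParseFailure, ConversionFailure {
    return parseToRel(query);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(parseLogicalPlan("SELECT 1")); // prints Rel(SELECT 1)
  }
}
```

If the exception turns out to be unchecked, the wider throws clause is still legal Java and simply documents the failure mode.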

Comment on lines +314 to +317
@Override
public BeamRelNode convertToBeamRel(RelNode relNode, QueryParameters queryParameters) {
  return convertToBeamRel(relNode, (RelCollation) null);
}

high

The convertToBeamRel(RelNode, QueryParameters) implementation currently ignores the queryParameters argument. If the logical plan contains dynamic/positional parameters, they will not be bound, leading to runtime failures or incorrect physical plans.

Please update the implementation to bind positional parameters using ParameterBinder if they are provided.

  @Override
  public BeamRelNode convertToBeamRel(RelNode relNode, QueryParameters queryParameters) {
    if (queryParameters.getKind() == Kind.POSITIONAL) {
      relNode =
          bindParameters(
              relNode,
              new ParameterBinder(relNode.getCluster().getRexBuilder(), queryParameters));
    }
    return convertToBeamRel(relNode, (RelCollation) null);
  }
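
To make the failure mode concrete, here is a toy model of positional binding. It is a sketch under loud assumptions: plans are plain strings with '?' placeholders, and ParameterBindingSketch, Params and bind are hypothetical names that do not correspond to ParameterBinder or any other Beam or Calcite API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical toy model of positional parameter binding; plans are plain strings here. */
final class ParameterBindingSketch {

  enum Kind {
    NONE,
    POSITIONAL
  }

  /** Stand-in for a query-parameters holder. */
  static final class Params {
    final Kind kind;
    final List<String> values;

    private Params(Kind kind, List<String> values) {
      this.kind = kind;
      this.values = values;
    }

    static Params ofNone() {
      return new Params(Kind.NONE, Collections.<String>emptyList());
    }

    static Params ofPositional(List<String> values) {
      return new Params(Kind.POSITIONAL, values);
    }
  }

  // Replaces each '?' placeholder, in order, with the next positional value.
  static String bind(String plan, Params params) {
    StringBuilder out = new StringBuilder();
    int next = 0;
    for (char c : plan.toCharArray()) {
      if (c == '?') {
        if (next >= params.values.size()) {
          throw new IllegalArgumentException("missing value for parameter " + next);
        }
        out.append(params.values.get(next++));
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  // Binds first when positional parameters are supplied, then converts.
  static String convert(String plan, Params params) {
    if (params.kind == Kind.POSITIONAL) {
      plan = bind(plan, params);
    }
    return "Physical[" + plan + "]";
  }

  public static void main(String[] args) {
    // prints Physical[Filter(x > 5)]
    System.out.println(convert("Filter(x > ?)", Params.ofPositional(Arrays.asList("5"))));
    // Ignoring the parameters would leave the placeholder in place: Physical[Filter(x > ?)]
    System.out.println(convert("Filter(x > ?)", Params.ofNone()));
  }
}
```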

Comment on lines +133 to +135
public BeamRelNode convertToBeamRel(RelNode relNode) {
  return planner.convertToBeamRel(relNode, QueryParameters.ofNone());
}

medium

To allow external callers to pass query parameters when converting an externally constructed or parsed RelNode to a BeamRelNode, please expose an overloaded convertToBeamRel method that accepts QueryParameters.

Suggested change
- public BeamRelNode convertToBeamRel(RelNode relNode) {
-   return planner.convertToBeamRel(relNode, QueryParameters.ofNone());
- }
+ public BeamRelNode convertToBeamRel(RelNode relNode) {
+   return convertToBeamRel(relNode, QueryParameters.ofNone());
+ }
+ public BeamRelNode convertToBeamRel(RelNode relNode, QueryParameters queryParameters) {
+   return planner.convertToBeamRel(relNode, queryParameters);
+ }
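
The delegation pattern in this suggestion, where the short overload forwards to the full one so the default lives in a single place, can be sketched on its own. OverloadDelegationSketch, Env and Params are hypothetical stand-ins, and plans are plain strings for illustration only.

```java
/** Hypothetical stand-ins showing a convenience overload that delegates to the full one. */
final class OverloadDelegationSketch {

  /** Stand-in for a query-parameters holder. */
  static final class Params {
    final String label;

    private Params(String label) {
      this.label = label;
    }

    static Params ofNone() {
      return new Params("none");
    }

    static Params of(String label) {
      return new Params(label);
    }
  }

  /** Stand-in for the environment exposing both overloads. */
  static final class Env {
    // Convenience overload: only supplies the default and forwards.
    String convertToBeamRel(String plan) {
      return convertToBeamRel(plan, Params.ofNone());
    }

    // Full overload: the single place where conversion actually happens.
    String convertToBeamRel(String plan, Params params) {
      return "Physical[" + plan + ", params=" + params.label + "]";
    }
  }

  public static void main(String[] args) {
    Env env = new Env();
    System.out.println(env.convertToBeamRel("Scan(t)")); // prints Physical[Scan(t), params=none]
    System.out.println(env.convertToBeamRel("Scan(t)", Params.of("positional")));
  }
}
```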
