Skip to content

[SPARK-56638][SQL] Eliminate legacy error class for TABLESAMPLE fraction validation#55782

Open
XdithyX wants to merge 2 commits into
apache:masterfrom
XdithyX:SPARK-56638
Open

[SPARK-56638][SQL] Eliminate legacy error class for TABLESAMPLE fraction validation#55782
XdithyX wants to merge 2 commits into
apache:masterfrom
XdithyX:SPARK-56638

Conversation

@XdithyX
Copy link
Copy Markdown
Contributor

@XdithyX XdithyX commented May 9, 2026

What changes were proposed in this pull request?

This PR replaces the legacy parser error class used by invalid TABLESAMPLE fractions with a named error class (SPARK-56638).

Specifically, this PR:

  • Adds INVALID_TABLESAMPLE_FRACTION to error-conditions.json
  • Adds QueryParsingErrors.invalidTableSampleFractionError
  • Replaces the generic ParserUtils.validate(...) call in AstBuilder.withSample(...) for TABLESAMPLE fraction validation
  • Updates PlanParserSuite to assert the new error class

The validation logic is unchanged. TABLESAMPLE fractions are still required to be in the [0, 1] interval, allowing the existing rounding epsilon.

PlanParserSuite now covers both existing TABLESAMPLE fraction validation and the TABLESAMPLE SYSTEM fraction validation added by #54972.

Why are the changes needed?

ParserUtils.validate(...) always throws _LEGACY_ERROR_TEMP_0064, which is a generic legacy error class.

For invalid TABLESAMPLE fractions, the parser already knows the specific failure: the computed sampling fraction is outside the allowed [0, 1] interval. Using a named error class makes the error condition explicit and continues the ongoing cleanup of legacy temporary error classes.

This PR does not update ParserUtils.validate(...) itself because that helper is shared by unrelated parser validations. Making it throw INVALID_TABLESAMPLE_FRACTION would incorrectly label non-TABLESAMPLE parser failures as TABLESAMPLE fraction errors.

The new error uses SQLSTATE 22023 because the statement is syntactically valid, but the supplied numeric value is invalid/out of range. This is consistent with existing Spark error conditions for invalid argument or range values, such as INVALID_NUMERIC_LITERAL_RANGE and other 22023 value-validation errors, rather than 42601, which is used for syntax errors.

Does this PR introduce any user-facing change?

Yes, for invalid TABLESAMPLE fractions, the error class changes from _LEGACY_ERROR_TEMP_0064 to INVALID_TABLESAMPLE_FRACTION.

How was this patch tested?

Ran build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "sampled relations"'
build/sbt 'catalyst / Test / testOnly org.apache.spark.sql.catalyst.parser.PlanParserSuite -- -z "TABLESAMPLE SYSTEM - fraction out of range"'
Also regenerated and verified the affected SQL golden files:
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"'
SPARK_GENERATE_GOLDEN_FILES=1 build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'
build/sbt 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "tablesample-negative.sql"' 'sql / Test / testOnly org.apache.spark.sql.SQLQueryTestSuite -- -z "pipe-operators.sql"'

Was this patch authored or co-authored using generative AI tooling?

Yes. Generated-by: OpenAI GPT-5.5 Codex

@XdithyX
Copy link
Copy Markdown
Contributor Author

XdithyX commented May 9, 2026

Hi @cloud-fan and @sarutak, could you please take a look? Thanks.
Tagging @HeartSaVioR as well.

@stanyao
Copy link
Copy Markdown
Contributor

stanyao commented May 11, 2026

@XdithyX , thank you for fixing this issue. Please fix this issue after #54972 merges, because this issue was found during this PR. To keep this already big PR focused and fix it everywhere (including the code that is not part of #54972) I opened this Jira item (SPARK-56638) to track a follow up fix.

@XdithyX
Copy link
Copy Markdown
Contributor Author

XdithyX commented May 12, 2026

@stanyao Thanks. I rebased this PR after #54972 merged and updated the new TABLESAMPLE SYSTEM out-of-range test to expect INVALID_TABLESAMPLE_FRACTION.
No other changes were required, I also regenerated and verified the affected SQL golden files.

Can you please have a look when you get a chance?

@XdithyX
Copy link
Copy Markdown
Contributor Author

XdithyX commented May 13, 2026

CI has now passed. Please let me know if anything else is required.
cc: @stanyao

@stanyao
Copy link
Copy Markdown
Contributor

stanyao commented May 15, 2026

The goal of this JIRA item is to deprecate the usage of "_LEGACY_ERROR_TEMP_0064" in TABLESAMPLE code path and likely other code paths that depend on ParserUtils.scala. For example, in test("SPARK-55978: TABLESAMPLE SYSTEM - fraction out of range") in PlanParserSuite.scala. Starting from that call site, there is a call chain of -> AnalysisTest.scala -> AbstractSqlParser.scala -> AstBuilder.scala -> validate() function in ParserUtils.scala that is shared by both the existing Bernoulli sampling and the new System sampling. Fixing this will need code changes to 10+ call sites and 20+ test cases. That is a super set of TABLESAMPLE code path. Fixing them all at once is the best approach. I don't think your PR currently covers the full fix.

@XdithyX
Copy link
Copy Markdown
Contributor Author

XdithyX commented May 15, 2026

@stanyao Thanks for clarifying. I originally scoped this PR to the TABLESAMPLE fraction validation path only, but I understand now that the expected fix is to eliminate _LEGACY_ERROR_TEMP_0064 from all current ParserUtils.validate(...) call sites, including TABLESAMPLE SYSTEM/Bernoulli and the other parser validations that share the helper.

I’ll update the PR to cover the full set of validate() call sites and their tests/golden files.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants