[SPARK-33902][SQL] Support CREATE TABLE LIKE for V2 #54809
Open
viirya wants to merge 7 commits into apache:master
638846e to 6e695fe
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`, which bypassed the V2 catalog pipeline entirely. This meant:

- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference` (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from the resolved source `Table` and calls `TableCatalog.createTable()`

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression
- `DDLParserSuite`: updated existing `create table like` test to match the new `CreateTableLike` plan shape; added 3-part name parsing test

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
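The routing described in this commit message can be pictured with a small toy model. This is only a sketch in plain Python, not Spark code: the `Relation` class and `route_create_table_like` function are hypothetical names made up for illustration, and the real decision in `ResolveSessionCatalog` works on resolved plan nodes rather than on strings.

```python
from dataclasses import dataclass

# Name of Spark's built-in session catalog.
SESSION_CATALOG = "spark_catalog"


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a resolved source table or view."""
    catalog: str
    is_v1: bool  # True for V1 tables/views


def route_create_table_like(target_catalog: str, source: Relation) -> str:
    """Return the name of the node that would execute the statement.

    Simplified assumption: the V1 command is used only when the target
    is in the session catalog and the source is a V1 table/view there;
    every other combination goes through the V2 exec node.
    """
    target_in_session = target_catalog == SESSION_CATALOG
    source_is_session_v1 = source.catalog == SESSION_CATALOG and source.is_v1
    if target_in_session and source_is_session_v1:
        return "CreateTableLikeCommand"
    return "CreateTableLikeExec"


if __name__ == "__main__":
    v1_src = Relation(SESSION_CATALOG, is_v1=True)
    v2_src = Relation("testcat", is_v1=False)
    print(route_create_table_like(SESSION_CATALOG, v1_src))
    print(route_create_table_like("testcat", v1_src))
    print(route_create_table_like(SESSION_CATALOG, v2_src))
```

In this model, only the V1->V1 combination keeps the old command; a V2 target, or a V2 source with a session-catalog target, takes the new path.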
Add two tests covering the case where the source is a V2 table in a non-session catalog and the target resolves to the session catalog. These exercise the CreateTableLikeExec → V2SessionCatalog path and confirm that schema and partitioning are correctly propagated.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Add two tests to CreateTableLikeSuite documenting that pure V2 catalogs (e.g. InMemoryCatalog) accept any provider string without validation, while V2SessionCatalog rejects non-existent providers by delegating to DataSource.lookupDataSource. This is consistent with how CreateTableExec handles the USING clause for other V2 DDL commands.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
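The difference in provider handling can be illustrated with a toy model. This is a sketch under stated assumptions, not Spark's implementation: `ToyPureV2Catalog`, `ToySessionCatalog`, and the `KNOWN_PROVIDERS` set are hypothetical, and the real session catalog delegates the check to `DataSource.lookupDataSource` rather than to a fixed set.

```python
# Hypothetical set of providers the toy session catalog can resolve.
KNOWN_PROVIDERS = {"parquet", "orc", "json", "csv"}


class ToyPureV2Catalog:
    """Stores whatever provider string it is given, with no validation."""

    def __init__(self):
        self.tables = {}

    def create_table(self, name: str, provider: str) -> None:
        self.tables[name] = provider


class ToySessionCatalog(ToyPureV2Catalog):
    """Rejects providers it cannot resolve before creating the table."""

    def create_table(self, name: str, provider: str) -> None:
        if provider not in KNOWN_PROVIDERS:
            raise ValueError(f"unknown provider: {provider}")
        super().create_table(name, provider)


if __name__ == "__main__":
    pure = ToyPureV2Catalog()
    pure.create_table("dst", "no_such_provider")  # accepted as-is

    session = ToySessionCatalog()
    try:
        session.create_table("dst", "no_such_provider")
    except ValueError as err:
        print("rejected:", err)
```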
…CREATE TABLE LIKE

Two new tests covering previously untested code paths in CreateTableLikeExec:

- Source provider is copied to V2 target as PROP_PROVIDER when no USING override is given, consistent with how CreateTableExec handles other V2 DDL.
- CHAR(n)/VARCHAR(n) types declared on a V1 source are preserved in the V2 target via CharVarcharUtils.getRawSchema, not collapsed to StringType.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add inline comment explaining the six reasons withConstraints is intentionally omitted: V1 behavior parity, ForeignKey cross-catalog dangling references, constraint name collision risk, validation status semantics on empty tables, NOT NULL already captured in nullability, and PostgreSQL precedent (INCLUDING CONSTRAINTS opt-in). Also notes the path forward if constraint copying is added in the future.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Clarify that V1 tables (CatalogTable) have no constraint objects at all, since CHECK/PRIMARY KEY/UNIQUE/FOREIGN KEY are V2-only concepts added in Spark 4.1.0. The previous wording said CreateTableLikeCommand "never copied" them, which implies an intentional decision rather than the absence of the feature.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ed identifiers

After the CREATE TABLE LIKE V2 change, the target and source identifiers in CreateTableLikeCommand are now fully qualified (spark_catalog.default.*) because ResolvedV1Identifier explicitly adds the catalog component via ident.asTableIdentifier.copy(catalog = Some(catalog.name)), and ResolvedV1TableIdentifier returns t.catalogTable.identifier, which also includes the catalog. Update the analyzer golden file accordingly.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
6e695fe to 6e3053c
Contributor: I'll take a look later today.
## What changes were proposed in this pull request?

Previously, `CREATE TABLE LIKE` was implemented only via `CreateTableLikeCommand`, which bypassed the V2 catalog pipeline entirely. This meant:

- 3-part names (catalog.namespace.table) caused a parse error
- 2-part names targeting a V2 catalog caused `NoSuchDatabaseException`

This PR adds a V2 execution path for `CREATE TABLE LIKE`:

- Grammar: change `tableIdentifier` (2-part max) to `identifierReference` (N-part) for both target and source, consistent with all other DDL commands
- Parser: emit `CreateTableLike` (new V2 logical plan) instead of `CreateTableLikeCommand` directly
- `ResolveCatalogs`: resolve the target `UnresolvedIdentifier` to `ResolvedIdentifier`
- `ResolveSessionCatalog`: route back to `CreateTableLikeCommand` when both target and source are V1 tables/views in the session catalog (V1->V1 path)
- `DataSourceV2Strategy`: convert `CreateTableLike` to new `CreateTableLikeExec`
- `CreateTableLikeExec`: physical exec that copies schema and partitioning from the resolved source `Table` and calls `TableCatalog.createTable()`

## Why are the changes needed?

`CREATE TABLE LIKE` was implemented solely via `CreateTableLikeCommand`, a V1-only command that bypasses the DataSource V2 analysis pipeline entirely. As a result, it was impossible to use `CREATE TABLE LIKE` to create a table in a non-session V2 catalog (e.g., `testcat.dst`): a 2-part name like `testcat.dst` was misinterpreted as database `testcat` in the session catalog and threw `NoSuchDatabaseException`, while a 3-part name like `testcat.ns.dst` was a parse error because the grammar only accepted a 2-part `tableIdentifier`.

This change routes `CREATE TABLE LIKE` through the standard V2 DDL pipeline so that V2 catalog targets are fully supported, while preserving the existing V1 behavior when both target and source resolve to the session catalog.

## Does this PR introduce any user-facing change?

Yes. The `CREATE TABLE LIKE` DDL command now supports V2 catalogs.

## How was this patch tested?

- `CreateTableLikeSuite`: new integration tests covering V2 target with V1/V2 source, cross-catalog, views as source, IF NOT EXISTS, property behavior, and V1 fallback regression, among others
- `DDLParserSuite`: updated existing `create table like` test to match the new `CreateTableLike` plan shape; added 3-part name parsing test

## Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Sonnet 4.6
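For readers unfamiliar with the multipart-name issue described in this PR, the intended interpretation of names such as `testcat.dst` and `testcat.ns.dst` can be sketched as follows. This is a simplified toy model in plain Python, not Spark's resolution code: `resolve_name` is a hypothetical helper, and it ignores details such as the current catalog and current namespace settings.

```python
# Name of Spark's built-in session catalog.
SESSION_CATALOG = "spark_catalog"


def resolve_name(name_parts, registered_catalogs):
    """Split a multipart name into (catalog, namespace, table).

    Simplified assumption: if the name has more than one part and the
    first part is a registered catalog, that catalog is used; otherwise
    the whole name is interpreted inside the session catalog.
    """
    if not name_parts:
        raise ValueError("empty identifier")
    if len(name_parts) > 1 and name_parts[0] in registered_catalogs:
        catalog, rest = name_parts[0], name_parts[1:]
    else:
        catalog, rest = SESSION_CATALOG, list(name_parts)
    return catalog, tuple(rest[:-1]), rest[-1]


if __name__ == "__main__":
    catalogs = {"testcat"}
    # 2-part name whose first part is a V2 catalog: empty namespace.
    print(resolve_name(["testcat", "dst"], catalogs))
    # 3-part name: catalog, one-level namespace, table.
    print(resolve_name(["testcat", "ns", "dst"], catalogs))
    # 2-part name with no matching catalog: database in session catalog.
    print(resolve_name(["default", "src"], catalogs))
```

Under the old V1-only path, the first case was treated like the third, with `testcat` read as a database name, which is the source of the `NoSuchDatabaseException` described in the PR.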