Spark: Support CREATE TABLE LIKE with Spark#14269
MaxNevermind wants to merge 20 commits into apache:main
…12936-spark-create-table-like
|
Hey @anuragmantri |
singhpk234
left a comment
Thank you for the change @MaxNevermind !
It's useful to have this, and there have been attempts to introduce CREATE TABLE LIKE in Spark for DSV2 for a while now.
At this point it seems reasonable to have custom analyzer rules and nodes to support this feature, as was done for views.
Let us know when ready for review, happy to help !
anuragmantri
left a comment
Thanks for adding this @MaxNevermind, I agree with @singhpk234, we probably do not need to wait for this in Spark.
Overall, it looks good, feel free to move it out of draft after implementing the table resolution logic. Also add more test coverage. I will take another look when it's ready.
Welcome to the Iceberg community!
case class ResolveTables(spark: SparkSession) extends Rule[LogicalPlan] {

  override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
    case x @ CreateIcebergTableLike(_, _, _, _) => x
This should resolve the catalogs and tables. See https://github.com/apache/iceberg/blob/main/spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/ResolveViews.scala
Actually, I just realized that the entire ResolveTables class might not be needed, as tables are resolved in ExtendedDataSourceV2Strategy.
}

@TestTemplate
public void testCreateTableLike() {
Thanks for adding the test, but we may need to cover more cases:
- IF NOT EXISTS behavior (exercised in the test but not validated)
- TBLPROPERTIES override functionality (not tested at all)
- Partition spec preservation (not validated)
- Sort order preservation (not validated)
- Cross-catalog table copying
- Error cases (source table doesn't exist, non-Iceberg source, etc.)
- Property merging behavior
- Anything else I may have missed.
The current state of test coverage and some questions:
- ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables
- ✅ IF NOT EXISTS behavior
- ✅ TBLPROPERTIES - a single test for merge and override
- ❓ Sort order preservation - Do we want to support sort order in CREATE TABLE LIKE? I see that Spark's DDL doesn't have it. In Iceberg DDL we also add ordering separately by using ALTER TABLE.
- ❓ Cross-catalog table copying - Do we want this functionality? I might be wrong, but I don't see a test for that in other DDL statements. Does Iceberg even support cross-catalog DDL?
- ❓ Error cases (source table doesn't exist, non-Iceberg source, etc.) - I'm not sure what I'm supposed to test here. Do I have to test that an exception is thrown?
|
@anuragmantri @singhpk234 The current state of test coverage and some questions: ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
@anuragmantri @singhpk234 |
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
|
This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time. |
…12936-spark-create-table-like # Conflicts: # spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala
Pull request overview
Adds Spark 4.0 Iceberg SQL extensions support for CREATE TABLE ... LIKE ..., including parsing, logical planning, execution, tests, and documentation updates.
Changes:
- Extend the Iceberg SQL extensions grammar/parser/AST to recognize CREATE TABLE (IF NOT EXISTS) ... LIKE ... (TBLPROPERTIES ...).
- Add a logical plan node and Spark planner strategy + exec node to create the new table by copying schema/partitioning and merging properties.
- Add end-to-end extension tests and update Spark DDL documentation.
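
The "merging properties" step in the overview can be illustrated with a minimal sketch. This is not the PR's code: the class name `TablePropertyMerge` and the method `merge` are hypothetical, and the semantics are an assumption taken from the "merge and override" wording earlier in the thread, namely that the new table starts from the source table's properties and any key given in TBLPROPERTIES wins on conflict.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TablePropertyMerge {

  // Hypothetical helper (not from the PR): copy the source table's properties,
  // then apply the TBLPROPERTIES entries so they override on key conflicts.
  public static Map<String, String> merge(
      Map<String, String> sourceProps, Map<String, String> overrides) {
    Map<String, String> merged = new LinkedHashMap<>(sourceProps);
    merged.putAll(overrides);
    return merged;
  }

  public static void main(String[] args) {
    Map<String, String> source = new LinkedHashMap<>();
    source.put("write.format.default", "parquet");
    source.put("owner", "etl");

    // Assumed to come from the TBLPROPERTIES clause of CREATE TABLE ... LIKE ...
    Map<String, String> overrides = Map.of("write.format.default", "orc");

    // "owner" is inherited from the source; "write.format.default" is overridden.
    System.out.println(merge(source, overrides));
  }
}
```

Whether every source property should be inherited (for example, properties tied to the source table's location) is a design question this sketch does not settle.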
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestTables.java | Adds coverage for schema/spec copying, IF NOT EXISTS behavior, and property merging. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala | Plans the new logical command into a physical exec node. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2TableLikeExec.scala | Implements table creation based on a source Iceberg table. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/CreateIcebergTableLike.scala | Introduces a new logical command representing CREATE TABLE LIKE. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala | Builds the new logical command from the parsed AST. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala | Routes CREATE TABLE ... LIKE ... statements through the extensions parser. |
| spark/v4.0/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4 | Adds grammar for CREATE TABLE LIKE + TBLPROPERTIES. |
| docs/docs/spark-ddl.md | Documents CREATE TABLE ... LIKE ... support under Iceberg SQL extensions. |
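
As a rough illustration of how a parser might route CREATE TABLE ... LIKE ... statements to the extensions parser, here is a minimal sketch. Only the name `isCreateTableLike` and the idea of checking a normalized statement appear in the PR's diff context; the class name, the `normalize` helper, and the matching logic below are assumptions, not the PR's implementation.

```java
import java.util.Locale;

public class CreateTableLikeCheck {

  // Hypothetical normalization: lower-case, trim, and collapse whitespace runs.
  public static String normalize(String sqlText) {
    return sqlText.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
  }

  // Assumed check, not the PR's code: the statement starts with CREATE TABLE
  // and contains a standalone LIKE keyword somewhere after it.
  public static boolean isCreateTableLike(String normalized) {
    return normalized.startsWith("create table ") && normalized.contains(" like ");
  }

  public static void main(String[] args) {
    String like = normalize("CREATE TABLE db.new_table\n  LIKE db.source_table");
    String plain = normalize("CREATE TABLE db.t (id bigint) USING iceberg");
    System.out.println(isCreateTableLike(like));
    System.out.println(isCreateTableLike(plain));
  }
}
```

A string check like this is deliberately crude: a regular CREATE TABLE whose column comment contains the word "like" would also match, so a real implementation would need a stricter test or would have to fall back to the delegate parser when the extensions grammar fails to parse the statement.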
assertThat(table.schema().asStruct())
    .as("Should have expected nullable schema")
    .isEqualTo(expectedSchema.asStruct());
assertThat(table.spec().fields()).as("Should not be an partitioned").isEmpty();
}
import org.apache.iceberg.SortDirection
import org.apache.iceberg.SortOrder
isCreateTableLike(normalized) || (
  normalized.startsWith("alter table") && (
    normalized.contains("add partition field") ||
: CREATE TABLE (IF NOT EXISTS)? multipartIdentifier LIKE multipartIdentifier (TBLPROPERTIES '(' tableProperty (',' tableProperty)* ')')? #createTableLike
| ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)? #addPartitionField
```sql
CREATE TABLE prod.db.new_table
```
|
Hi @MaxNevermind, we discussed this PR in the Iceberg - Spark Community Sync, but I forgot to post an update here in the PR. In general, the community is trying to move away from extensions, as they can be fragile and hard to maintain. We should aim to add the syntax in Spark if feasible. In the interim, you could add a procedure to do the same. |
|
Hi @anuragmantri |
Looks like there is a recent PR in Spark that adds support for |
|
Closing, as the recent Spark PR apache/spark#54809 adds support for CREATE TABLE ... LIKE ...
Closes: #12936
To be done: