
Spark: Support CREATE TABLE LIKE with Spark #14269

Closed
MaxNevermind wants to merge 20 commits into apache:main from MaxNevermind:feature/12936-spark-create-table-like

Conversation

@MaxNevermind
Contributor

@MaxNevermind MaxNevermind commented Oct 7, 2025

Closes: #12936

To be done:

  • Add similar changes to the Spark v3.x modules. I saw a discussion about deprecating Spark 3.4, so I'm not sure this is needed for all 3.x versions.

@MaxNevermind MaxNevermind changed the title from "Spark: Support CREATE TABLE LIKE with Spark." to "Spark: Support CREATE TABLE LIKE with Spark" Oct 7, 2025
@github-actions github-actions bot added the spark label Oct 7, 2025
@MaxNevermind
Contributor Author

Hey @anuragmantri
Can you look over this draft implementation for any red flags?
I don't contribute to Iceberg very often, so I'm not entirely sure about my approach.

Contributor

@singhpk234 singhpk234 left a comment


Thank you for the change, @MaxNevermind!
It's useful to have this, and there have been attempts to introduce CREATE TABLE LIKE in Spark for DSv2 for a while now.

At this point it seems reasonable to have custom analyzer rules and nodes to support this feature, as was done for views.

Let us know when it's ready for review, happy to help!

Contributor

@anuragmantri anuragmantri left a comment


Thanks for adding this @MaxNevermind, I agree with @singhpk234, we probably do not need to wait for this in Spark.

Overall, it looks good. Feel free to move it out of draft after implementing the table resolution logic, and please add more test coverage. I will take another look when it's ready.

Welcome to the Iceberg community!

case class ResolveTables(spark: SparkSession) extends Rule[LogicalPlan] {

override def apply(plan: LogicalPlan): LogicalPlan = plan resolveOperatorsUp {
case x @ CreateIcebergTableLike(_, _, _, _) => x
Contributor Author


Actually, I just realized that the entire ResolveTables class might not be needed, as tables are resolved in ExtendedDataSourceV2Strategy.

}

@TestTemplate
public void testCreateTableLike() {
Contributor


Thanks for adding the test, but we may need to cover more:

  • IF NOT EXISTS behavior (exercised in the test but not validated)
  • TBLPROPERTIES override functionality (not tested at all)
  • Partition spec preservation (not validated)
  • Sort order preservation (not validated)
  • Cross-catalog table copying
  • Error cases (source table doesn't exist, non-Iceberg source, etc.)
  • Property merging behavior

Plus anything else I may have missed.

Contributor Author


@anuragmantri @singhpk234

The current state of test coverage and some questions:

  • ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables
  • ✅ IF NOT EXISTS behavior
  • ✅ TBLPROPERTIES - a single test for merge and override
  • ❓ Sort order preservation - Do we want to support sort order in CREATE TABLE LIKE? I see that Spark's DDL doesn't have it. In Iceberg DDL we also add ordering separately, using ALTER TABLE.
  • ❓ Cross-catalog table copying - Do we want this functionality? I might be wrong, but I don't see a test for that in other DDL statements. Does Iceberg even support cross-catalog DDL?
  • ❓ Error cases (source table doesn't exist, non-Iceberg source, etc.) - I'm not sure what I'm supposed to test here. Do I have to test that an exception is thrown?
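
For reference, the "merge and override" behavior covered by the TBLPROPERTIES test could be sketched roughly as below. This is not the PR's code: the class and method names are hypothetical, and the precedence rule (values from the TBLPROPERTIES clause win over the source table's values on a key collision) is an assumption, since the thread does not spell it out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not taken from the PR: combines the source table's
// properties with the properties passed in the TBLPROPERTIES clause.
final class TablePropertiesMerge {
  private TablePropertiesMerge() {}

  // Assumption: on a key collision the value from TBLPROPERTIES wins.
  static Map<String, String> merge(Map<String, String> source, Map<String, String> overrides) {
    Map<String, String> merged = new LinkedHashMap<>(source); // start from the source table
    merged.putAll(overrides); // add new keys and replace colliding ones
    return merged;
  }

  public static void main(String[] args) {
    // Illustrative property keys and values only.
    Map<String, String> source = Map.of("write.format.default", "parquet");
    Map<String, String> overrides = Map.of("write.format.default", "orc", "owner", "etl");
    System.out.println(merge(source, overrides));
  }
}
```

A test along these lines would assert three things: an untouched source property survives, a colliding key takes the new value, and a key that only appears in TBLPROPERTIES is added.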

@github-actions github-actions bot added the docs label Oct 13, 2025
@MaxNevermind MaxNevermind marked this pull request as ready for review October 19, 2025 01:12
@MaxNevermind
Contributor Author

MaxNevermind commented Oct 28, 2025

@anuragmantri @singhpk234
Can you please look at it again?

The current state of test coverage and some questions:

  • ✅ Partition spec preservation - 2 tests, for partitioned and unpartitioned tables
  • ✅ IF NOT EXISTS behavior
  • ✅ TBLPROPERTIES - a single test for merge and override
  • ❓ Sort order preservation - Do we want to support sort order in CREATE TABLE LIKE? I see that Spark's DDL doesn't have it. In Iceberg DDL we also add ordering separately, using ALTER TABLE.
  • ❓ Cross-catalog table copying - Do we want this functionality? I might be wrong, but I don't see a test for that in other DDL statements. Does Iceberg even support cross-catalog DDL?
  • ❓ Error cases (source table doesn't exist, non-Iceberg source, etc.) - I'm not sure what I'm supposed to test here. Do I have to test that an exception is thrown?

@github-actions

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 28, 2025
@MaxNevermind
Contributor Author

@anuragmantri @singhpk234
Can you please look at it again?

@github-actions github-actions bot removed the stale label Dec 1, 2025
@github-actions

github-actions bot commented Jan 1, 2026

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 1, 2026
@github-actions

github-actions bot commented Jan 9, 2026

This pull request has been closed due to lack of activity. This is not a judgement on the merit of the PR in any way. It is just a way of keeping the PR queue manageable. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Jan 9, 2026
…12936-spark-create-table-like

# Conflicts:
#	spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala
@manuzhang manuzhang reopened this Mar 16, 2026
@manuzhang manuzhang requested review from Copilot and removed request for anuragmantri March 16, 2026 04:35

Copilot AI left a comment


Pull request overview

Adds Spark 4.0 Iceberg SQL extensions support for CREATE TABLE ... LIKE ..., including parsing, logical planning, execution, tests, and documentation updates.

Changes:

  • Extend the Iceberg SQL extensions grammar/parser/AST to recognize CREATE TABLE (IF NOT EXISTS) ... LIKE ... (TBLPROPERTIES ...).
  • Add a logical plan node and Spark planner strategy + exec node to create the new table by copying schema/partitioning and merging properties.
  • Add end-to-end extension tests and update Spark DDL documentation.
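
To illustrate the statement shape the grammar change targets, here is a rough Java sketch of a string check that recognizes CREATE TABLE ... LIKE ... statements. It is not the PR's implementation: the class name, the regular expression, and the normalization step (lowercasing and collapsing whitespace) are assumptions made for illustration, and quoted identifiers that contain spaces are deliberately not handled.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical sketch, not the PR's code: detects statements of the form
// CREATE TABLE [IF NOT EXISTS] <target> LIKE <source> [TBLPROPERTIES (...)].
final class CreateTableLikeMatcher {
  private CreateTableLikeMatcher() {}

  // Operates on a normalized statement: lowercase, single spaces, trimmed.
  private static final Pattern CREATE_TABLE_LIKE =
      Pattern.compile(
          "^create table (if not exists )?\\S+ like \\S+( tblproperties ?\\(.*\\))?$");

  // Normalization is used for detection only; the original text would still be parsed.
  static String normalize(String sql) {
    return sql.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
  }

  static boolean isCreateTableLike(String sql) {
    return CREATE_TABLE_LIKE.matcher(normalize(sql)).matches();
  }

  public static void main(String[] args) {
    System.out.println(isCreateTableLike("CREATE TABLE db.new_tbl LIKE db.src"));
    System.out.println(isCreateTableLike("CREATE TABLE t (id bigint) USING iceberg"));
  }
}
```

A check like this only decides which parser sees the statement. The grammar rule remains the source of truth for what is actually accepted.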

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| spark/v4.0/spark-extensions/src/test/java/org/apache/iceberg/spark/extensions/TestTables.java | Adds coverage for schema/spec copying, IF NOT EXISTS behavior, and property merging. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/ExtendedDataSourceV2Strategy.scala | Plans the new logical command into a physical exec node. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/execution/datasources/v2/CreateV2TableLikeExec.scala | Implements table creation based on a source Iceberg table. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/CreateIcebergTableLike.scala | Introduces a new logical command representing CREATE TABLE LIKE. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSqlExtensionsAstBuilder.scala | Builds the new logical command from the parsed AST. |
| spark/v4.0/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/parser/extensions/IcebergSparkSqlExtensionsParser.scala | Routes CREATE TABLE ... LIKE ... statements through the extensions parser. |
| spark/v4.0/spark-extensions/src/main/antlr/org.apache.spark.sql.catalyst.parser.extensions/IcebergSqlExtensions.g4 | Adds grammar for CREATE TABLE LIKE + TBLPROPERTIES. |
| docs/docs/spark-ddl.md | Documents CREATE TABLE ... LIKE ... support under Iceberg SQL extensions. |


Comment on lines +94 to +98
assertThat(table.schema().asStruct())
.as("Should have expected nullable schema")
.isEqualTo(expectedSchema.asStruct());
assertThat(table.spec().fields()).as("Should not be an partitioned").isEmpty();
}
assertThat(table.schema().asStruct())
.as("Should have expected nullable schema")
.isEqualTo(expectedSchema.asStruct());
assertThat(table.spec().fields()).as("Should not be an partitioned").isEmpty();
Comment on lines +24 to +25
import org.apache.iceberg.SortDirection
import org.apache.iceberg.SortOrder
Comment on lines +143 to +145
isCreateTableLike(normalized) || (
normalized.startsWith("alter table") && (
normalized.contains("add partition field") ||
Comment on lines +69 to +70
: CREATE TABLE (IF NOT EXISTS)? multipartIdentifier LIKE multipartIdentifier (TBLPROPERTIES '(' tableProperty (',' tableProperty)* ')')? #createTableLike
| ALTER TABLE multipartIdentifier ADD PARTITION FIELD transform (AS name=identifier)? #addPartitionField
Comment on lines +110 to +111
```sql
CREATE TABLE prod.db.new_table
```
@anuragmantri
Contributor

Hi @MaxNevermind, we discussed this PR in the Iceberg - Spark Community Sync, but I forgot to post an update here. In general, the community is trying to move away from extensions, as they can be fragile and hard to maintain. We should aim to add the syntax in Spark if feasible. In the interim, you could add a procedure that does the same.

@MaxNevermind
Contributor Author

Hi @anuragmantri
Is it worth working on that then? I'm completely fine with switching to something else if this code will become obsolete very soon. Let me know.

@anuragmantri
Contributor

Is it worth working on that then?

It looks like there is a recent PR in Spark that adds support for CREATE TABLE ... LIKE .... We should close this PR and wait for that to land in Spark.

@MaxNevermind
Contributor Author

Closing, as the recent Spark PR apache/spark#54809 adds support for CREATE TABLE ... LIKE ....


Development

Successfully merging this pull request may close these issues.

Support CREATE TABLE LIKE with Spark

5 participants