feat(transaction): Add atomic snapshot tagging to FastAppendAction #1940
Add the ability to create a tag reference atomically in the same transaction that creates a new snapshot via fast_append.
This adds `with_tag()` and `with_tag_retention()` methods to `FastAppendAction` that allow specifying a tag name and retention. When set, the tag is created pointing to the newly created snapshot, all within a single atomic catalog update. The default tag retention is `None`, which should mean it inherits the table's defaults, if specified.
Example usage:
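A minimal, self-contained sketch of the builder shape described above. `TagSettings`, `effective_retention`, and the millisecond `i64` parameter are hypothetical stand-ins chosen for illustration; the real `FastAppendAction` signatures may differ, and the "explicit value, else table default" rule is an assumption drawn from the description, not from the crate.

```rust
/// Hypothetical stand-in for the tag options held by the action; not the crate's type.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct TagSettings {
    /// Tag to create on the new snapshot, if any.
    tag: Option<String>,
    /// Maximum age of the tag reference in milliseconds; `None` defers to the table.
    max_ref_age_ms: Option<i64>,
}

impl TagSettings {
    /// Request that a tag with this name point at the new snapshot.
    pub fn with_tag(mut self, name: impl Into<String>) -> Self {
        self.tag = Some(name.into());
        self
    }

    /// Override the retention for the tag (assumed unit: milliseconds).
    pub fn with_tag_retention(mut self, max_ref_age_ms: i64) -> Self {
        self.max_ref_age_ms = Some(max_ref_age_ms);
        self
    }

    /// Name of the requested tag, if one was set.
    pub fn tag(&self) -> Option<&str> {
        self.tag.as_deref()
    }

    /// Assumed resolution rule: an explicit value wins, otherwise the table default applies.
    pub fn effective_retention(&self, table_default_ms: Option<i64>) -> Option<i64> {
        self.max_ref_age_ms.or(table_default_ms)
    }
}
```

With these stand-ins, `TagSettings::default().with_tag("nightly")` leaves retention unset and so falls back to whatever the table specifies, while chaining `.with_tag_retention(...)` overrides it.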
Which issue does this PR close?
What changes are included in this PR?
This change adds a method to `FastAppendAction` that creates a snapshot reference for a specified tag in the same transaction that creates the snapshot. This makes it possible to append data and set a tag in a single atomic transaction.
Without the atomic transaction guarantee, other processes can add new snapshots and clean up old ones (including the one we just created) before a tag can be added to the snapshot. A tag reference is what protects a snapshot from most automated snapshot expiration policies, so the new snapshot is exposed until the tag lands.
With the atomic guarantee, either our data was not committed to the table, or it was committed and has a tagged snapshot reference that is protected from expiry.
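The all-or-nothing behavior can be illustrated with a toy model. Everything here (`Metadata`, `Update`, `commit`) is hypothetical and far simpler than a real catalog; it only shows the idea of staging every update on a copy and swapping it in if all of them succeed.

```rust
use std::collections::HashMap;

/// Toy table metadata: known snapshot ids plus named references (branches and tags).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Metadata {
    pub snapshots: Vec<i64>,
    pub refs: HashMap<String, i64>,
}

/// Toy update type, loosely modeled on the updates named in this PR.
#[derive(Debug, Clone)]
pub enum Update {
    AddSnapshot(i64),
    SetRef { name: String, snapshot_id: i64 },
}

/// Apply every update to a staged copy; replace the current metadata only if all succeed.
pub fn commit(current: &mut Metadata, updates: &[Update]) -> Result<(), String> {
    let mut staged = current.clone();
    for update in updates {
        match update {
            Update::AddSnapshot(id) => staged.snapshots.push(*id),
            Update::SetRef { name, snapshot_id } => {
                // A reference must point at a snapshot that exists in the staged state.
                if !staged.snapshots.contains(snapshot_id) {
                    return Err(format!(
                        "ref {name} points at unknown snapshot {snapshot_id}"
                    ));
                }
                staged.refs.insert(name.clone(), *snapshot_id);
            }
        }
    }
    // Nothing failed, so the snapshot and both references become visible together.
    *current = staged;
    Ok(())
}
```

In this model a failed commit leaves `current` untouched, so a reader never observes the new snapshot without its tag.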
Are these changes tested?
I created a unit test, `test_fast_append_with_tag`, in the `src/transaction/append.rs` module that verifies that `FastAppendAction` produces 3 actions when called with `.with_tag("tag")`, and that those 3 actions are a `TableUpdate::AddSnapshot` and two `TableUpdate::SetSnapshotRef` actions, with one setting the `MAIN_BRANCH` reference and the second setting the tag specified in `.with_tag("tag")`.
There is another unit test, `test_fast_append_with_tag_retention`, that validates the tag retention behavior as well.
Edit: I'm looking into why 3 tests from the `iceberg-catalog-glue` test set are failing, namely `test_create_table`, `test_drop_table`, and `test_load_table`. They're all erroring on `The specified bucket does not exist`, which makes me think this is unrelated to my changes.
Edit 2: I rebased off the current main and that resolved the failing tests.
Edit 3: I added the `with_tag_retention()` method to allow specifying the retention behavior for the tagged snapshot instead of defaulting to the table defaults.
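For readers who want the shape of what `test_fast_append_with_tag` checks, here is a rough, self-contained sketch. `UpdateSketch` and `tagged_append_updates` are hypothetical and only mirror the update kinds named above; the ordering of the updates and the `is_tag` flag are assumptions, not the crate's actual representation.

```rust
/// Hypothetical mirror of the update kinds the test inspects; not the crate's enum.
#[derive(Debug, Clone, PartialEq)]
pub enum UpdateSketch {
    AddSnapshot { snapshot_id: i64 },
    SetSnapshotRef { ref_name: String, snapshot_id: i64, is_tag: bool },
}

/// Name of the default branch reference.
pub const MAIN_BRANCH: &str = "main";

/// Build the updates a fast append is described as producing, with an optional tag.
pub fn tagged_append_updates(snapshot_id: i64, tag: Option<&str>) -> Vec<UpdateSketch> {
    let mut updates = vec![
        UpdateSketch::AddSnapshot { snapshot_id },
        // The main branch always advances to the new snapshot.
        UpdateSketch::SetSnapshotRef {
            ref_name: MAIN_BRANCH.to_string(),
            snapshot_id,
            is_tag: false,
        },
    ];
    if let Some(tag) = tag {
        // The extra reference is only emitted when a tag was requested.
        updates.push(UpdateSketch::SetSnapshotRef {
            ref_name: tag.to_string(),
            snapshot_id,
            is_tag: true,
        });
    }
    updates
}
```

Calling the helper with a tag yields three updates and without one yields two, which is the distinction the unit test relies on.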