
[CELEBORN-2315] Add iterator fully-consumed validation after shuffle write #3672

Open
xumingming wants to merge 1 commit into apache:main from xumingming:iterator-fully-consumed-check

@xumingming
Contributor

What changes were proposed in this pull request?

Adds a post-write safety check to HashBasedShuffleWriter and SortBasedShuffleWriter: after the write loop completes, verify the input iterator was fully consumed. If records remain, kill the task with TaskKilledException. This guards against silent data loss.
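The check described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: the class `FullyConsumedCheckSketch`, its methods, the `maxRecords` parameter (used only to simulate a write loop that exits early), and `RecordsRemainingException` (a stand-in for Spark's `TaskKilledException`) are all hypothetical names introduced for the example.

```java
import java.util.Iterator;
import java.util.function.Consumer;

/** Hypothetical stand-in for a shuffle writer; not Celeborn's actual code. */
final class FullyConsumedCheckSketch {

    /** Stand-in for the TaskKilledException mentioned in the description. */
    static final class RecordsRemainingException extends RuntimeException {
        RecordsRemainingException(String message) {
            super(message);
        }
    }

    /**
     * Simulated write loop. It stops after maxRecords records, which lets the
     * example reproduce a loop that exits before the input is exhausted.
     * Returns the number of records handed to the sink.
     */
    static <T> int writeLoop(Iterator<T> records, int maxRecords, Consumer<? super T> sink) {
        int written = 0;
        while (written < maxRecords && records.hasNext()) {
            sink.accept(records.next());
            written++;
        }
        return written;
    }

    /** Post-write check: fail loudly if the iterator still has records. */
    static <T> void checkFullyConsumed(Iterator<T> records) {
        if (records.hasNext()) {
            throw new RecordsRemainingException(
                "input iterator still has records after shuffle write");
        }
    }

    /** Runs the write loop, then validates that nothing was left behind. */
    static <T> int write(Iterator<T> records, int maxRecords, Consumer<? super T> sink) {
        int written = writeLoop(records, maxRecords, sink);
        checkFullyConsumed(records);
        return written;
    }
}
```

In this sketch the check costs a single `hasNext()` call after the loop, so it adds essentially nothing on the normal path, where the iterator is already exhausted.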

Why are the changes needed?

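The check adds another layer of correctness protection: if records remain in the input iterator after the write loop, the task is killed instead of silently losing data.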

Does this PR resolve a correctness bug?

It enhances the correctness guarantee.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

@xumingming force-pushed the iterator-fully-consumed-check branch from 5a50c71 to dbd6473 on April 23, 2026 12:33
@xumingming
Contributor Author

@gauravkm @RexXiong @SteNicholas Could you also take a look at this one?

@xumingming
Contributor Author

@RexXiong @SteNicholas @gauravkm Gentle ping :)

@afterincomparableyum
Contributor

I'll help take a look at this PR over the next couple of days.
