Skip to content

Conversation

@incrypto32
Copy link
Member

No description provided.

@incrypto32 incrypto32 changed the base branch from main to krishna/kafka December 16, 2025 14:24
@incrypto32 incrypto32 force-pushed the krishna/fix-checkpointing branch from 2cfe41b to da48675 Compare January 9, 2026 10:21
@incrypto32 incrypto32 force-pushed the krishna/fix-checkpointing branch from da48675 to 319fe6c Compare January 11, 2026 13:12
Move the pending batch check to BEFORE fetching a new batch from the stream.

Previously, after a reorg was detected:
1. The data batch was stored as _pending_batch
2. A reorg batch was returned
3. On next __next__() call, a NEW batch was fetched BEFORE checking _pending_batch
4. This caused the pending batch to be lost or returned out of order

Now the pending batch check happens first, ensuring proper ordering:
reorg_batch -> pending_data_batch -> next_batch
On stream start, automatically clean up any data written after the last
checkpoint watermark. This handles crash scenarios where data was written
but the checkpoint was not saved.

The _rewind_to_watermark method:
1. Gets the last watermark from state store
2. Creates invalidation ranges for blocks after the watermark
3. Calls _handle_reorg to delete uncommitted data
4. Falls back gracefully if loader does not support deletion

Called automatically at start of load_stream_continuous once per table.
@incrypto32 incrypto32 force-pushed the krishna/fix-checkpointing branch from 319fe6c to 9217d23 Compare January 11, 2026 13:14
@edgeandnode edgeandnode deleted a comment from github-actions bot Jan 11, 2026
@incrypto32 incrypto32 force-pushed the krishna/kafka branch 3 times, most recently from dfc08ed to deaa9d1 Compare January 12, 2026 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants