Skip to content

Conversation

@wk989898
Copy link
Collaborator

@wk989898 wk989898 commented Dec 24, 2025

What problem does this PR solve?

Issue Number: close #3784

What is changed and how it works?

Index File Write Order: The process of writing the index file has been reordered to occur after the corresponding data file is successfully written, ensuring better consistency in cloud storage.
writeDataFile Signature Update: The writeDataFile function now accepts both the data file path and the index file path as distinct arguments, improving clarity and enabling the reordered write logic.

After this PR, the data file will be written before the index file. If the CDC restarts:

  1. If the index file is written, the data may be duplicated to write a new data file.
  2. If the index file is not written, the data file will be overwritten.

Check List

Tests

  • Unit test
  • Integration test

Questions

Will it cause performance regression or break compatibility?
Do you need to update user documentation, design documentation or monitoring documentation?

Release note

 `None`.

@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 24, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 24, 2025
@wk989898
Copy link
Collaborator Author

/test all

@ti-chi-bot ti-chi-bot bot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Dec 24, 2025
@gemini-code-assist
Copy link

Summary of Changes

Hello @wk989898, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the cloud storage writing mechanism by changing the order in which data and index files are persisted. The primary goal is to enhance the consistency of data written to cloud storage by ensuring that the index file, which references a data file, is only created or updated after the data file itself has been successfully written. This prevents potential issues with orphaned or invalid index entries.

Highlights

  • Index File Write Order: The process of writing the index file has been reordered to occur after the corresponding data file is successfully written, ensuring better consistency in cloud storage.
  • writeDataFile Signature Update: The writeDataFile function now accepts both the data file path and the index file path as distinct arguments, improving clarity and enabling the reordered write logic.
  • Enhanced Write Consistency: By ensuring the index file is only written after the data file, the change prevents scenarios where an index might point to a non-existent or incomplete data file in cloud storage.

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request refactors the file writing logic in the cloud storage sink by changing the order of operations to write the data file before the index file. This is a good improvement for atomicity and reliability, as it prevents a state where an index file points to a non-existent data file. The change also correctly handles errors from writing the index file, which was a bug in the previous implementation. I have one suggestion to further improve error handling by cleaning up orphaned data files on failure, making the process even more robust.

@wk989898
Copy link
Collaborator Author

/test pull-cdc-storage-integration-heavy

@wk989898
Copy link
Collaborator Author

pull-cdc-storage-integration-light

@wk989898 wk989898 changed the title [WIP] sink: modify the write order of the index file and data file in storage writer Dec 24, 2025
@wk989898 wk989898 marked this pull request as ready for review December 24, 2025 13:22
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note Denotes a PR that will be considered when it comes time to generate release notes. labels Dec 24, 2025
zap.Error(err))
}

// then write the data file to external storage.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also update the log

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Dec 25, 2025
@ti-chi-bot ti-chi-bot bot added the lgtm label Dec 29, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 29, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: 3AceShowHand, hongyunyan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:
  • OWNERS [3AceShowHand,hongyunyan]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Dec 29, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Dec 29, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-25 05:23:17.120086604 +0000 UTC m=+2314541.933864166: ☑️ agreed by hongyunyan.
  • 2025-12-29 06:58:22.8588123 +0000 UTC m=+2665847.672589871: ☑️ agreed by 3AceShowHand.

@ti-chi-bot ti-chi-bot bot merged commit 85c1e93 into master Dec 29, 2025
19 checks passed
@ti-chi-bot ti-chi-bot bot deleted the wk989898-patch-1 branch December 29, 2025 08:37
lidezhu pushed a commit that referenced this pull request Dec 31, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage consumer lost some dml events

4 participants