Skip to content

Move tables 1.6 crash safe resume#1708

Merged
zacharysierakowski merged 24 commits into
feature-move-tablesfrom
move-tables-1.6-crash-safe-resume
Jun 19, 2026
Merged

Move tables 1.6 crash safe resume#1708
zacharysierakowski merged 24 commits into
feature-move-tablesfrom
move-tables-1.6-crash-safe-resume

Conversation

@zacharysierakowski

@zacharysierakowski zacharysierakowski commented Jun 12, 2026

Copy link
Copy Markdown

closes https://github.com/github/database-infrastructure/issues/8210

A Pull Request should be associated with an Issue.

We wish to have discussions in Issues. A single issue may be targeted by multiple PRs.
If you're offering a new feature or fixing anything, we'd like to know beforehand in Issues,
and potentially we'll be able to point development in a particular direction.

Related issue (internal): https://github.com/github/database-infrastructure/issues/8210
Related issue (public): #1681

Further notes in https://github.com/github/gh-ost/blob/master/.github/CONTRIBUTING.md
Thank you! We are open to PRs, but please understand if for technical reasons we are unable to accept each and any PR

Description

This PR updates move-tables mode to be crash-resume safe. It manages the checkpoint table life, and a backgrounded checkpoint process as well as introduces new checkpoint data specifically for the move-tables cutover. Per the docs on crash safe resume across cutover, we now store that the move-tables cutover has started (a bool) and the move-tables cutover drain GTID. We use these two attributes for --move-tables --resume to pickup if the process failed mid-cutover. It's safe for failure between T1-T3 (the rename and the drain) and also safe for failure post drain, pre hook call (between T3-T5). I simple --resume will pick back up in the cutover protocol and ensure the rename isn't re-run, the drain is complete, and the complete hook is fired.

In case this PR introduced Go code changes:

  • contributed code is using same conventions as original code
  • script/cibuild returns with no formatting errors, build errors or unit test errors.

Demo

ghost-crash-resume-cutover.mp4

@zacharysierakowski zacharysierakowski added the feature-move-tables PRs that are associated with the new move-tables feature label Jun 12, 2026
@zacharysierakowski zacharysierakowski changed the base branch from womoruyi/move-tables-1.5-cutover to feature-move-tables June 15, 2026 15:20
Comment thread go/cmd/gh-ost/main.go
if !migrationContext.UseGTIDs {
log.Infof("--move-tables requires GTID coordinates for cutover drain checks; enabling --gtid")
migrationContext.UseGTIDs = true
}

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I chose to not fail the request without these, and just enable them. Could also change it to a Fatal and force the operator to rerun with the required flags. But that feels kinda silly if they are always required.

@zacharysierakowski zacharysierakowski marked this pull request as ready for review June 17, 2026 19:21
Copilot AI review requested due to automatic review settings June 17, 2026 19:21

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR makes --move-tables crash-resume safe across the cooperative cutover by persisting additional cutover state to the checkpoint table (cutover-started + drain GTID), adding logic to resume mid-cutover, and ensuring move-tables runs with checkpointing + GTID coordinates enabled.

Changes:

  • Extend checkpoint schema + insert query builder to include move-tables cutover resume fields, and read/write them from the target-side checkpoint table in move-tables mode.
  • Add migrator logic to persist cutover checkpoints, drain to a captured GTID with improved “non-DML tail” handling, and resume cutover directly from checkpoint state.
  • Update local move-tables scripts/docs to exercise checkpointing and provide better local test flows.
Show a summary per file
File Description
script/move-tables/teardown Adds additional Docker cleanup steps after removing containers.
script/move-tables/README.md Updates local testing steps/command to include checkpointing and validation workflow.
script/move-tables/insert-source-primary-loop Adds batched insert support for higher write rates during local testing.
go/sql/builder.go Extends checkpoint insert builder to optionally include move-tables cutover columns.
go/sql/builder_test.go Adds coverage for move-tables checkpoint insert query shape.
go/logic/migrator.go Implements move-tables cutover checkpoint persistence, drain logic, and resume-from-cutover-checkpoint flow.
go/logic/migrator_move_tables_cutover_test.go Adds unit test for resuming move-tables cutover when already drained.
go/logic/checkpoint.go Extends Checkpoint struct with move-tables cutover resume fields.
go/logic/applier.go Routes checkpoint storage to the target DB in move-tables mode and reads/writes new move-tables cutover fields.
go/logic/applier_test.go Adds integration tests for move-tables checkpoint persistence and cutover-checkpoint selection.
go/cmd/gh-ost/main.go Automatically enables --checkpoint and --gtid when running --move-tables.

Copilot's findings

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 11/11 changed files
  • Comments generated: 6

Comment thread go/logic/migrator.go
Comment thread go/logic/migrator.go Outdated
Comment thread go/logic/migrator.go
Comment thread script/move-tables/insert-source-primary-loop
Comment thread script/move-tables/README.md Outdated
Comment thread script/move-tables/teardown
zacharysierakowski and others added 4 commits June 17, 2026 15:26
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread go/logic/applier.go
return chk, nil
}

func (apl *Applier) ReadMoveTablesCutOverCheckpoint() (*Checkpoint, error) {

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[question] why can't we re-use ReadLastCheckpoint for this? the only significant difference I see is that the original fn is L1053-4 which we should able to ignore.

I assume there's something subtle I'm missing 🤔

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we technically could have. There might have been a reason at some point during my commit history but I agree it looks like it would be safe to use ReadLastCheckpoint. The big difference is that ReadMoveTablesCutOverCheckpoint has a where clause for gh_ost_move_tables_cutover_started = 1 and to make sure the drain GTID is set. But the one callsite i see of ReadMoveTablesCutOverCheckpoint does that check in the code as well

gh-ost/go/logic/migrator.go

Lines 1020 to 1024 in f5273fa

cutoverResumeCheckpoint, err := mgtr.applier.ReadMoveTablesCutOverCheckpoint()
if err != nil && !errors.Is(err, ErrNoCheckpointFound) {
return err
}
if cutoverResumeCheckpoint != nil && cutoverResumeCheckpoint.MoveTablesCutOverStarted && cutoverResumeCheckpoint.MoveTablesCutOverDrainGTID != nil && !cutoverResumeCheckpoint.MoveTablesCutOverDrainGTID.IsEmpty() {

@zacharysierakowski zacharysierakowski merged commit 9dad9ef into feature-move-tables Jun 19, 2026
6 checks passed
@chriskirkland chriskirkland deleted the move-tables-1.6-crash-safe-resume branch June 19, 2026 16:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature-move-tables PRs that are associated with the new move-tables feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants