8 changes: 4 additions & 4 deletions pgcopydb-helpers/AGENTS.md
@@ -201,9 +201,9 @@ Resumes a previously interrupted `pgcopydb clone --follow` migration. Backs up t
~/resume-migration.sh ~/migration_YYYYMMDD-HHMMSS # specify explicitly
```

-**Important:** This script intentionally does NOT use `--split-tables-larger-than` with `--resume`. pgcopydb truncates the entire table before checking split parts on resume, which causes data loss.
+**Important:** If the original migration used `--split-tables-larger-than`, the resume script passes the same value. This is safe when COPY has already completed (the COPY supervisor doesn't run, so no truncation occurs). If COPY was still in progress when the failure happened, use `--restart` instead: pgcopydb truncates split tables before re-queuing parts on resume, which loses already-copied partitions. Run `~/check-migration-status.sh` to determine whether COPY completed before deciding.

-**When to use:** After pgcopydb crashes, the instance reboots, or the migration is interrupted. Do NOT use after a successful migration; use `run-migration.sh` to start fresh.
+**When to use:** After pgcopydb crashes, the instance reboots, or the migration is interrupted during indexes, post-data restore, or CDC. If COPY failed mid-flight, run `~/target-clean.sh`, `~/drop-replication-slots.sh`, and `~/start-migration-screen.sh` to start fresh instead.

**Requires:** `PGCOPYDB_SOURCE_PGURI`, `PGCOPYDB_TARGET_PGURI`, existing migration directory

@@ -396,11 +396,11 @@ All scripts use variables at the top that can be adjusted per migration. See [Cl
| `TABLE_JOBS` | 16 | run-migration.sh, resume-migration.sh |
| `INDEX_JOBS` | 12 | run-migration.sh, resume-migration.sh |
| `FILTER_FILE` | ~/filters.ini | run-migration.sh, resume-migration.sh |
-| `--split-tables-larger-than` | 50GB | run-migration.sh only (not resume) |
+| `--split-tables-larger-than` | 50GB | run-migration.sh, resume-migration.sh |

## Critical Warnings

-- **Never use `--split-tables-larger-than` with `--resume`**: pgcopydb truncates the entire table before checking parts, causing data loss.
+- **If COPY failed mid-flight, use `--restart` instead of `--resume`**: pgcopydb truncates split tables before re-queuing parts on resume, causing data loss for partially-copied tables. If COPY completed and the failure was in a later phase (indexes, CDC), `--resume` with the same `--split-tables-larger-than` value is safe. Run `~/check-migration-status.sh` to check.
- **Never use `pgcopydb --restart`** without backing up first — it wipes the CDC directory AND SQLite catalogs.
- **Always clean up replication slots** after a migration — unconsumed slots cause WAL accumulation on the source.
- **Verify extension filtering after STEP 1** — check `SELECT COUNT(*) FROM s_depend;` in `filter.db`. If it's 0, extension-owned objects in `public` won't be filtered.
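
The extension-filtering check in the last warning can be wrapped in a small helper. This is a minimal sketch, not part of pgcopydb-helpers: the function name `check_s_depend_count` and its messages are hypothetical, and it only interprets a row count that the operator has already read from `filter.db`.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of pgcopydb-helpers): interprets the row count
# returned by "SELECT COUNT(*) FROM s_depend;" against filter.db.
check_s_depend_count() {
  local count="$1"

  # Reject anything that is not a non-negative integer.
  case "$count" in
    ''|*[!0-9]*)
      echo "not a row count: $count" >&2
      return 2
      ;;
  esac

  if [ "$count" -eq 0 ]; then
    # Zero rows: extension-owned objects in public will not be filtered.
    echo "WARNING: s_depend is empty; extension-owned objects in public won't be filtered"
    return 1
  fi

  echo "OK: s_depend has $count rows"
}

check_s_depend_count 42          # prints "OK: s_depend has 42 rows"
check_s_depend_count 0 || true   # prints the warning and returns 1
```

In practice the count would come from something like `sqlite3 <migration-dir>/schema/filter.db "SELECT COUNT(*) FROM s_depend;"`, assuming the `sqlite3` CLI is installed and the path points at a single migration directory.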
11 changes: 9 additions & 2 deletions pgcopydb-helpers/README.md
@@ -215,7 +215,14 @@ If pgcopydb crashes, the instance reboots, or the migration is interrupted:
~/resume-migration.sh ~/migration_YYYYMMDD-HHMMSS # or specify explicitly
```

-This backs up the SQLite catalog before resuming. It uses `--not-consistent` to allow resuming from a mid-transaction state, and intentionally omits `--split-tables-larger-than` because pgcopydb truncates the entire table before checking split parts on resume, which causes data loss.
+This backs up the SQLite catalog before resuming and uses `--not-consistent` to allow resuming from a mid-transaction state.

+**Choosing between `--resume` and `--restart`:**

Review comment (Contributor):

As a user, I am not sure where to apply `--resume` and `--restart`, because neither is shown in the examples above.


+- **COPY already completed** (failure was during indexes, post-data restore, or CDC): Use `--resume`. If the original migration used `--split-tables-larger-than`, pass the same value; the COPY phase is skipped entirely, so there is no truncation risk.
+- **COPY was still in progress** when the failure occurred: Use `--restart` (full restart) instead. pgcopydb truncates split tables before re-queuing parts on resume, which loses data from already-copied partitions.

+To check whether COPY completed, run `~/check-migration-status.sh` and look at the copy task progress. If all COPY tasks show as completed with no outstanding jobs, it is safe to `--resume`.
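
The decision rule above can be sketched as a small shell helper. This is an illustration only: `choose_recovery_mode` and its phase names are hypothetical and not part of pgcopydb-helpers. The operator supplies the failed phase after reading the `~/check-migration-status.sh` output, whose format is not assumed here.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of pgcopydb-helpers): maps the phase in which
# the migration failed to the recovery mode described above.
choose_recovery_mode() {
  local failed_phase="$1"

  case "$failed_phase" in
    copy)
      # COPY was still in progress: resuming would truncate split tables.
      echo "restart"
      ;;
    indexes|post-data|cdc)
      # COPY already completed: resume skips the COPY phase entirely.
      echo "resume"
      ;;
    *)
      echo "unknown phase: $failed_phase" >&2
      return 1
      ;;
  esac
}

choose_recovery_mode cdc    # prints "resume"
choose_recovery_mode copy   # prints "restart"
```

A result of `resume` corresponds to running `~/resume-migration.sh`. A result of `restart` means starting over, which wipes previously copied data and catalogs, so back up first as the warnings in this document require.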

To start completely over, wipe the target and clean up replication:

@@ -392,7 +399,7 @@ sqlite3 ~/migration_*/schema/filter.db "SELECT COUNT(*) FROM s_depend;"

## Critical Warnings

-- **Never use `--split-tables-larger-than` with `--resume`**: pgcopydb truncates the entire table before checking parts, causing data loss.
+- **If COPY failed mid-flight, use `--restart` instead of `--resume`**: pgcopydb truncates split tables before re-queuing parts on resume, causing data loss for partially-copied tables. If COPY completed and the failure was in a later phase (indexes, CDC), `--resume` with the same `--split-tables-larger-than` value is safe.
- **Never use `pgcopydb --restart`** without backing up first — it wipes the CDC directory AND SQLite catalogs.
- **Always clean up replication slots** when done — unconsumed slots cause unbounded WAL growth on the source.
- **Verify extension filtering after STEP 1** — if `s_depend` count is 0, extension-owned objects won't be excluded.
19 changes: 15 additions & 4 deletions pgcopydb-helpers/resume-migration.sh
@@ -5,8 +5,15 @@
#
# Resumes a previously interrupted pgcopydb clone --follow migration.
# If no directory is given, uses the most recent ~/migration_* directory.
-# Backs up the SQLite catalog before resuming. Does NOT use
-# --split-tables-larger-than (unsafe with --resume).
+# Backs up the SQLite catalog before resuming.
+#
+# IMPORTANT: --split-tables-larger-than and --resume
+# If the original migration used --split-tables-larger-than, you MUST pass
+# the same value here -- pgcopydb validates catalog consistency and will
+# refuse to resume without it. This is SAFE if the COPY phase already
+# completed (indexes, CDC, etc.). If COPY was still in progress when the
+# failure occurred, use --restart instead -- pgcopydb truncates split tables
+# before re-queuing parts on resume, which loses already-copied partitions.
#
set -eo pipefail

@@ -57,8 +64,10 @@ cp "$MIGRATION_DIR/schema/source.db" "$MIGRATION_DIR/schema/source.db.bak.$(date
echo "Migration dir: $MIGRATION_DIR"
echo "=========================================="

-# NOTE: Do NOT use --split-tables-larger-than with --resume.
-# pgcopydb truncates the entire table before checking parts, causing data loss.
+# If the original migration used --split-tables-larger-than, pass the
+# same value here. This is safe when COPY is already complete (the COPY
+# supervisor won't run, so no truncation occurs). If COPY failed
+# mid-flight, use --restart instead of --resume.
/usr/lib/postgresql/17/bin/pgcopydb clone \
--follow \
--plugin wal2json \
@@ -73,6 +82,8 @@ cp "$MIGRATION_DIR/schema/source.db" "$MIGRATION_DIR/schema/source.db.bak.$(date
--skip-db-properties \
--table-jobs "$TABLE_JOBS" \
--index-jobs "$INDEX_JOBS" \
+--split-tables-larger-than 50GB \
+--split-max-parts "$TABLE_JOBS" \
--dir "$MIGRATION_DIR"

EXIT_CODE=$?