
Shard move in block_writes mode fails with idle_in_transaction_session_timeout on metadata workers #8484

Merged
codeforall merged 1 commit into main from muusama/idle_in_trans
Mar 2, 2026

@codeforall
Contributor

Description

When performing a shard move using block_writes transfer mode (either directly via citus_move_shard_placement or through the background rebalancer), the operation can fail with:

   ERROR: terminating connection due to idle-in-transaction timeout
   CONTEXT: while executing command on <worker_host>:<worker_port>

The failing worker is a metadata worker that is neither the source nor the target of the shard move.

Root Cause

LockShardListMetadataOnWorkers() opens coordinated transactions on all metadata workers to acquire advisory shard metadata locks via SELECT lock_shard_metadata(...). These transactions remain open until the entire shard move completes and the coordinated transaction commits.

In block_writes mode, the data copy phase (CopyShardsToNode) runs synchronously between the source and target workers. Metadata workers not involved in the copy have no commands to execute and their connections sit completely idle-in-transaction for the entire duration of the data copy.

For large shards, the copy can take significantly longer than common idle_in_transaction_session_timeout values. When the timeout fires on an uninvolved worker, PostgreSQL terminates the connection, causing the shard move to fail.

This also affects shard splits, since they follow the same code path through LockShardListMetadataOnWorkers.
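
The failure condition can be illustrated with a toy model. The sketch below is not Citus code: `WorkerConnection`, `terminated_workers`, and the timing numbers are hypothetical, and it assumes the simplification that the source and target stay busy during the copy while every other metadata worker holds an idle open transaction.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkerConnection:
    """Hypothetical stand-in for one coordinator-to-worker connection."""
    name: str
    involved_in_copy: bool            # True for the source or target of the move
    idle_timeout_s: Optional[float]   # None or 0 means the timeout is disabled


def terminated_workers(connections: List[WorkerConnection],
                       copy_duration_s: float) -> List[str]:
    """Return the workers whose idle open transaction outlives its timeout."""
    killed = []
    for conn in connections:
        if conn.involved_in_copy:
            # Simplifying assumption: these connections are busy copying data.
            continue
        if conn.idle_timeout_s and copy_duration_s > conn.idle_timeout_s:
            killed.append(conn.name)
    return killed


# Illustrative numbers only: a 2-hour copy against a 10-minute timeout.
cluster = [
    WorkerConnection("source", True, 600),
    WorkerConnection("target", True, 600),
    WorkerConnection("bystander", False, 600),
]
print(terminated_workers(cluster, copy_duration_s=7200))  # prints ['bystander']
```

In this model only the uninvolved worker is terminated, which matches the observation that the failing worker is neither the source nor the target. Setting its timeout to 0 (disabled) makes the returned list empty.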

Fix

LockShardListMetadataOnWorkers() now sends SET LOCAL idle_in_transaction_session_timeout = 0 on each metadata worker connection before acquiring the locks. SET LOCAL scopes the change to the current transaction only, so normal sessions on the workers are unaffected.
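
The command ordering described in this fix can be sketched as a small helper that builds the per-worker command list. This is a Python illustration, not the C code in resource_lock.c: `with_idle_timeout_disabled` is a hypothetical name, and the lock command is passed through as an opaque string with its arguments left elided.

```python
from typing import List

# Command text taken from the fix description.
DISABLE_IDLE_TIMEOUT = "SET LOCAL idle_in_transaction_session_timeout = 0"


def with_idle_timeout_disabled(lock_commands: List[str]) -> List[str]:
    """Return a new list with the timeout override ahead of the lock commands."""
    if not lock_commands:
        # Assumption: with no locks to take, no override needs to be sent.
        return []
    return [DISABLE_IDLE_TIMEOUT] + list(lock_commands)


commands = with_idle_timeout_disabled(["SELECT lock_shard_metadata(...)"])
for command in commands:
    print(command + ";")
```

SET LOCAL only takes effect inside a transaction block and lasts until that transaction ends, so the override has to travel on the same connection and in the same transaction as the lock commands it protects.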

…tions during shard moves

In block_writes mode, LockShardListMetadataOnWorkers() opens coordinated
transactions on all metadata workers to hold advisory shard metadata locks.
These connections remain open for the entire duration of the shard move,
but workers not involved in the data copy have no commands to execute and
they sit idle-in-transaction until the coordinated transaction commits.

For large shards, the data copy can take hours, easily exceeding common
idle_in_transaction_session_timeout values. When the timeout fires on an
uninvolved worker, PostgreSQL terminates the connection and the move fails.

Fix by sending SET LOCAL idle_in_transaction_session_timeout = 0 on each
metadata worker connection before acquiring locks. SET LOCAL scopes the
change to the current transaction only, so normal sessions are unaffected.
@codecov

codecov bot commented Feb 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.88%. Comparing base (546f206) to head (8bfbf39).
⚠️ Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8484      +/-   ##
==========================================
- Coverage   88.90%   88.88%   -0.03%     
==========================================
  Files         286      286              
  Lines       63107    63109       +2     
  Branches     7910     7910              
==========================================
- Hits        56108    56093      -15     
- Misses       4734     4757      +23     
+ Partials     2265     2259       -6     

@codeforall codeforall marked this pull request as ready for review February 21, 2026 07:44
@emelsimsek
Contributor

LGTM.

Contributor

Copilot AI left a comment


Pull request overview

Fixes shard move / shard split failures in block_writes mode caused by idle_in_transaction_session_timeout terminating idle coordinated transactions on metadata workers during long-running data copy.

Changes:

  • Disable idle_in_transaction_session_timeout (via SET LOCAL ... = 0) on metadata worker connections before acquiring shard metadata locks.
  • Add a regression test that configures a low idle_in_transaction_session_timeout across nodes and performs a block_writes shard move.
  • Update expected regression output accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/backend/distributed/utils/resource_lock.c | Prepends a SET LOCAL idle_in_transaction_session_timeout = 0 command before shard metadata lock acquisition on metadata workers. |
| src/test/regress/sql/shard_move_constraints_blocking.sql | Adds a regression test scenario for shard move under low idle-in-transaction timeout. |
| src/test/regress/expected/shard_move_constraints_blocking.out | Captures the expected output for the new regression test section. |


Comment on lines +246 to +252
CREATE TABLE test_move(id int PRIMARY KEY, val text);
SELECT create_distributed_table('test_move', 'id');
INSERT INTO test_move SELECT i, 'val_' || i FROM generate_series(1, 100) i;

-- move a shard using block_writes; should succeed despite the 1s timeout
SELECT citus_move_shard_placement(8980000, 'localhost', :worker_1_port, 'localhost', :worker_2_port, shard_transfer_mode:='block_writes');
SELECT public.wait_for_resource_cleanup();

Copilot AI Feb 25, 2026


This test sets idle_in_transaction_session_timeout to 1s, but the shard being moved is tiny (100 rows spread across 4 shards), so the block_writes move will likely finish well under 1s even without the backend fix. That makes the regression non-deterministic / ineffective at catching the original failure mode. Consider making the data-copy phase reliably exceed the timeout (e.g., add a small per-row delay on the target during copy, or otherwise slow the copy in a deterministic way) so the test would fail on older code paths.


@copilot open a new pull request to apply changes based on this feedback

Contributor

Copilot AI commented Feb 25, 2026

@emelsimsek I've opened a new pull request, #8485, to work on those changes. Once the pull request is ready, I'll request review from you.

@codeforall codeforall merged commit d3330fd into main Mar 2, 2026
163 checks passed
@codeforall codeforall deleted the muusama/idle_in_trans branch March 2, 2026 11:40
codeforall added a commit that referenced this pull request Mar 5, 2026
…n_timeout on metadata workers (#8484)

ihalatci pushed a commit that referenced this pull request Mar 5, 2026
#8491)

…n_timeout on metadata workers (#8484)

codeforall added a commit that referenced this pull request Mar 6, 2026
…n_timeout on metadata workers (#8484)


4 participants