Shard move in block_writes mode fails with idle_in_transaction_session_timeout on metadata workers by codeforall · Pull Request #8484 · citusdata/citus

codeforall · 2026-02-21T07:21:09Z

Description

When performing a shard move using block_writes transfer mode (either directly via citus_move_shard_placement or through the background rebalancer), the operation can fail with:

   ERROR: terminating connection due to idle-in-transaction timeout
   CONTEXT: while executing command on <worker_host>:<worker_port>

The failing worker is a metadata worker that is neither the source nor the target of the shard move.

Root Cause

LockShardListMetadataOnWorkers() opens coordinated transactions on all metadata workers to acquire advisory shard metadata locks via SELECT lock_shard_metadata(...). These transactions remain open until the entire shard move completes and the coordinated transaction commits.

In block_writes mode, the data copy phase (CopyShardsToNode) runs synchronously between the source and target workers. Metadata workers not involved in the copy have no commands to execute and their connections sit completely idle-in-transaction for the entire duration of the data copy.

For large shards, the copy can take significantly longer than common idle_in_transaction_session_timeout values. When the timeout fires on an uninvolved worker, PostgreSQL terminates the connection, causing the shard move to fail.

This also affects shard splits, since they follow the same code path through LockShardListMetadataOnWorkers.
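
The timing argument above can be modelled in a few lines. This is only an illustrative sketch: `idle_session_survives` is a hypothetical helper, not a Citus or PostgreSQL function. It encodes the rule that a session idle inside a transaction is terminated once the idle time reaches the timeout, and that a timeout of 0 disables the check.

```c
#include <stdbool.h>

/*
 * Hypothetical model of idle_in_transaction_session_timeout.
 * idle_ms:    how long the connection sits idle inside an open transaction
 *             (for an uninvolved metadata worker, roughly the copy duration).
 * timeout_ms: the configured timeout; 0 means the timeout is disabled.
 * Returns true if the connection is still alive afterwards.
 */
static bool
idle_session_survives(long idle_ms, long timeout_ms)
{
    if (timeout_ms == 0)
    {
        /* 0 disables the timeout, so the session is never terminated for idling */
        return true;
    }
    return idle_ms < timeout_ms;
}
```

Under this model, a one-hour copy against a 60-second timeout terminates the idle connection, while the same copy with the timeout set to 0 does not.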

Fix

LockShardListMetadataOnWorkers() should send SET LOCAL idle_in_transaction_session_timeout = 0 on each metadata worker connection before acquiring the locks. SET LOCAL scopes the change to the current transaction only, so normal sessions on the workers are unaffected.
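
A minimal sketch of the command ordering described above, written as standalone C rather than the actual Citus code. The helper names (`build_lock_command`, `build_worker_commands`) are hypothetical, and the `(lock_mode, ARRAY[...])` argument shape of `lock_shard_metadata` is an assumption made for illustration. The only point is that the `SET LOCAL` command precedes the lock command in the same transaction.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Sent first on each metadata worker connection; 0 disables the timeout. */
static const char *DISABLE_IDLE_TIMEOUT =
    "SET LOCAL idle_in_transaction_session_timeout = 0";

/*
 * Hypothetical helper: formats the lock command for a list of shard IDs.
 * The argument shape of lock_shard_metadata is an assumption.
 * Returns the number of characters written, or -1 if buf is too small.
 */
static int
build_lock_command(char *buf, size_t size, int lock_mode,
                   const uint64_t *shard_ids, size_t count)
{
    int used = snprintf(buf, size, "SELECT lock_shard_metadata(%d, ARRAY[", lock_mode);
    if (used < 0 || (size_t) used >= size)
        return -1;

    for (size_t i = 0; i < count; i++)
    {
        /* Separate shard IDs with ", " after the first one */
        int n = snprintf(buf + used, size - (size_t) used, "%s%" PRIu64,
                         i == 0 ? "" : ", ", shard_ids[i]);
        if (n < 0 || (size_t) n >= size - (size_t) used)
            return -1;
        used += n;
    }

    int n = snprintf(buf + used, size - (size_t) used, "])");
    if (n < 0 || (size_t) n >= size - (size_t) used)
        return -1;
    return used + n;
}

/*
 * Fills commands[0..1] in the order they would be sent to a metadata worker:
 * the timeout override first, then the lock acquisition.
 * Returns the number of commands (2), or -1 on formatting failure.
 */
static int
build_worker_commands(const char *commands[2], char *lock_buf, size_t size,
                      int lock_mode, const uint64_t *shard_ids, size_t count)
{
    if (build_lock_command(lock_buf, size, lock_mode, shard_ids, count) < 0)
        return -1;
    commands[0] = DISABLE_IDLE_TIMEOUT;
    commands[1] = lock_buf;
    return 2;
}
```

Because both commands run inside the same coordinated transaction, the `SET LOCAL` override ends when that transaction commits or aborts, so later sessions on the worker see the configured timeout again.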

…tions during shard moves

In block_writes mode, LockShardListMetadataOnWorkers() opens coordinated transactions on all metadata workers to hold advisory shard metadata locks. These connections remain open for the entire duration of the shard move, but workers not involved in the data copy have no commands to execute and they sit idle-in-transaction until the coordinated transaction commits.

For large shards, the data copy can take hours, easily exceeding common idle_in_transaction_session_timeout values. When the timeout fires on an uninvolved worker, PostgreSQL terminates the connection and the move fails.

Fix by sending SET LOCAL idle_in_transaction_session_timeout = 0 on each metadata worker connection before acquiring locks. SET LOCAL scopes the change to the current transaction only, so normal sessions are unaffected.

codecov · 2026-02-21T07:29:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.88%. Comparing base (546f206) to head (8bfbf39).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #8484      +/-   ##
==========================================
- Coverage   88.90%   88.88%   -0.03%     
==========================================
  Files         286      286              
  Lines       63107    63109       +2     
  Branches     7910     7910              
==========================================
- Hits        56108    56093      -15     
- Misses       4734     4757      +23     
+ Partials     2265     2259       -6

emelsimsek · 2026-02-25T12:59:19Z

LGTM.

Copilot

Pull request overview

Fixes shard move / shard split failures in block_writes mode caused by idle_in_transaction_session_timeout terminating idle coordinated transactions on metadata workers during long-running data copy.

Changes:

- Disable idle_in_transaction_session_timeout (via SET LOCAL ... = 0) on metadata worker connections before acquiring shard metadata locks.
- Add a regression test that configures a low idle_in_transaction_session_timeout across nodes and performs a block_writes shard move.
- Update expected regression output accordingly.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| `src/backend/distributed/utils/resource_lock.c` | Prepends a `SET LOCAL idle_in_transaction_session_timeout = 0` command before shard metadata lock acquisition on metadata workers. |
| `src/test/regress/sql/shard_move_constraints_blocking.sql` | Adds a regression test scenario for shard move under low idle-in-transaction timeout. |
| `src/test/regress/expected/shard_move_constraints_blocking.out` | Captures the expected output for the new regression test section. |

Copilot · 2026-02-25T13:06:54Z

src/test/regress/sql/shard_move_constraints_blocking.sql

+CREATE TABLE test_move(id int PRIMARY KEY, val text);
+SELECT create_distributed_table('test_move', 'id');
+INSERT INTO test_move SELECT i, 'val_' || i FROM generate_series(1, 100) i;
+
+-- move a shard using block_writes; should succeed despite the 1s timeout
+SELECT citus_move_shard_placement(8980000, 'localhost', :worker_1_port, 'localhost', :worker_2_port, shard_transfer_mode:='block_writes');
+SELECT public.wait_for_resource_cleanup();


This test sets idle_in_transaction_session_timeout to 1s, but the shard being moved is tiny (100 rows spread across 4 shards), so the block_writes move will likely finish well under 1s even without the backend fix. That makes the regression non-deterministic / ineffective at catching the original failure mode. Consider making the data-copy phase reliably exceed the timeout (e.g., add a small per-row delay on the target during copy, or otherwise slow the copy in a deterministic way) so the test would fail on older code paths.

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-02-25T13:43:17Z

@emelsimsek I've opened a new pull request, #8485, to work on those changes. Once the pull request is ready, I'll request review from you.

codeforall marked this pull request as ready for review February 21, 2026 07:44

emelsimsek requested review from Copilot and emelsimsek February 25, 2026 12:59

Copilot started reviewing on behalf of emelsimsek February 25, 2026 13:00

emelsimsek approved these changes Feb 25, 2026

Copilot AI reviewed Feb 25, 2026

Copilot AI mentioned this pull request Feb 25, 2026

Make idle_in_transaction regression test deterministic via per-row COPY delay #8485

Draft

codeforall merged commit d3330fd into main Mar 2, 2026
163 checks passed

codeforall deleted the muusama/idle_in_trans branch March 2, 2026 11:40

codeforall mentioned this pull request Mar 5, 2026

Shard move in block_writes mode fails with idle_in_transaction_sessio… #8491

Merged

codeforall mentioned this pull request Mar 6, 2026

Shard move in block_writes mode fails with idle_in_transaction_sessio… #8495

Open
