DAOS-18785 object: handle resent RPC on DTX non-leader - b26#17992

Draft
Nasf-Fan wants to merge 1 commit into release/2.6 from Nasf-Fan/DAOS-18785_b26

@Nasf-Fan Nasf-Fan commented Apr 13, 2026

Usually, most resent RPCs are detected and handled on the DTX leader. But when the DTX leader is switched, for example because the old DTX leader is dead or evicted, the DTX for some in-flight IO may be in 'prepared' status on a non-leader while the related client resends the RPC to the new DTX leader. In that case, DTX resync may not have handled such a DTX in time. The IO handler on the non-leader then needs to check whether the related DTX has ever been prepared: if yes, it replies directly to the DTX leader, so that the lower-layer logic is not misled into generating a confusing error.
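A minimal sketch of the non-leader check described above, for illustration only. All names here (`dtx_entry`, `dtx_lookup`, `non_leader_handle`, the `resent` flag, and the enum values) are hypothetical simplifications and are not the actual DAOS types or functions; the real patch works against the DTX tables in the object and VOS layers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins. These are NOT the real DAOS types. */
enum dtx_state {
	DTX_NONE = 0,	/* no local record that the DTX was prepared */
	DTX_PREPARED,	/* modification was applied locally, not yet committed */
};

struct dtx_entry {
	uint64_t	xid;	/* transaction ID carried by the RPC */
	enum dtx_state	state;
};

enum io_action {
	IO_EXECUTE,		/* run the modification as usual */
	IO_REPLY_DIRECT,	/* already prepared: just reply to the leader */
};

/* Linear lookup of a DTX entry by ID in a small in-memory table. */
static const struct dtx_entry *
dtx_lookup(const struct dtx_entry *tbl, size_t n, uint64_t xid)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].xid == xid)
			return &tbl[i];
	}
	return NULL;
}

/*
 * Decide what the IO handler on a non-leader does with a request
 * forwarded by the (possibly new) DTX leader. The 'resent' flag is
 * assumed to be carried by the RPC.
 */
static enum io_action
non_leader_handle(const struct dtx_entry *tbl, size_t n, uint64_t xid,
		  bool resent)
{
	const struct dtx_entry *ent;

	assert(tbl != NULL || n == 0);

	if (!resent)
		return IO_EXECUTE;

	/* Resent RPC: check whether this DTX was prepared here earlier. */
	ent = dtx_lookup(tbl, n, xid);
	if (ent != NULL && ent->state == DTX_PREPARED)
		return IO_REPLY_DIRECT;

	return IO_EXECUTE;
}
```

In this sketch, a resent request whose DTX is already 'prepared' locally is answered without re-running the modification, which is the case DTX resync may not have reached yet after a leader switch.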

Add a new test case for that.

Allow-unstable-test: true

Steps for the author:

  • Commit message follows the guidelines.
  • Appropriate Features or Test-tag pragmas were used.
  • Appropriate Functional Test Stages were run.
  • At least two positive code reviews including at least one code owner from each category referenced in the PR.
  • Testing is complete. If necessary, forced-landing label added and a reason added in a comment.

After all prior steps are complete:

  • Gatekeeper requested (daos-gatekeeper added as a reviewer).

@github-actions

Ticket title is 'test_ec_multiple_rank_failure failed during IOR: dfs_write(0x558292ef2000, 2048) failed (5): Input/output error'
Status is 'In Progress'
https://daosio.atlassian.net/browse/DAOS-18785

@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18785_b26 branch from 3e3afef to 686fb9a Compare April 14, 2026 02:33
Usually, most resent RPCs are detected and handled on the DTX leader.
But when the DTX leader is switched, for example because the old DTX
leader is dead or evicted, the DTX for some in-flight IO may be in
'prepared' status on a non-leader while the related client resends the
RPC to the new DTX leader. In that case, DTX resync may not have handled
such a DTX in time. The IO handler on the non-leader then needs to check
whether the related DTX has ever been prepared: if yes, it replies
directly to the DTX leader, so that the lower-layer logic is not misled
into generating a confusing error.

Add a new test case for that.

Allow-unstable-test: true

Signed-off-by: Fan Yong <fan.yong@hpe.com>
@Nasf-Fan Nasf-Fan force-pushed the Nasf-Fan/DAOS-18785_b26 branch from 686fb9a to ea31fd3 Compare April 14, 2026 14:31