
[Bug] Key_Shared (PIP-379): pending-acks cleanup on mark-delete advance can break ordering #25627

@lhotari

Description


Summary

For Key_Shared subscriptions using the PIP-379 implementation (PersistentStickyKeyDispatcherMultipleConsumers), the dispatcher's markDeletePositionMoveForward() hook clears each consumer's PendingAcksMap for any positions at or below the new mark-delete position. This pending-acks cleanup is harmful for Key_Shared and can result in out-of-order message delivery for a sticky key whose hash is currently being drained.

Background

PIP-379 introduced the DrainingHashesTracker to prevent out-of-order delivery during sticky-key hash range reassignment. While consumer A still has unacked messages for hash H in its PendingAcksMap, the new owner of H (consumer B) is blocked from receiving messages with H. The hash refCount is decremented whenever an entry is removed from the PendingAcksMap, via PendingAcksRemoveHandler.handleRemovingDrainingHashesTracker.reduceRefCount.

The handler does not distinguish removal-because-acked from removal-because-mark-delete-advanced. Both paths look identical to the draining-hashes tracker.
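A minimal model of the refCount idea may make this concrete. This is a sketch with hypothetical names (`DrainingModel`, `addPendingAck`, `isBlocked`), not Pulsar's actual `DrainingHashesTracker` API: a hash stays draining (blocked for the new owner) while any pending-acks entry still references it, and every removal decrements the count regardless of why the entry was removed.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified model (hypothetical names, not Pulsar's classes) of the
// PIP-379 draining-hash refCount: a hash is "draining" while any
// pending ack still references it.
public class DrainingModel {
    private final Map<Integer, Integer> refCounts = new HashMap<>();

    // Called when a message for this hash is added to the PendingAcksMap.
    void addPendingAck(int hash) {
        refCounts.merge(hash, 1, Integer::sum);
    }

    // Called for EVERY removal from the pending-acks map -- the model
    // cannot tell an ack apart from a mark-delete-driven cleanup.
    void removePendingAck(int hash) {
        refCounts.computeIfPresent(hash, (h, c) -> c > 1 ? c - 1 : null);
    }

    boolean isBlocked(int hash) {
        return refCounts.getOrDefault(hash, 0) > 0;
    }

    public static void main(String[] args) {
        DrainingModel tracker = new DrainingModel();
        tracker.addPendingAck(42);                  // consumer A holds a message for hash 42
        System.out.println(tracker.isBlocked(42));  // true: new owner is blocked
        tracker.removePendingAck(42);               // refCount drops to 0
        System.out.println(tracker.isBlocked(42));  // false: the block lifts
    }
}
```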

What happens

PersistentDispatcherMultipleConsumers.markDeletePositionMoveForward() does two things when the cursor mark-delete position advances (triggered by skip-messages, clear-backlog, expiry, or backlog-quota eviction):

  1. Drops entries from redeliveryMessages for positions ≤ new mark-delete position.
  2. Calls Consumer.removePendingAcksUpToPositionAndDecrementUnacked(...) for every consumer in the dispatcher.

PersistentStickyKeyDispatcherMultipleConsumers invokes super.markDeletePositionMoveForward() and inherits both behaviors.

Step 2 is correct for Shared subscriptions, but for Key_Shared (PIP-379) it removes in-flight pending acks that the draining-hashes mechanism is using to keep a hash blocked. The hash refCount can prematurely drop to 0, the block lifts, and the new owner of that hash starts receiving messages with the same key before the original consumer has processed its already-on-the-wire messages.
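The following sketch (hypothetical names, not the broker's code) illustrates why the two removal paths are indistinguishable to the tracker: both the ack path and the mark-delete path drive the same remove handler, and the handler decrements the draining-hash refCount either way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch (hypothetical names): an ack and a mark-delete advance both fire
// the same remove handler, so the draining refCount falls either way.
public class RemoveHandlerSketch {
    record Pending(long position, int hash) {}

    public static void main(String[] args) {
        // Hash 9 is referenced by two pending acks, so its refCount is 2.
        AtomicInteger refCount = new AtomicInteger(2);
        List<Pending> pendingAcks =
                new ArrayList<>(List.of(new Pending(1, 9), new Pending(2, 9)));

        // Ack path: the consumer acked position 1; handler decrements.
        pendingAcks.removeIf(p -> p.position() == 1 && refCount.decrementAndGet() >= 0);

        // Mark-delete path: cursor advanced to position 2 without an ack;
        // the SAME handler fires for the swept entry.
        pendingAcks.removeIf(p -> p.position() <= 2 && refCount.decrementAndGet() >= 0);

        System.out.println(refCount.get()); // 0 -> the draining block lifts prematurely
    }
}
```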

Failure scenario

  1. Consumer A receives message M with sticky key K (hash H). M is added to A's PendingAcksMap. M is on the wire.
  2. Consumer B joins; the sticky-key range covering H reassigns from A to B. Hash H is registered as draining for A.
  3. Mark-delete advances past M's position — for example, because backlog-quota eviction (BacklogQuotaManager) calls ManagedCursor.skipEntries, or because message expiry advances the cursor, or an admin issues skip-messages/clear-backlog.
  4. markDeletePositionMoveForward fires. M is removed from A's PendingAcksMap. The remove handler decrements H's refCount, which reaches 0, and the draining block lifts.
  5. A new message with key K is produced. It is delivered to consumer B.
  6. Meanwhile, M is still in flight to consumer A and is processed by A.
  7. The application has now processed two messages with the same key on two different consumers — out-of-order delivery, violating the Key_Shared ordering guarantee.
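The steps above can be compressed into a small simulation (again with hypothetical names, not Pulsar code): while hash H is draining, dispatch to the new owner B is blocked; once the mark-delete cleanup drops the last refCount, the next message with key K goes to B even though M is still in flight to A.

```java
import java.util.HashMap;
import java.util.Map;

// Compact simulation (hypothetical names) of the failure scenario: the
// mark-delete cleanup lifts the draining block for hash H while message M
// is still in flight to consumer A.
public class OrderingBug {
    static final int H = 7; // sticky hash of key K
    static final Map<Integer, Integer> drainingRefCount = new HashMap<>();

    static String dispatch(String newOwner) {
        // While H is draining, the new owner must not receive messages for H.
        return drainingRefCount.getOrDefault(H, 0) > 0 ? "blocked" : newOwner;
    }

    public static void main(String[] args) {
        drainingRefCount.put(H, 1);        // steps 1-2: M pending on A, H draining for A
        System.out.println(dispatch("B")); // blocked: ordering is protected
        drainingRefCount.remove(H);        // steps 3-4: mark-delete cleanup drops the refCount
        System.out.println(dispatch("B")); // B: a new message with key K reaches B
        // steps 6-7: M is still in flight to A -> two consumers process key K
    }
}
```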

Trigger surface

Any cursor mark-delete advance that does not go through the consumer-driven ack path can hit this:

  • Backlog-quota eviction (the consumer_backlog_eviction size policy and the non-precise time-based policy) — see BacklogQuotaManager.
  • Message expiry — PersistentMessageExpiryMonitor.
  • Admin skipMessages / clearBacklog on a subscription.

Pre-existing, not a regression

This is a latent bug, not introduced by recent PRs.

Neither PR semantically changed the pending-acks cleanup itself; both are correct for Shared subscriptions and faithfully preserve the prior behavior. The bug surfaces because the prior behavior was always wrong for PIP-379 Key_Shared, and is now reachable from more paths.

The original review of PR #24096 retained step 2 over concern that draining hashes could otherwise be blocked indefinitely if pending acks weren't pruned on mark-delete advance. That concern needs revising: the pending acks will be cleared naturally when the consumer eventually acks (delivery still completes even if the broker has trimmed storage) or disconnects (PendingAcksMap.forEachAndClose with closing=true).

Impact

  • Severity: Correctness violation for Key_Shared ordering, the central invariant of the subscription type.
  • Trigger conditions: Requires a draining sticky-key hash (consumer set has changed) AND a mark-delete advance that bypasses the per-consumer ack path. Backlog-quota eviction (consumer_backlog_eviction policy), message expiry, and admin skip/clear operations can all reach it.
  • Detectability: Silent. No log, no metric. The application sees out-of-order processing.
  • Visibility: No tests today exercise the interaction between mark-delete advancement and DrainingHashesTracker for Key_Shared.

Scope of fix

The fix should target the PIP-379 dispatcher (PersistentStickyKeyDispatcherMultipleConsumers) and its parent (PersistentDispatcherMultipleConsumers).

The deprecated PersistentStickyKeyDispatcherMultipleConsumersClassic is out of scope.

The redelivery-tracker cleanup (step 1 above) must be preserved for Key_Shared — it correctly prevents re-reading entries that storage has already discarded.
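One possible shape for the fix, sketched with hypothetical class and method names rather than as a patch: the Key_Shared dispatcher keeps the redelivery cleanup (step 1) on mark-delete advance but skips the pending-acks cleanup (step 2), leaving pending acks to be cleared by the ack path or by consumer disconnect.

```java
// Sketch (hypothetical names, not a patch) of one fix direction: override
// only the pending-acks half of the mark-delete hook for Key_Shared.
public class FixSketch {
    static class BaseDispatcher {
        final StringBuilder log = new StringBuilder();

        void markDeletePositionMoveForward() {
            cleanupRedeliveryMessages(); // step 1: safe for all subscription types
            cleanupPendingAcks();        // step 2: harmful for PIP-379 Key_Shared
        }

        void cleanupRedeliveryMessages() { log.append("redelivery;"); }
        void cleanupPendingAcks()        { log.append("pendingAcks;"); }
    }

    static class KeySharedDispatcher extends BaseDispatcher {
        @Override
        void cleanupPendingAcks() { /* no-op: preserve draining-hash refCounts */ }
    }

    public static void main(String[] args) {
        BaseDispatcher shared = new BaseDispatcher();
        shared.markDeletePositionMoveForward();
        System.out.println(shared.log);    // redelivery;pendingAcks;

        BaseDispatcher keyShared = new KeySharedDispatcher();
        keyShared.markDeletePositionMoveForward();
        System.out.println(keyShared.log); // redelivery;
    }
}
```

The design choice here mirrors the issue's own argument: pending acks are eventually cleared by acks or by PendingAcksMap.forEachAndClose on disconnect, so skipping the mark-delete sweep does not leave hashes draining forever.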
