Skip to content

[improve][broker] Reduce persistent topic backlog check overhead with ManagedCursor.hasBacklog#26058

Open
void-ptr974 wants to merge 4 commits into
apache:masterfrom
void-ptr974:issue-26036-managed-cursor-has-backlog
Open

[improve][broker] Reduce persistent topic backlog check overhead with ManagedCursor.hasBacklog#26058
void-ptr974 wants to merge 4 commits into
apache:masterfrom
void-ptr974:issue-26036-managed-cursor-has-backlog

Conversation

@void-ptr974

@void-ptr974 void-ptr974 commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Motivation

Fixes #26036.

Some persistent topic paths only need to know whether backlog exists, but currently call
getNumberOfEntriesInBacklog(...) and compare the returned count with zero. For precise checks, this
can add unnecessary overhead because the caller does not need the exact backlog count. For imprecise
checks, using a boolean API also better expresses the intent and keeps the call sites consistent.

Modifications

  • Add backward-compatible hasBacklog methods to ManagedCursor, Subscription, and Replicator.
  • Implement ManagedCursorImpl.hasBacklog() to short-circuit once it finds one readable backlog entry.
  • Preserve correctness for individually deleted messages by skipping deleted ranges.
  • Override persistent subscription and replicator hasBacklog methods to delegate to the managed cursor.
  • Replace backlog-count presence checks with hasBacklog(...) in persistent topic, subscription,
    dispatcher, compactor subscription, and replicator paths while preserving the existing precise or
    imprecise mode at each call site.
  • Add managed-ledger tests that compare hasBacklog() / hasBacklog(true) /
    hasBacklog(false) with the existing backlog count APIs across empty, caught-up, mixed ack,
    cross-ledger, batch-index, reset, recovery, deleted-range-to-tail, negative-counter fallback, and
    uninitialized mark-delete states.
  • Add deterministic generated operation coverage for mixed add, delete, mark-delete, reset, and reopen
    sequences.
  • Add broker tests covering real producer/consumer acknowledgement orders, batching, and subscription
    lifecycle operations such as skip, reset, expire, and clear backlog.

Performance impact

This reduces persistent topic backlog check overhead by avoiding exact backlog counting when callers only
need a boolean result. In the common precise-check case, the check can stop at the first valid backlog entry.

For existing imprecise checks, the patch preserves the same imprecise mode and only replaces count-based
presence comparisons with the equivalent boolean API.

Verifications

  • ./gradlew :managed-ledger:test --tests org.apache.bookkeeper.mledger.impl.ManagedCursorTest.hasBacklog*
  • ./gradlew :pulsar-broker:test --tests org.apache.pulsar.broker.service.persistent.PersistentTopicTest.testHasBacklogTracksRealProduceConsumeAckOrders --tests org.apache.pulsar.broker.service.persistent.PersistentTopicTest.testHasBacklogTracksRealSubscriptionLifecycleOperations
  • ./gradlew :managed-ledger:checkstyleMain :managed-ledger:checkstyleTest :pulsar-broker:checkstyleMain :pulsar-broker:checkstyleTest

@void-ptr974 void-ptr974 force-pushed the issue-26036-managed-cursor-has-backlog branch 6 times, most recently from da0836b to d203c4e Compare June 18, 2026 14:23
@void-ptr974 void-ptr974 marked this pull request as ready for review June 18, 2026 14:26
… ManagedCursor.hasBacklog

Motivation:
Issue apache#26036 points out that several broker paths only need to know whether a cursor has any backlog, but they currently call getNumberOfEntriesInBacklog(true), which computes an exact backlog count. For boolean decisions in persistent topic paths, this can add unnecessary overhead.

Modifications:
- Add backward-compatible hasBacklog methods to ManagedCursor, Subscription, and Replicator.
- Implement an optimized precise ManagedCursorImpl.hasBacklog() that skips individually deleted ranges and stops as soon as it finds one readable backlog entry.
- Use hasBacklog in persistent subscription, replicator, topic migration, and backlog checks instead of computing precise counts for yes/no decisions.
- Add managed-ledger tests that compare hasBacklog with the original getNumberOfEntriesInBacklog count semantics across empty, caught-up, mixed ack, cross-ledger, batch-index, reset, recovery, deleted-range-to-tail, negative-counter fallback, and uninitialized mark-delete states.
- Add broker tests for subscription delegation, replicator usage, and migration cleanup behavior.

Performance impact:
This adds a fast existence check for persistent topic backlog paths, avoiding exact backlog counting when callers only need a boolean result.

Fixes apache#26036
@void-ptr974 void-ptr974 force-pushed the issue-26036-managed-cursor-has-backlog branch from d203c4e to e61e5c9 Compare June 19, 2026 04:46

@lhotari lhotari left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good change, one comment

}

@Override
public boolean hasBacklog(boolean isPrecise) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please search for all usages of cursor.getNumberOfEntriesInBacklog(boolean) where the result is compared against 0. There seems to be some more usages that need to be addressed.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. I updated the remaining backlog presence checks to use hasBacklog(...) while preserving the existing precise/imprecise modes, and added coverage for the updated paths. Please take another look when you have a chance.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

org.apache.pulsar.broker.service.AbstractReplicator#disconnect seems to have one remaining backlog check based on count.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, missed that one. Changed.

Replace backlog count comparisons with hasBacklog for boolean presence checks and add deterministic coverage for cursor and subscription backlog semantics.
Replace the remaining replicator disconnect backlog presence check with hasBacklog to avoid computing the exact backlog count for a boolean decision.
Remove a constant backlog field from the replicator disconnect log and make the migration backlog check message more descriptive.
@lhotari

lhotari commented Jun 23, 2026

Copy link
Copy Markdown
Member

btw. I did some analysis with Claude Code about cases where there would be individualDeletedMessages that aren't already absorbed into the markDeletePosition.

In the normal ack flow it usually can't, because the delete path is greedy: asyncDelete (ManagedCursorImpl.java:2700-2715) absorbs any deleted range contiguous with the mark-delete
position into MD (newMarkDeletePosition = range.upperEndpoint(), then setAcknowledgedPosition → individualDeletedMessages.removeAtMost(markDeletePosition) at line 2261). That maintains
an invariant that getNextValidPosition(markDeletePosition) is a real, non-deleted entry whenever MD < last — under which mark-delete-only would agree with the walk.

The walk exists to not depend on that invariant:

  • Recovery (recoverIndividualDeletedMessages, line 770) rebuilds the range set verbatim with raw addOpenClosed calls and does not re-run the absorption, including ranges synthesized
    across trimmed/empty ledgers (addOpenClosed(li, -1, …), line 798-802).
  • The directly-constructed test state above is reachable precisely because it bypasses the public delete API.

I guess the backlog check in hasBacklog would short circuit in realistic cases with markDeletePosition.compareTo(lastPosition) >= 0 or the nextPosition.compareTo(lastPosition) > 0. This is good.

@lhotari lhotari left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @void-ptr974

@lhotari lhotari added this to the 5.0.0-M2 milestone Jun 23, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Enhancement] Optimize code that uses cursor.getNumberOfEntriesInBacklog(true) > 0

2 participants