Skip to content

feat: add query/segments/count metric to data nodes#19624

Open
jtuglu1 wants to merge 2 commits into
apache:masterfrom
jtuglu1:add-query-segment-count
Open

feat: add query/segments/count metric to data nodes#19624
jtuglu1 wants to merge 2 commits into
apache:masterfrom
jtuglu1:add-query-segment-count

Conversation

@jtuglu1

@jtuglu1 jtuglu1 commented Jun 24, 2026

Copy link
Copy Markdown
Contributor

Description

Allows for quick queries to understand how many segments a given query issued scans for against a given data node. Better than counting the unique query/segment/time or similar, especially if you are sampling higher volume metrics like query/segment/time, but still want to have quick way to know stats about # of segments queried per node.

Release note

Add query/segment/count metric to data nodes.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 2
P3 0
Total 2
Severity Findings
P0 0
P1 0
P2 2
P3 0
Total 2

Reviewed 14 of 14 changed files.


This is an automated review by Codex GPT-5.5

Comment thread server/src/main/java/org/apache/druid/server/ServerManager.java Outdated
Comment thread docs/operations/metrics.md Outdated
@jtuglu1 jtuglu1 force-pushed the add-query-segment-count branch from 78d50df to c273754 Compare June 24, 2026 17:37
@jtuglu1 jtuglu1 added this to the 38.0.0 milestone Jun 24, 2026

@kfaraz kfaraz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left some minor comments.

Comment thread docs/operations/metrics.md Outdated
|Metric|Description|Dimensions|Normal value|
|------|-----------|----------|------------|
|`query/time`|Milliseconds taken to complete a query.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`, `statusCode`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|< 1s|
|`query/segment/count`|Number of segments this Historical scans for a query. This is local to this data node; the Broker metric `query/segments/count` reports the distributed query plan.|<p>Common: `dataSource`, `type`, `interval`, `hasFilters`, `duration`, `context`, `remoteAddress`, `id`.</p><p> Aggregation Queries: `numMetrics`, `numComplexMetrics`.</p><p> GroupBy: `numDimensions`.</p><p> TopN: `threshold`, `dimension`.</p>|Varies|

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, the similar metric names query/segment/count and query/segments/count (with an 'S') can be confusing. I wish the Broker metric name reflected that it was a plan-only metric.

Nit: For the use case in this PR though, I feel that query/segments/count probably makes more sense since query/segment/time reflects time taken to scan a single segment and not the total time.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, do we even need to mention the Broker metric here? It seems unrelated despite the similar names. The disambiguation, if needed, may be done in Broker metrics section itself.

@Nullable
private final String segmentId;
@Nullable
private final Integer localSegmentCount;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does the prefix local signify?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local to this data node

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like that is implicit, since the data node is not aware of segments from other nodes anyway.

/**
* This class is responsible for emitting query/segment/time, query/wait/time and query/segmentAndCache/Time metrics for a Sink.
* This class is responsible for emitting query/segment/time, query/wait/time, query/segmentAndCache/Time and
* query/segment/count metrics for a Sink.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also call out that unlike other metrics emitted here, query/segment/count is emitted only when segmentId is non-null, since it is a total count.

Comment on lines +505 to +506
Assert.assertEquals(1, stubServiceEmitter.getMetricEventCount(DefaultQueryMetrics.QUERY_SEGMENT_COUNT));
Assert.assertEquals(2L, stubServiceEmitter.getLatestMetricEventValue(DefaultQueryMetrics.QUERY_SEGMENT_COUNT));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: You may also use stubServiceEmitter.verifyEmitted and verifyValue.

@FrankChen021 FrankChen021 left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Severity Findings
P0 0
P1 0
P2 1
P3 0
Total 1

Reviewed 16 of 16 changed files.


This is an automated review by Codex GPT-5.5

);
}

final int localSegmentCount = allSegmentReferences.size();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[P2] Count realtime sinks instead of hydrants

This counts allSegmentReferences, but those references are hydrant-level: Sink.acquireSegmentReferences adds one SinkSegmentReference per FireHydrant. After a realtime segment has multiple hydrants, for example after persist/swap or with both persisted and current hydrants, a query over one Druid segment emits query/segment/count as the hydrant count instead of the segment/Sink count. That disagrees with the metric's documented segment-level meaning and with the existing per-segment metrics that aggregate by sinkSegmentId; please count distinct queried sinks/descriptors and add a multi-hydrant test.

/**
* Registers the number of segments scanned by this data node for the query.
*/
default QueryMetrics<QueryType> reportLocalSegmentCount(long segmentCount)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtuglu1 , could you please add some info in the javadoc of the existing reportQueriedSegmentCount() metric to clearly distinguish that method from reportLocalSegmentCount()?

@jtuglu1 jtuglu1 force-pushed the add-query-segment-count branch from c273754 to 37cf734 Compare June 26, 2026 19:20
@jtuglu1 jtuglu1 changed the title feat: add query/segment/count metric to data nodes feat: add query/segments/count metric to data nodes Jun 26, 2026

@kfaraz kfaraz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, left minor non-blocking suggestions.

@jtuglu1 , please avoid force pushing commits after a review has been done as it makes subsequent reviews difficult 🙂

* Registers "segments queried count" metric.
* Registers the {@code query/segments/count} metric, the number of segments touched by the query.
*
* Emitted once per query. The value's meaning depends on the emitting process:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Emitted once per query. The value's meaning depends on the emitting process:
* Emitted once per query. The meaning of the metric depends on the emitting process:

* <li>On a data node (Historical, Peon/realtime), it is the number of segments that node actually scanned
* for the query.</li>
* </ul>
* The two are disambiguated by the emitting service/host, not by the metric name.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Do we really want to overload the definition of this metric?
We may even take this opportunity to use a different metric name for the number of segments targeted by the query plan.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally metric renaming is a pain since we need to break user's metric setups. I feel like they both are measuring the same thing for now, so we can potentially address in a follow-up to rename the broker-side ones.

}
}

if (segmentCount != null) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (segmentCount != null) {
// Emit total segmentCount once per query
if (segmentCount != null) {

Do we also want to add a validation on segmentId == null here to ensure that we don't accidentally call this more than once?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if (segmentCount != null) block is already in the else branch of the (segmentId != null guard) above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants