Skip to content

[draft] Add warning on result truncation in MSE due to leaf limit#17502

Draft
cypherean wants to merge 3 commits intoapache:masterfrom
cypherean:shreyaa/result_trunc_warn2
Draft

[draft] Add warning on result truncation in MSE due to leaf limit#17502
cypherean wants to merge 3 commits intoapache:masterfrom
cypherean:shreyaa/result_trunc_warn2

Conversation

@cypherean
Copy link
Copy Markdown
Contributor

@cypherean cypherean commented Jan 13, 2026

There are two cases when we add the leaflimitprovenance - when leaf limit is explicitly set in the query(for both mse and mse lite) and leaf limit in MSE lite.

Approach:

  • We compute the effective limit and if the effective limit is capped by leaf limit - we propagate a tag for leaflimitprovenance to the servers
  • When results get truncated in the server (in different operators) we check if the query was tagged with lite cap as the leaf limit provenance, if yes we return a warning
  • The broker aggregator aggregates the emits the warning in results

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 13, 2026

Codecov Report

❌ Patch coverage is 0% with 128 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.98%. Comparing base (5af9015) to head (726212c).
⚠️ Report is 56 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
...ry/runtime/plan/server/ServerPlanRequestUtils.java 0.00% 41 Missing ⚠️
.../core/operator/query/SelectionOrderByOperator.java 0.00% 21 Missing ⚠️
.../operator/blocks/results/DistinctResultsBlock.java 0.00% 10 Missing ⚠️
...operator/blocks/results/SelectionResultsBlock.java 0.00% 8 Missing ⚠️
...tor/combine/merger/DistinctResultsBlockMerger.java 0.00% 8 Missing ⚠️
...ombine/merger/SelectionOnlyResultsBlockMerger.java 0.00% 8 Missing ⚠️
...ine/merger/SelectionOrderByResultsBlockMerger.java 0.00% 8 Missing ⚠️
...he/pinot/core/operator/query/DistinctOperator.java 0.00% 8 Missing ⚠️
...common/response/broker/BrokerResponseNativeV2.java 0.00% 5 Missing ⚠️
...a/org/apache/pinot/common/datatable/DataTable.java 0.00% 4 Missing ⚠️
... and 2 more

❗ There is a different number of reports uploaded between BASE (5af9015) and HEAD (726212c). Click for more details.

HEAD has 8 uploads less than BASE
Flag BASE (5af9015) HEAD (726212c)
java-21 5 4
unittests1 2 0
unittests 4 2
temurin 10 8
java-11 5 4
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17502       +/-   ##
=============================================
- Coverage     63.30%   33.98%   -29.33%     
+ Complexity     1474      773      -701     
=============================================
  Files          3157     3167       +10     
  Lines        188334   189284      +950     
  Branches      28809    28973      +164     
=============================================
- Hits         119217    64319    -54898     
- Misses        59898   119634    +59736     
+ Partials       9219     5331     -3888     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 33.97% <0.00%> (-29.24%) ⬇️
java-21 33.97% <0.00%> (-29.30%) ⬇️
temurin 33.98% <0.00%> (-29.33%) ⬇️
unittests 33.97% <0.00%> (-29.33%) ⬇️
unittests1 ?
unittests2 33.97% <0.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

WORKLOAD_NAME(40, "workloadName", MetadataValueType.STRING),
// Needed so that we can track query id in Netty channel response.
QUERY_ID(41, "queryId", MetadataValueType.STRING);
QUERY_ID(41, "queryId", MetadataValueType.STRING),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

datatable is used by v1 engine so we can leave it be

private final QueryContext _queryContext;

private DistinctTable _distinctTable;
private boolean _liteLeafLimitReached;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding this in individual results block will not work.

Instead can you consider tracking this in the InstanceResponseOperator? I'd recommend running a quickstart and setting breakpoints to understand which exact operators are called for the leaf stage in lite mode in the servers.


/**
* Tags queryOptions["leafLimitProvenance"]="LITE_CAP" if either:
* - an explicit multiStageLeafLimit (in request metadata or query options) would tighten the previous LIMIT; or
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is interesting. I suppose you want to track if results were truncated by the "multiStageLeafLimit". This will be useful even for non-lite mode queries. Correct?

Copy link
Copy Markdown
Contributor

@Jackie-Jiang Jackie-Jiang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't follow the scope of this PR. Is there a design for this? Is this only associated with MSE lite mode?

boolean rlsFiltersApplied)
throws QueryException, WebApplicationException {
// Strip any user-supplied internal-only options to prevent spoofing. These will be set programmatically later.
// NOTE: 'leafLimitProvenance' is internal and must not be accepted from client input.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the only occurrence of internal-only options ? Can we make it a constant set that can be added to and a function that reads the set and removes it so it can be reused across.

QUERY_ID(41, "queryId", MetadataValueType.STRING);
QUERY_ID(41, "queryId", MetadataValueType.STRING),
LEAF_TRUNCATION_REASON(42, "leafTruncationReason", MetadataValueType.STRING),
LITE_LEAF_CAP_TRUNCATION(43, "liteLeafCapTruncation", MetadataValueType.BOOLEAN);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the limit only meant for MSE Lite ? And is the implementation different from how MSE handles default limits ?

Also, can we keep naming consistent : LITE_LEAF_CAP_TRUNCATION and LITE_LEAF_TRUNCATION_REASON if they're linked together.

@cypherean cypherean marked this pull request as draft January 22, 2026 08:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants