BE-513: HashQL: Rework dynamic aggregate size estimation#8697
Conversation
|
The latest updates on your projects. Learn more about Vercel for GitHub.
2 Skipped Deployments
|
PR SummaryMedium Risk Overview Adds Reviewed by Cursor Bugbot for commit 598d416. Bugbot is set up for automated code reviews on this repo. Configure here. |
d52594c to
ae2d3af
Compare
fb7f8ba to
5eeec53
Compare
🤖 Augment PR SummarySummary: Reworks HashQL MIR dynamic aggregate size estimation to be type-aware, fixing incorrect cardinality behavior that previously treated all aggregates like flat collections. Changes:
Why: Ensures composite values (structs/tuples/closures) aren’t mis-modeled as collections, and nested collections contribute their full information content when used inside composites. 🤖 Was this summary useful? React with 👍 or 👎 |
Merging this PR will not alter performance
Comparing Footnotes
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## bm/be-512-hashql-switchint-allow-cross-backend-transitions #8697 +/- ##
==============================================================================================
+ Coverage 66.19% 66.31% +0.11%
==============================================================================================
Files 932 932
Lines 84621 84949 +328
Branches 4461 4468 +7
==============================================================================================
+ Hits 56017 56331 +314
- Misses 28056 28067 +11
- Partials 548 551 +3
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 598d416. Configure here.
| *coefficient = coefficient.saturating_mul(scale); | ||
| } | ||
|
|
||
| Estimate::Affine(result) |
There was a problem hiding this comment.
Empty cardinality treated as unbounded in materialize
Low Severity
In materialize(), the Affine × Constant arm calls cardinality.inclusive_max(), which returns None for both Cardinality::empty() (max = Excluded(0), since 0u32.checked_sub(1) is None) and Cardinality::full() (max = Unbounded). When None is returned, the code assumes unbounded cardinality and returns InformationRange::full(). This is wrong for empty cardinality — the result should be empty/zero, not full. The Constant × Constant arm handles this correctly because saturating_mul_cardinality explicitly checks is_empty() first. A missing cardinality.is_empty() guard before the inclusive_max() check causes the inconsistency.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit 598d416. Configure here.
Benchmark results
|
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2002 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1001 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 3314 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 1526 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 2078 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 1033 | Flame Graph |
policy_resolution_medium
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 102 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 51 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 269 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 107 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 133 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 63 | Flame Graph |
policy_resolution_none
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 2 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 8 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 3 | Flame Graph |
policy_resolution_small
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| resolve_policies_for_actor | user: empty, selectivity: high, policies: 52 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: empty, selectivity: medium, policies: 25 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: high, policies: 94 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: seeded, selectivity: medium, policies: 26 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: high, policies: 66 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: low, policies: 1 | Flame Graph | |
| resolve_policies_for_actor | user: system, selectivity: medium, policies: 29 | Flame Graph |
read_scaling_complete
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id;one_depth | 1 entities | Flame Graph | |
| entity_by_id;one_depth | 10 entities | Flame Graph | |
| entity_by_id;one_depth | 25 entities | Flame Graph | |
| entity_by_id;one_depth | 5 entities | Flame Graph | |
| entity_by_id;one_depth | 50 entities | Flame Graph | |
| entity_by_id;two_depth | 1 entities | Flame Graph | |
| entity_by_id;two_depth | 10 entities | Flame Graph | |
| entity_by_id;two_depth | 25 entities | Flame Graph | |
| entity_by_id;two_depth | 5 entities | Flame Graph | |
| entity_by_id;two_depth | 50 entities | Flame Graph | |
| entity_by_id;zero_depth | 1 entities | Flame Graph | |
| entity_by_id;zero_depth | 10 entities | Flame Graph | |
| entity_by_id;zero_depth | 25 entities | Flame Graph | |
| entity_by_id;zero_depth | 5 entities | Flame Graph | |
| entity_by_id;zero_depth | 50 entities | Flame Graph |
read_scaling_linkless
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | 1 entities | Flame Graph | |
| entity_by_id | 10 entities | Flame Graph | |
| entity_by_id | 100 entities | Flame Graph | |
| entity_by_id | 1000 entities | Flame Graph | |
| entity_by_id | 10000 entities | Flame Graph |
representative_read_entity
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/block/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/book/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/building/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/organization/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/page/v/2
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/person/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/playlist/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/song/v/1
|
Flame Graph | |
| entity_by_id | entity type ID: https://blockprotocol.org/@alice/types/entity-type/uk-address/v/1
|
Flame Graph |
representative_read_entity_type
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| get_entity_type_by_id | Account ID: bf5a9ef5-dc3b-43cf-a291-6210c0321eba
|
Flame Graph |
representative_read_multiple_entities
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| entity_by_property | traversal_paths=0 | 0 | |
| entity_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| entity_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=0 | 0 | |
| link_by_source_by_property | traversal_paths=255 | 1,resolve_depths=inherit:1;values:255;properties:255;links:127;link_dests:126;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:0;link_dests:0;type:false | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:0;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:0;properties:2;links:1;link_dests:0;type:true | |
| link_by_source_by_property | traversal_paths=2 | 1,resolve_depths=inherit:0;values:2;properties:2;links:1;link_dests:0;type:true |
scenarios
| Function | Value | Mean | Flame graphs |
|---|---|---|---|
| full_test | query-limited | Flame Graph | |
| full_test | query-unlimited | Flame Graph | |
| linked_queries | query-limited | Flame Graph | |
| linked_queries | query-unlimited | Flame Graph |



🌟 What is the purpose of this PR?
The size estimation analysis previously treated all aggregate kinds (structs, tuples, lists, dicts, closures) identically — summing operand footprints and accumulating cardinality as if every aggregate were a flat collection. This was incorrect: a struct or tuple is a single composite value with cardinality 1, while a list or dict is a true collection whose cardinality equals its element count.
This PR introduces type-aware aggregate footprint evaluation. Structs and tuples now correctly report cardinality 1 with units equal to the sum of their fields' materialized sizes. Lists report per-element units (joined across elements) with cardinality equal to the element count. Dicts compute per-pair units (key + value combined) with cardinality equal to the pair count. Closures combine their function pointer and environment footprints into a single scalar value.
To support this, a
materialize()method is introduced onFootprintthat collapses a footprint's(units, cardinality)pair into a single total information estimate. This is needed when a value with its own cardinality (e.g. a list) is embedded as a field of a composite type — the field's contribution to the parent's units must account for the full information content of the nested value, not just its per-element size.🔍 What does this change?
RValue::Aggregatehandler ineval_rvaluewith a dedicatedeval_rvalue_aggregatemethod that dispatches onAggregateKind.materialize()d footprints of all operands and sets cardinality to 1.saturating_mul_add) and sets cardinality to the pair count.Footprint::materialize()which multiplies units by cardinality to produce a total information estimate, with case-specific handling for constant×constant (exact), affine units×constant cardinality (scale coefficients by cardinality upper bound), and affine×affine (element-wise coefficient multiplication as a linear under-approximation).Footprint::one(units)constructor for footprints with cardinality exactly 1.Estimate::saturating_coeff_mulfor element-wise coefficient multiplication between two estimates.InformationRange::saturating_mul_cardinalityfor range-level multiplication of information by cardinality, saturating to unbounded on overflow.Eval::into_footprintas a consuming counterpart toEval::as_ref.struct_aggregate_sums_operandsandtuple_aggregate_sums_operands, which previously reported cardinality 2 for a two-field struct/tuple; both now correctly report cardinality 1.🛡 What tests cover this?
list_aggregate_per_element_units,dict_aggregate_per_pair_units,tuple_many_fields_cardinality_one, andstruct_materializes_list_parameter, each with corresponding snapshot files.footprint.rscovering all fourmaterialize()branches: scalar identity, constant×constant, affine units×constant cardinality, constant units×affine cardinality, and both-affine same-parameter.range.rscoveringsaturating_mul_cardinality: exact multiplication, identity by 1, empty inputs, unbounded cardinality, and overflow to unbounded.❓ How to test this?
cargo test -p hashql-mirand confirm all tests pass.libs/@local/hashql/mir/tests/ui/pass/size-estimation/to verify the reportedunitsandcardinalityvalues match the expected semantics for each aggregate kind.