Allow running reduce_parent operations on stack allocated parents#7751
Draft
robert3005 wants to merge 1 commit into
Draft
Allow running reduce_parent operations on stack allocated parents#7751robert3005 wants to merge 1 commit into
robert3005 wants to merge 1 commit into
Conversation
3751910 to
08c9f4a
Compare
Merging this PR will degrade performance by 21.45%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(10, 1000)] |
793.9 µs | 920.3 µs | -13.74% |
| ❌ | Simulation | chunked_bool_canonical_into[(100, 100)] |
102.4 µs | 115.5 µs | -11.32% |
| ❌ | Simulation | chunked_opt_bool_canonical_into[(10, 1000)] |
911.1 µs | 1,038.4 µs | -12.26% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(10, 1000)] |
1.7 ms | 1.9 ms | -10.57% |
| ❌ | Simulation | search_index_above_max |
268.1 µs | 340.3 µs | -21.21% |
| ❌ | Simulation | search_index_below_min |
264.2 µs | 336.4 µs | -21.45% |
| ❌ | Simulation | search_index_full_range_random |
266.2 µs | 338.4 µs | -21.34% |
| ❌ | Simulation | search_index_in_range |
266.5 µs | 338.7 µs | -21.32% |
| ❌ | Simulation | search_index_mixed_out_of_range |
266.3 µs | 338.4 µs | -21.3% |
| ⚡ | Simulation | slice_dict_tight_loop[10000] |
392.2 µs | 328.3 µs | +19.48% |
| ⚡ | Simulation | slice_primitive_tight_loop[10000] |
195.5 µs | 166.2 µs | +17.64% |
| ⚡ | Simulation | take_search[(0.005, 0.05)] |
168.2 µs | 132.1 µs | +27.33% |
| ⚡ | Simulation | take_search[(0.005, 0.1)] |
320.1 µs | 247.6 µs | +29.28% |
| ⚡ | Simulation | take_search[(0.005, 0.5)] |
1.5 ms | 1.2 ms | +30.98% |
| ⚡ | Simulation | take_search[(0.005, 1.0)] |
3.1 ms | 2.3 ms | +31.23% |
| ⚡ | Simulation | take_search[(0.01, 0.05)] |
179.3 µs | 143.2 µs | +25.21% |
| ⚡ | Simulation | take_search[(0.01, 0.1)] |
341 µs | 268.5 µs | +27% |
| ⚡ | Simulation | take_search[(0.01, 0.5)] |
1.6 ms | 1.3 ms | +28.53% |
| ⚡ | Simulation | take_search[(0.01, 1.0)] |
3.3 ms | 2.5 ms | +28.75% |
| ⚡ | Simulation | take_search[(0.1, 0.05)] |
248.9 µs | 212.8 µs | +16.96% |
| ... | ... | ... | ... | ... | ... |
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing refactor/parent-ref-stack-allocated (04c7c59) with develop (f3dfa0c)
robert3005
commented
May 1, 2026
0b38af3 to
95b0670
Compare
Contributor
|
Some of those benchmarks are doing extra real work |
Contributor
Author
|
This is not ready yet, there’s still something wrong with matchers |
`ArrayRef::slice/filter/take` previously allocated a wrapper array (`SliceArray`, `FilterArray`, `DictArray`) and called `.optimize()` on the resulting `ArrayRef`, relying on a `reduce_parent` rule to throw the wrapper away. The wrapper allocation was always paid even when reduction ran in one shot. This change establishes the API surface for moving the wrapper onto the stack: - `ParentRef<'a>`: a parent representation that optionally borrows an `&ArrayRef` and otherwise carries the encoding-specific data, dtype, length, slots, and encoding id directly. `into_array_ref(self)` clones the underlying `Arc` when the parent is heap-backed. - `ParentView<'a, V>`: a typed view of a parent that derefs to `V::ArrayData` without holding an `&ArrayRef`. Used by the upcoming matcher path that accepts stack-allocated parents. - `DynArray::data_any` exposes the encoding-specific data so a matcher can downcast to `V::ArrayData` from a `ParentRef` regardless of whether the parent is heap- or stack-backed. - `ArrayParts<V>::optimize`, `optimize_ctx(session)`, and `into_array`, plus `Optimized<V>` with its own `into_array`. Callers chain `parts.optimize()?.into_array()` so reduction is an explicit, orthogonal step from materialization. - `ArrayRef::slice / filter / take` now build an `ArrayParts<V>` and drive it through the chain. The internals of `ArrayParts::optimize` still materialize before running the existing `reduce_parent` chain, so this PR does not yet remove the `Arc<ArrayInner<V>>` allocation. Wiring `ParentRef` through `DynArray::reduce_parent`, `VTable::reduce_parent`, `ParentRuleSet`, `DynArrayParentReduceRule`, `ReduceParentFn`, the `Matcher` API, and the per-encoding rule bodies is the follow-up that delivers the allocation savings. - `cargo build --workspace` - `cargo nextest run -p vortex-array -p vortex-fastlanes -p vortex-fsst -p vortex-alp -p vortex-runend -p vortex-zigzag` (all 3078 pass; the 21 skipped are timezone-dependent and unrelated to this change) - `cargo clippy -p vortex-array --all-targets --all-features` - `cargo +nightly fmt --all -- --check` - `./scripts/public-api.sh` Signed-off-by: Robert Kruszewski <github@robertk.io>
95b0670 to
04c7c59
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Add indirection logic to support stack allocated array parents in reduce rules.
This lets us avoid heap allocation for commonly used operations that are often
immediately optimised away