Skip to content

Allow running reduce_parent operations on stack allocated parents#7751

Draft
robert3005 wants to merge 1 commit into
developfrom
refactor/parent-ref-stack-allocated
Draft

Allow running reduce_parent operations on stack allocated parents#7751
robert3005 wants to merge 1 commit into
developfrom
refactor/parent-ref-stack-allocated

Conversation

@robert3005
Copy link
Copy Markdown
Contributor

Add indirection logic to support stack allocated array parents in reduce rules.

This lets us avoid heap allocation for commonly used operations that are often
immediately optimised away

@robert3005 robert3005 force-pushed the refactor/parent-ref-stack-allocated branch from 3751910 to 08c9f4a Compare May 1, 2026 14:51
Comment thread vortex-array/src/array/erased.rs Outdated
Comment thread vortex-array/src/matcher.rs Outdated
Comment thread vortex-array/src/array/parent.rs
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 1, 2026

Merging this PR will degrade performance by 21.45%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 26 improved benchmarks
❌ 11 regressed benchmarks
✅ 1171 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation chunked_bool_canonical_into[(10, 1000)] 793.9 µs 920.3 µs -13.74%
Simulation chunked_bool_canonical_into[(100, 100)] 102.4 µs 115.5 µs -11.32%
Simulation chunked_opt_bool_canonical_into[(10, 1000)] 911.1 µs 1,038.4 µs -12.26%
Simulation chunked_varbinview_canonical_into[(10, 1000)] 1.7 ms 1.9 ms -10.57%
Simulation search_index_above_max 268.1 µs 340.3 µs -21.21%
Simulation search_index_below_min 264.2 µs 336.4 µs -21.45%
Simulation search_index_full_range_random 266.2 µs 338.4 µs -21.34%
Simulation search_index_in_range 266.5 µs 338.7 µs -21.32%
Simulation search_index_mixed_out_of_range 266.3 µs 338.4 µs -21.3%
Simulation slice_dict_tight_loop[10000] 392.2 µs 328.3 µs +19.48%
Simulation slice_primitive_tight_loop[10000] 195.5 µs 166.2 µs +17.64%
Simulation take_search[(0.005, 0.05)] 168.2 µs 132.1 µs +27.33%
Simulation take_search[(0.005, 0.1)] 320.1 µs 247.6 µs +29.28%
Simulation take_search[(0.005, 0.5)] 1.5 ms 1.2 ms +30.98%
Simulation take_search[(0.005, 1.0)] 3.1 ms 2.3 ms +31.23%
Simulation take_search[(0.01, 0.05)] 179.3 µs 143.2 µs +25.21%
Simulation take_search[(0.01, 0.1)] 341 µs 268.5 µs +27%
Simulation take_search[(0.01, 0.5)] 1.6 ms 1.3 ms +28.53%
Simulation take_search[(0.01, 1.0)] 3.3 ms 2.5 ms +28.75%
Simulation take_search[(0.1, 0.05)] 248.9 µs 212.8 µs +16.96%
... ... ... ... ... ...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.


Comparing refactor/parent-ref-stack-allocated (04c7c59) with develop (f3dfa0c)

Open in CodSpeed

Comment thread vortex-array/src/array/typed.rs Outdated
Comment thread encodings/fastlanes/src/bitpacking/vtable/operations.rs
@joseph-isaacs joseph-isaacs added the action/benchmark Trigger full benchmarks to run on this PR label May 1, 2026
@robert3005 robert3005 added changelog/break A breaking API change changelog/performance A performance improvement and removed action/benchmark Trigger full benchmarks to run on this PR changelog/performance A performance improvement labels May 1, 2026
Comment thread vortex-array/src/array/typed.rs Outdated
@robert3005 robert3005 force-pushed the refactor/parent-ref-stack-allocated branch 2 times, most recently from 0b38af3 to 95b0670 Compare May 10, 2026 19:28
@joseph-isaacs
Copy link
Copy Markdown
Contributor

Some of those benchmarks are doing extra real work

@robert3005
Copy link
Copy Markdown
Contributor Author

robert3005 commented May 11, 2026

This is not ready yet, there’s still something wrong with matchers

`ArrayRef::slice/filter/take` previously allocated a wrapper array
(`SliceArray`, `FilterArray`, `DictArray`) and called `.optimize()` on
the resulting `ArrayRef`, relying on a `reduce_parent` rule to throw the
wrapper away. The wrapper allocation was always paid even when reduction
ran in one shot.

This change establishes the API surface for moving the wrapper onto the
stack:

- `ParentRef<'a>`: a parent representation that optionally borrows an
  `&ArrayRef` and otherwise carries the encoding-specific data, dtype,
  length, slots, and encoding id directly. `into_array_ref(self)` clones
  the underlying `Arc` when the parent is heap-backed.
- `ParentView<'a, V>`: a typed view of a parent that derefs to
  `V::ArrayData` without holding an `&ArrayRef`. Used by the upcoming
  matcher path that accepts stack-allocated parents.
- `DynArray::data_any` exposes the encoding-specific data so a matcher
  can downcast to `V::ArrayData` from a `ParentRef` regardless of
  whether the parent is heap- or stack-backed.
- `ArrayParts<V>::optimize`, `optimize_ctx(session)`, and `into_array`,
  plus `Optimized<V>` with its own `into_array`. Callers chain
  `parts.optimize()?.into_array()` so reduction is an explicit,
  orthogonal step from materialization.
- `ArrayRef::slice / filter / take` now build an `ArrayParts<V>` and
  drive it through the chain.

The internals of `ArrayParts::optimize` still materialize before
running the existing `reduce_parent` chain, so this PR does not yet
remove the `Arc<ArrayInner<V>>` allocation. Wiring `ParentRef` through
`DynArray::reduce_parent`, `VTable::reduce_parent`, `ParentRuleSet`,
`DynArrayParentReduceRule`, `ReduceParentFn`, the `Matcher` API, and
the per-encoding rule bodies is the follow-up that delivers the
allocation savings.

- `cargo build --workspace`
- `cargo nextest run -p vortex-array -p vortex-fastlanes -p vortex-fsst -p vortex-alp -p vortex-runend -p vortex-zigzag` (all 3078 pass; the 21 skipped are timezone-dependent and unrelated to this change)
- `cargo clippy -p vortex-array --all-targets --all-features`
- `cargo +nightly fmt --all -- --check`
- `./scripts/public-api.sh`

Signed-off-by: Robert Kruszewski <github@robertk.io>
@robert3005 robert3005 force-pushed the refactor/parent-ref-stack-allocated branch from 95b0670 to 04c7c59 Compare May 11, 2026 09:44
@robert3005 robert3005 marked this pull request as draft May 11, 2026 09:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/break A breaking API change

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants