Draft

Improve intersect_by_rank performance #7744

CodSpeed HQ / CodSpeed Performance Analysis succeeded May 6, 2026 in 0s

Performance Gate Passed

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 25 improved benchmarks
✅ 1181 untouched benchmarks

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.3 µs | 300.4 µs | +17.24% |
| Simulation | density_matrix[(0.05, 0.05, "self_sparse_mask_sparse")] | 83 µs | 47.6 µs | +74.43% |
| Simulation | density_matrix[(0.5, 0.05, "self_dense_mask_sparse")] | 482.5 µs | 53.2 µs | ×9.1 |
| Simulation | intersect_by_rank[(10000, "random")] | 103.6 µs | 10.3 µs | ×10 |
| Simulation | intersect_by_rank[(100000, "random")] | 979.4 µs | 53 µs | ×18 |
| Simulation | density_matrix[(0.05, 0.5, "self_sparse_mask_dense")] | 131.8 µs | 47.6 µs | ×2.8 |
| Simulation | density_matrix[(0.5, 0.5, "self_dense_mask_dense")] | 979 µs | 52.8 µs | ×19 |
| Simulation | intersect_by_rank[(10000, "runs")] | 103.6 µs | 10.1 µs | ×10 |
| Simulation | intersect_by_rank[(100000, "runs")] | 976.8 µs | 53 µs | ×18 |
| Simulation | rank_indices[(0.05, 0.05, "self_sparse_rank_sparse")] | 80.9 µs | 43 µs | +87.87% |
| Simulation | rank_indices[(0.5, 0.01, "self_dense_rank_very_sparse")] | 427.9 µs | 58.8 µs | ×7.3 |
| Simulation | rank_indices[(0.5, 0.5, "self_dense_rank_dense")] | 867.5 µs | 53.4 µs | ×16 |
| Simulation | sparse[(100000, 0.05, "sparse_5pct")] | 132.1 µs | 47.8 µs | ×2.8 |
| Simulation | sparse[(100000, 0.5, "dense_50pct")] | 979.7 µs | 53.2 µs | ×18 |
| Simulation | very_sparse_mask_cached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 422.3 µs | 50.6 µs | ×8.4 |
| Simulation | very_sparse_mask_cached[(0.5, 0.02, "self_dense_mask_2pct")] | 435.5 µs | 73.7 µs | ×5.9 |
| Simulation | very_sparse_mask_uncached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 432.3 µs | 59.5 µs | ×7.3 |
| Simulation | very_sparse_mask_uncached[(0.5, 0.02, "self_dense_mask_2pct")] | 449 µs | 82.1 µs | ×5.5 |
| Simulation | rank_indices[(0.05, 0.5, "self_sparse_rank_dense")] | 120.1 µs | 47.4 µs | ×2.5 |
| Simulation | rank_indices[(0.5, 0.05, "self_dense_rank_sparse")] | 462.6 µs | 58.6 µs | ×7.9 |
| ... | ... | ... | ... | ... |

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
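
For readers who want to sanity-check the Efficiency column, here is a minimal Python sketch that recomputes a speedup from the BASE and HEAD timings. It assumes the column is derived from the ratio BASE / HEAD, which is consistent with the rows shown but is not stated in the report. The function names `speedup` and `format_change` and the 2× cutoff between percentage and multiplier display are hypothetical, chosen only for illustration. Because the report rounds its timings, values recomputed this way can differ from the reported ones in the last digits.

```python
def speedup(base_us: float, head_us: float) -> float:
    """Return BASE / HEAD, i.e. how many times faster HEAD is than BASE."""
    if base_us <= 0 or head_us <= 0:
        raise ValueError("timings must be positive")
    return base_us / head_us


def format_change(base_us: float, head_us: float) -> str:
    """Render a speedup as a percentage gain or as a multiplier."""
    ratio = speedup(base_us, head_us)
    # Assumption: gains below 2x read better as a percentage. This cutoff
    # is a guess for illustration, not CodSpeed's documented rule.
    if ratio < 2.0:
        return f"{(ratio - 1.0) * 100:+.2f}%"
    return f"×{ratio:.1f}"


if __name__ == "__main__":
    # Timings (in µs) copied from two rows of the table.
    print(format_change(482.5, 53.2))   # density_matrix, self_dense_mask_sparse
    print(format_change(979.4, 53.0))   # intersect_by_rank, 100000, random
```
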


Comparing rk/intersect-by-rank (a8488fc) with develop (f307edc)
