CodSpeed HQ / CodSpeed Performance Analysis
succeeded
May 6, 2026 in 0s
Performance Gate Passed
⚠️ Unknown Walltime execution environment detected
Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.
For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.
⚡ 25 improved benchmarks
✅ 1181 untouched benchmarks
Performance Changes
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | WallTime | cuda/bitpacked_u8/unpack/3bw[100M] | 352.3 µs | 300.4 µs | +17.24% |
| ⚡ | Simulation | density_matrix[(0.05, 0.05, "self_sparse_mask_sparse")] | 83 µs | 47.6 µs | +74.43% |
| ⚡ | Simulation | density_matrix[(0.5, 0.05, "self_dense_mask_sparse")] | 482.5 µs | 53.2 µs | ×9.1 |
| ⚡ | Simulation | intersect_by_rank[(10000, "random")] | 103.6 µs | 10.3 µs | ×10 |
| ⚡ | Simulation | intersect_by_rank[(100000, "random")] | 979.4 µs | 53 µs | ×18 |
| ⚡ | Simulation | density_matrix[(0.05, 0.5, "self_sparse_mask_dense")] | 131.8 µs | 47.6 µs | ×2.8 |
| ⚡ | Simulation | density_matrix[(0.5, 0.5, "self_dense_mask_dense")] | 979 µs | 52.8 µs | ×19 |
| ⚡ | Simulation | intersect_by_rank[(10000, "runs")] | 103.6 µs | 10.1 µs | ×10 |
| ⚡ | Simulation | intersect_by_rank[(100000, "runs")] | 976.8 µs | 53 µs | ×18 |
| ⚡ | Simulation | rank_indices[(0.05, 0.05, "self_sparse_rank_sparse")] | 80.9 µs | 43 µs | +87.87% |
| ⚡ | Simulation | rank_indices[(0.5, 0.01, "self_dense_rank_very_sparse")] | 427.9 µs | 58.8 µs | ×7.3 |
| ⚡ | Simulation | rank_indices[(0.5, 0.5, "self_dense_rank_dense")] | 867.5 µs | 53.4 µs | ×16 |
| ⚡ | Simulation | sparse[(100000, 0.05, "sparse_5pct")] | 132.1 µs | 47.8 µs | ×2.8 |
| ⚡ | Simulation | sparse[(100000, 0.5, "dense_50pct")] | 979.7 µs | 53.2 µs | ×18 |
| ⚡ | Simulation | very_sparse_mask_cached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 422.3 µs | 50.6 µs | ×8.4 |
| ⚡ | Simulation | very_sparse_mask_cached[(0.5, 0.02, "self_dense_mask_2pct")] | 435.5 µs | 73.7 µs | ×5.9 |
| ⚡ | Simulation | very_sparse_mask_uncached[(0.5, 0.005, "self_dense_mask_0p5pct")] | 432.3 µs | 59.5 µs | ×7.3 |
| ⚡ | Simulation | very_sparse_mask_uncached[(0.5, 0.02, "self_dense_mask_2pct")] | 449 µs | 82.1 µs | ×5.5 |
| ⚡ | Simulation | rank_indices[(0.05, 0.5, "self_sparse_rank_dense")] | 120.1 µs | 47.4 µs | ×2.5 |
| ⚡ | Simulation | rank_indices[(0.5, 0.05, "self_dense_rank_sparse")] | 462.6 µs | 58.6 µs | ×7.9 |
| ... | ... | ... | ... | ... | ... |
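
For readers checking the Efficiency column by hand, the figures appear consistent with the ratio BASE / HEAD, shown as a percentage gain for small changes and as a multiplier for large ones. The sketch below assumes that relationship; the function names are hypothetical, this is not CodSpeed's own code, and because the displayed times are rounded the results only approximate the reported values.

```python
def speedup(base_us: float, head_us: float) -> float:
    """Ratio of BASE time to HEAD time (assumed meaning of the multiplier)."""
    if head_us <= 0:
        raise ValueError("HEAD time must be positive")
    return base_us / head_us


def percent_gain(base_us: float, head_us: float) -> float:
    """Speedup expressed as a percentage gain over BASE (assumed formula)."""
    return (speedup(base_us, head_us) - 1.0) * 100.0


if __name__ == "__main__":
    # Values copied from the table; times are in microseconds.
    rows = [
        ("cuda/bitpacked_u8/unpack/3bw[100M]", 352.3, 300.4),
        ('intersect_by_rank[(100000, "random")]', 979.4, 53.0),
    ]
    for name, base, head in rows:
        print(f"{name}: x{speedup(base, head):.2f} "
              f"({percent_gain(base, head):+.2f}%)")
```

Small differences from the reported column (for example in the second decimal of a percentage) are expected, since the report presumably computes from unrounded measurements.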
ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.
Comparing rk/intersect-by-rank (a8488fc) with develop (f307edc)