Better sort pushdown for DF #8557
Merging this PR will not alter performance

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | `slice_empty_vortex` | 310 ns | 368.3 ns | -15.84% |
| ❌ | Simulation | `encode_varbin[(1000, 8)]` | 141.1 µs | 156.8 µs | -10.04% |
| ⚡ | Simulation | `chunked_bool_canonical_into[(1000, 10)]` | 27 µs | 16.2 µs | +66.33% |
| ⚡ | Simulation | `eq_i64_constant` | 319 µs | 289.2 µs | +10.31% |
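The Efficiency column appears to match BASE / HEAD - 1. Under this reading, a slower HEAD produces a negative value. This formula is inferred from the numbers in the table, not taken from CodSpeed documentation, and `efficiency` below is a hypothetical helper. Small differences in the last digit are expected because the displayed times are rounded.

```rust
/// Relative efficiency of HEAD versus BASE, computed as BASE / HEAD - 1.
/// Positive means HEAD is faster. Assumption: inferred from the table, not from CodSpeed docs.
fn efficiency(base: f64, head: f64) -> f64 {
    base / head - 1.0
}

fn main() {
    // Both times must be in the same unit (ns here).
    for (name, base, head) in [
        ("slice_empty_vortex", 310.0, 368.3),
        ("eq_i64_constant", 319_000.0, 289_200.0),
    ] {
        println!("{name}: {:+.2}%", efficiency(base, head) * 100.0);
    }
}
```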
Comparing adamg/better-sort-pushdown (1067d6d) with develop (1118a20)
Footnotes: 4 benchmarks were skipped, so the baseline results were used instead.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Force-pushed from 88822d3 to 4fc9dc1.
Polar Signals Profiling Results

Benchmarks: PolarSignals Profiling (base). Vortex (geomean): 1.030x ➖
- datafusion / vortex-file-compressed (1.030x ➖, 1↑ 1↓)

No file size changes detected.
Benchmarks: FineWeb NVMe (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (1.146x ❌, 0↑ 8↓)
- datafusion / parquet (1.162x ❌, 0↑ 9↓)
- duckdb / vortex-file-compressed (1.141x ❌, 0↑ 7↓)
- duckdb / parquet (1.136x ❌, 0↑ 9↓)

File Size Changes (3 files changed, -46.2% overall, 1↑ 2↓)
Benchmarks: TPC-H SF=1 on NVME (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (0.979x ➖, 0↑ 0↓)
- datafusion / parquet (0.963x ➖, 3↑ 0↓)
- datafusion / arrow (0.947x ➖, 3↑ 0↓)
- duckdb / vortex-file-compressed (0.976x ➖, 0↑ 0↓)
- duckdb / parquet (0.974x ➖, 1↑ 0↓)

File Size Changes (17 files changed, -44.5% overall, 3↑ 14↓)
Benchmarks: TPC-DS SF=1 on NVME (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (1.047x ➖, 0↑ 7↓)
- datafusion / parquet (1.052x ➖, 0↑ 11↓)
- duckdb / vortex-file-compressed (1.042x ➖, 1↑ 7↓)
- duckdb / parquet (1.016x ➖, 0↑ 3↓)

File Size Changes (30 files changed, -43.5% overall, 3↑ 27↓)
Benchmarks: FineWeb S3 (base). Verdict: No clear signal (confidence: environment too noisy)
- datafusion / vortex-file-compressed (0.882x ➖, 2↑ 1↓)
- datafusion / parquet (0.765x ➖, 2↑ 0↓)
- duckdb / vortex-file-compressed (0.922x ➖, 1↑ 0↓)
- duckdb / parquet (0.961x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics (base). Verdict: No clear signal (low confidence)
- duckdb / vortex-file-compressed (1.034x ➖, 0↑ 0↓)
- duckdb / parquet (1.026x ➖, 0↑ 0↓)

File Size Changes (3 files changed, -32.3% overall, 1↑ 2↓)
Benchmarks: Clickbench Sorted on NVME (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (1.053x ➖, 1↑ 3↓)
- datafusion / parquet (1.014x ➖, 0↑ 0↓)
- duckdb / vortex-file-compressed (0.928x ➖, 3↑ 0↓)
- duckdb / parquet (0.994x ➖, 0↑ 0↓)

File Size Changes (201 files changed, -42.6% overall, 55↑ 146↓)
Benchmarks: TPC-H SF=10 on NVME (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (0.984x ➖, 0↑ 0↓)
- datafusion / parquet (0.985x ➖, 0↑ 0↓)
- datafusion / arrow (0.960x ➖, 0↑ 0↓)
- duckdb / vortex-file-compressed (0.989x ➖, 0↑ 0↓)
- duckdb / parquet (0.992x ➖, 0↑ 0↓)

File Size Changes (47 files changed, -44.4% overall, 13↑ 34↓)
Benchmarks: Clickbench on NVME (base). Verdict: No clear signal (low confidence)
- datafusion / vortex-file-compressed (0.918x ➖, 13↑ 0↓)
- datafusion / parquet (0.967x ➖, 4↑ 0↓)
- duckdb / vortex-file-compressed (0.974x ➖, 7↑ 3↓)
- duckdb / parquet (0.990x ➖, 0↑ 1↓)

File Size Changes (201 files changed, -39.1% overall, 47↑ 154↓)
Benchmarks: TPC-H SF=1 on S3 (base). Verdict: No clear signal (confidence: environment too noisy)
- datafusion / vortex-file-compressed (0.998x ➖, 2↑ 3↓)
- datafusion / parquet (0.893x ➖, 5↑ 3↓)
- duckdb / vortex-file-compressed (0.930x ➖, 0↑ 0↓)
- duckdb / parquet (0.994x ➖, 0↑ 0↓)
Summary
This PR improves sort pushdown into DataFusion (DF). It records the source's sort order on the source itself and, when applicable, reorders files according to that order.
Files are sorted by the min statistic of the first column referenced in the sort expression.
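The following is a minimal Rust sketch of the file-ordering step described above. It does not use any DataFusion or Vortex API. `FileStats`, `SortExpr`, and `order_files_by_min` are hypothetical stand-ins, and the mins are assumed to be `i64`. The sketch treats "when applicable" as "the expression references a column and every file has a min statistic for it". That condition is an assumption, not necessarily the PR's actual rule.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a file plus its per-column min statistics.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    path: String,
    /// Column name -> minimum value recorded for that column in this file.
    mins: HashMap<String, i64>,
}

/// Hypothetical stand-in for a sort expression: only the columns it
/// references, in the order they appear, are needed here.
struct SortExpr {
    referenced_columns: Vec<String>,
}

/// Reorder `files` by the min statistic of the first column referenced in
/// `sort`. Returns `None` (keep the original order) when the expression
/// references no column or any file lacks that statistic (assumed rule).
fn order_files_by_min(files: &[FileStats], sort: &SortExpr) -> Option<Vec<FileStats>> {
    let column = sort.referenced_columns.first()?;
    // Pair each file with its min for the leading column; bail out if one is missing.
    let mut keyed: Vec<(i64, &FileStats)> = files
        .iter()
        .map(|f| f.mins.get(column).map(|&m| (m, f)))
        .collect::<Option<Vec<_>>>()?;
    // Stable sort: files with equal mins keep their original relative order.
    keyed.sort_by_key(|(min, _)| *min);
    Some(keyed.into_iter().map(|(_, f)| f.clone()).collect())
}

fn main() {
    let file = |path: &str, a: i64| FileStats {
        path: path.to_string(),
        mins: HashMap::from([("a".to_string(), a)]),
    };
    let files = vec![file("f2.vortex", 50), file("f0.vortex", 0), file("f1.vortex", 10)];
    let sort = SortExpr { referenced_columns: vec!["a".to_string()] };
    let ordered = order_files_by_min(&files, &sort).expect("all files have a min for `a`");
    let paths: Vec<&str> = ordered.iter().map(|f| f.path.as_str()).collect();
    println!("{:?}", paths);
}
```

The sketch only models an ascending sort, where ordering files by their min value puts the file most likely to hold the smallest rows first.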