Add NormalizedVector extension type#7709
Conversation
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Merging this PR will degrade performance by 34.57%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | dict[10M_u64_values_u32_codes] |
279 µs | 314 µs | -11.15% |
| ❌ | WallTime | dict[10M_u64_values_u8_codes] |
215.8 µs | 253.6 µs | -14.9% |
| ⚡ | WallTime | mix[50%_in/50%_out] |
397.3 µs | 338.9 µs | +17.22% |
| ❌ | WallTime | dynamic_dispatch_f32[10M] |
115.7 µs | 171 µs | -32.34% |
| ❌ | WallTime | dynamic_dispatch_u32[10M] |
108.5 µs | 165.9 µs | -34.57% |
| ⚡ | WallTime | dynamic_dispatch_u32[10M] |
162.8 µs | 107.5 µs | +51.43% |
| ❌ | WallTime | dynamic_dispatch_u32[10M] |
108.5 µs | 164.9 µs | -34.16% |
| ⚡ | WallTime | 10M_90%[10000000] |
226.2 µs | 193.6 µs | +16.84% |
| ⚡ | Simulation | turboquant_decompress_dim1024_4bit |
39,232,213.1 ns | 579.4 ns | ×68,000 |
| ⚡ | Simulation | turboquant_decompress_dim1024_2bit |
39,239,905.6 ns | 579.4 ns | ×68,000 |
| ⚡ | Simulation | turboquant_decompress_dim1024_8bit |
39,258,756.7 ns | 579.4 ns | ×68,000 |
| ⚡ | Simulation | turboquant_decompress_dim768_4bit |
37,461,329.7 ns | 579.4 ns | ×65,000 |
| ⚡ | Simulation | turboquant_decompress_dim128_4bit |
4,877,985.6 ns | 579.4 ns | ×8,400 |
Comparing ct/norm-vector (6e0e6aa) with develop (0bb712b)
1f5d4a2 to
766a8a0
Compare
30fd24f to
0d3cb51
Compare
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
|
Polar Signals Profiling ResultsLatest Run
Previous Runs (1)
Powered by Polar Signals Cloud |
Benchmarks: PolarSignals ProfilingVortex (geomean): 0.951x ➖ datafusion / vortex-file-compressed (0.951x ➖, 1↑ 0↓)
|
File Sizes: PolarSignals ProfilingNo file size changes detected. |
Benchmarks: FineWeb NVMeVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.991x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.992x ➖, 0↑ 0↓)
datafusion / parquet (1.004x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.967x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.992x ➖, 0↑ 0↓)
duckdb / parquet (0.983x ➖, 1↑ 0↓)
Full attributed analysis
|
File Sizes: FineWeb NVMeNo file size changes detected. |
Benchmarks: TPC-H SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (0.980x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.990x ➖, 0↑ 0↓)
datafusion / parquet (1.001x ➖, 0↑ 2↓)
datafusion / arrow (0.976x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.999x ➖, 0↑ 0↓)
duckdb / parquet (0.987x ➖, 1↑ 1↓)
duckdb / duckdb (0.995x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=1 on NVMEFile Size Changes (195 files changed, -98.4% overall, 0↑ 195↓)
Totals:
|
Benchmarks: FineWeb S3Verdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.141x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.957x ➖, 0↑ 0↓)
datafusion / parquet (0.935x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.017x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.015x ➖, 0↑ 0↓)
duckdb / parquet (0.949x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Statistical and Population GeneticsVerdict: No clear signal (low confidence) duckdb / vortex-file-compressed (0.954x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.989x ➖, 0↑ 0↓)
duckdb / parquet (0.980x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: Statistical and Population GeneticsNo file size changes detected. |
Benchmarks: TPC-H SF=10 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.034x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.027x ➖, 0↑ 0↓)
datafusion / parquet (1.035x ➖, 0↑ 0↓)
datafusion / arrow (1.047x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.023x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.022x ➖, 0↑ 0↓)
duckdb / parquet (1.013x ➖, 0↑ 0↓)
duckdb / duckdb (1.009x ➖, 0↑ 0↓)
Full attributed analysis
|
File Sizes: TPC-H SF=10 on NVMENo file size changes detected. |
Benchmarks: Clickbench on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.016x ➖, 0↑ 0↓)
datafusion / parquet (1.024x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.998x ➖, 7↑ 3↓)
duckdb / parquet (1.010x ➖, 0↑ 1↓)
duckdb / duckdb (1.011x ➖, 1↑ 2↓)
Full attributed analysis
|
File Sizes: Clickbench on NVMEFile Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Totals:
|
Benchmarks: TPC-H SF=1 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.932x ➖, 1↑ 0↓)
datafusion / vortex-compact (0.980x ➖, 1↑ 3↓)
datafusion / parquet (0.959x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (0.995x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.989x ➖, 0↑ 0↓)
duckdb / parquet (1.018x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: Random AccessVortex (geomean): 1.238x ❌ unknown / unknown (1.204x ❌, 1↑ 50↓)
|
Benchmarks: CompressionVortex (geomean): 1.013x ➖ unknown / unknown (1.007x ➖, 2↑ 8↓)
|
Benchmarks: TPC-H SF=10 on S3Verdict: No clear signal (environment too noisy confidence) datafusion / vortex-file-compressed (0.980x ➖, 0↑ 0↓)
datafusion / vortex-compact (0.974x ➖, 1↑ 0↓)
datafusion / parquet (0.939x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.032x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.968x ➖, 0↑ 0↓)
duckdb / parquet (1.008x ➖, 0↑ 0↓)
Full attributed analysis
|
Benchmarks: TPC-DS SF=1 on NVMEVerdict: No clear signal (low confidence) datafusion / vortex-file-compressed (1.066x ➖, 1↑ 22↓)
datafusion / vortex-compact (1.062x ➖, 0↑ 13↓)
datafusion / parquet (1.055x ➖, 0↑ 13↓)
duckdb / vortex-file-compressed (1.053x ➖, 0↑ 13↓)
duckdb / vortex-compact (1.045x ➖, 0↑ 15↓)
duckdb / parquet (1.035x ➖, 0↑ 6↓)
duckdb / duckdb (1.062x ➖, 0↑ 23↓)
Full attributed analysis
|
File Sizes: TPC-DS SF=1 on NVMENo file size changes detected. |
|
@joseph-isaacs this branch doesn't affect any of our current benchmarks (it doesn't go through this codepath) |
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
|
This is getting quite complicated, but it is unclear if this is inherent complexity or because the design decisions I chose were bad. I'm going to try a pull out some changes out of this, but it is hard to say if this is worth it. |
Summary
Tracking issue: #7297
Adds a
NormalizedVectorextension type tovortex-tensor, which wraps aVectorextension array. Since we may want to add refinement types in the future, we chose to wrap aVectorstorage array instead of aFixedSizeListstorage array.Additionally does a bunch of general cleanup that was obviously bad when I was actually trying to add this (and I wasn't really able to pull it out into a separate PR because it is only obvious when working with this new type).
There are still a bunch of TODOs that I'd like to fix but I'm probably going to do those all at once in a later PR.
Testing
Adds basic tests for the new
NormalizedVectortype and also adds some more complex tests for logic that was missing before for the older code.