Outline nulls from ScalarValue and fix de/serialization with Proto#6309
Outline nulls from ScalarValue and fix de/serialization with Proto#6309connortsui20 wants to merge 22 commits intodevelopfrom
ScalarValue and fix de/serialization with Proto#6309Conversation
04d214e to
a321ca9
Compare
ScalarValue and cast on deserialize
ScalarValue and cast on deserializeScalarValue and fix de/serialization with Proto
f352246 to
dded7b4
Compare
gatesn
left a comment
There was a problem hiding this comment.
A couple of comments on the main Scalar API.
0ee15dc to
cd8ea9d
Compare
There was a problem hiding this comment.
I believe some changes are needed here
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
ScalarValue and fix de/serialization with ProtoScalarValue and fix de/serialization with Proto
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
cd8ea9d to
323dfb5
Compare
Merging this PR will degrade performance by 98.62%
Performance Changes
Comparing Footnotes
|
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
🚨🚨🚨❌❌❌ SQL BENCHMARK FAILED ❌❌❌🚨🚨🚨Benchmark |
Benchmarks: TPC-H SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: TPC-H SF=1 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-DS SF=1 on NVMESummary
Detailed Results Table
|
Benchmarks: PolarSignals ProfilingSummary
Detailed Results Table
|
Benchmarks: FineWeb NVMeSummary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on S3Summary
Detailed Results Table
|
Benchmarks: TPC-H SF=10 on NVMESummary
Detailed Results Table
|
Benchmarks: FineWeb S3Summary
Detailed Results Table
|
Benchmarks: Statistical and Population GeneticsSummary
Detailed Results Table
|
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
|
Any idea why this makes FixedSizedList take so much better? From what I see is we managed to inline more aggressively but not sure if that's the reason |
|
@robert3005 the perf swing is from #6310, not this one. Afaict this change doesn't affect performance in any meaningful manner |
Currently, we store an opaque (but not really apparently?)
InnerScalarValueinside aScalarValue, and theInnerScalarValueis allowed to beNull.This PR both removes outlines the
InnerScalarValue::Nullvariant, where a nullable scalar is now anOption<ScalarValue>(instead of just aScalarValue. This also completely removesInnerScalarValuein favor of a publicScalarValueenum:Additionally, all
Scalars are verified to be sound on construction by checking that theDTypeof theScalaris_compatiblewith theOption<&ScalarValue>that is passed.The stricter construction changes then mean that we need to change how deserialization of scalars work. The protobuf format is not exactly 1-1 with our
Scalars (notably, it only supports 64-bit integers). This means that we might write 8-bit integers and the round trip returns a 64-bit integer. So this PR also changes some APIs to allow us to pass aDTypeto the statistics deserializers. TBD on if this needs to happen in more places (FileStatistics?).For reviewers: try to look over all of the diffs since a large majority is not just renaming variables, they are semantic changes that I am not super confidant is right.
Breaks
Breaks the old
file_statsmethod onVortexFileto returnFileStatisticsinstead of the array ofStatsSet. We needed to do this in order to correctly read statistics from DataFusion, specifically in the case of schema evolution.TODO
There's still a bunch of TODOs remaining (and one test that has been ignored for now that need to be unignored), but it would probably be good to review some things first.