[SPARK-56802][SQL] Add bulk read+widen path for FLOAT to Double Parquet vector updater #55816

Draft
LuciferYang wants to merge 4 commits into apache:master from LuciferYang:SPARK-56802-float-to-double

@LuciferYang
Contributor

What changes were proposed in this pull request?

Extend the bulk read+widen pattern introduced in SPARK-56791 to FloatToDoubleUpdater (Parquet FLOAT read into Spark DoubleType).

A new readFloatsAsDoubles default method on VectorizedValuesReader provides the per-row fallback. VectorizedPlainValuesReader overrides it to fetch the source bytes once via getBuffer(total * 4) and run a tight in-method conversion loop. FloatToDoubleUpdater.readValues becomes a one-line delegation. The widening is Java's primitive float-to-double conversion: it is exact for every finite and infinite float, and a NaN float widens to a double NaN (the JVM may canonicalize the payload).
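The bulk path described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the class name `BulkWidenSketch`, the `double[]` destination, and the method signature are assumptions made for the example, and the real method writes into a Spark column vector rather than an array. It relies only on the fact that Parquet PLAIN encoding stores FLOAT values as 4-byte little-endian IEEE 754 floats.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the bulk read+widen path; names are illustrative only.
class BulkWidenSketch {

    // Reads `total` little-endian floats starting at src's current position,
    // widens each one to double, and stores it at dst[offset + i].
    static void readFloatsAsDoubles(ByteBuffer src, int total, double[] dst, int offset) {
        // One view for the whole batch. slice() always starts out big-endian,
        // so the byte order has to be set explicitly on the view.
        ByteBuffer buf = src.slice().order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < total; i++) {
            // Absolute read; assigning a float to a double slot performs the
            // primitive widening conversion.
            dst[offset + i] = buf.getFloat(i * 4);
        }
        // Advance the source past the bytes that were consumed.
        src.position(src.position() + total * 4);
    }
}
```

Because the conversion is exact, a value such as `0.1f` widens to the double nearest to the stored float, which is not the same number as the double literal `0.1`. Readers comparing widened columns against double literals should keep that in mind.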

Why are the changes needed?

FloatToDoubleUpdater.readValues allocates a fresh ByteBuffer slice inside getBuffer(4) for every element on the legacy path, and that allocation dominates the loop. Collapsing N allocations into one is the same win SPARK-56791 delivered for the INT32 -> Long sibling.
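To make the allocation argument concrete, the sketch below counts how many ByteBuffer views each shape of loop creates. It is an illustration under stated assumptions: `SliceCountSketch`, its `getBuffer` method, and the `slices` counter are hypothetical stand-ins written for this example, not Spark's reader classes, and the sketch counts views rather than measuring time.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reader that counts how many buffer views it hands out.
class SliceCountSketch {
    private final ByteBuffer in;
    int slices = 0; // number of ByteBuffer views created so far

    SliceCountSketch(ByteBuffer in) {
        this.in = in;
    }

    // Stand-in for a getBuffer(length) call: returns a fresh little-endian view
    // of the next `length` bytes and advances the underlying input.
    ByteBuffer getBuffer(int length) {
        ByteBuffer view = in.slice().order(ByteOrder.LITTLE_ENDIAN);
        view.limit(length);
        in.position(in.position() + length);
        slices++;
        return view;
    }

    // Per-element shape: one view per value, so `total` views in all.
    void readPerElement(int total, double[] dst) {
        for (int i = 0; i < total; i++) {
            dst[i] = getBuffer(4).getFloat();
        }
    }

    // Bulk shape: a single view covers the whole batch.
    void readBulk(int total, double[] dst) {
        ByteBuffer buf = getBuffer(total * 4);
        for (int i = 0; i < total; i++) {
            dst[i] = buf.getFloat();
        }
    }

    // Helper that encodes floats the way the sketch expects to read them.
    static ByteBuffer encode(float... values) {
        ByteBuffer b = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            b.putFloat(v);
        }
        b.flip();
        return b;
    }
}
```

Reading N values with `readPerElement` leaves `slices` at N, while `readBulk` leaves it at 1, and both fill the destination with identical doubles. How much of that difference shows up as wall-clock time depends on the JVM and is what the benchmark in this PR is meant to measure.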

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(To be updated after the GHA benchmark and test runs complete.)

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

@LuciferYang LuciferYang marked this pull request as draft May 12, 2026 05:55
@LuciferYang LuciferYang force-pushed the SPARK-56802-float-to-double branch from 308150a to 051b94b Compare May 12, 2026 17:15
…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)
…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)
                                              Best Time(ms)   Avg Time(ms)   Stdev(ms)   Rate(M/s)   Per Row(ns)   Relative
DowncastLongUpdater (INT64 -> Decimal(9,2))               2              2           0       455.0           2.2       0.4X
IntegerToLongUpdater                                      1              1           0      1280.6           0.8       1.0X
IntegerToDoubleUpdater                                    1              1           0      1537.9           0.7       1.2X
FloatToDoubleUpdater                                      1              1           0      1418.8           0.7       1.1X
Contributor Author

This result reflects the effect of this optimization.