[SPARK-56802][SQL] Add bulk read+widen path for FLOAT to Double Parquet vector updater by LuciferYang · Pull Request #55816 · apache/spark

LuciferYang · 2026-05-12T05:54:40Z

What changes were proposed in this pull request?

Extend the bulk read+widen pattern introduced in SPARK-56791 to FloatToDoubleUpdater (parquet FLOAT read into Spark DoubleType).

A new readFloatsAsDoubles default method on VectorizedValuesReader provides the per-row fallback. VectorizedPlainValuesReader overrides it to fetch the source bytes once via getBuffer(total * 4) and then run a tight in-method conversion loop, so FloatToDoubleUpdater.readValues becomes a one-line delegation. The widening is Java's primitive float-to-double conversion: it is exact for every finite and infinite float, and a NaN float widens to a double NaN (the JVM may canonicalize the payload).
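The shape of this change can be sketched in plain Java. This is a minimal illustration, not the code from the patch: `ValuesReaderSketch`, `PlainReaderSketch`, the `of` factory, and the `double[]` output parameter are hypothetical stand-ins (the real methods presumably write into a Spark column vector), and the local `getBuffer` only mimics handing out a little-endian view of the next bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical stand-in for the reader interface; not Spark's actual API.
interface ValuesReaderSketch {
    float readFloat();

    // Per-row fallback: one readFloat() call per value.
    default void readFloatsAsDoubles(int total, double[] out, int offset) {
        for (int i = 0; i < total; i++) {
            out[offset + i] = readFloat(); // implicit float -> double widening
        }
    }
}

// Hypothetical stand-in for a PLAIN-encoded reader (PLAIN floats are little-endian).
final class PlainReaderSketch implements ValuesReaderSketch {
    private final ByteBuffer in;

    PlainReaderSketch(ByteBuffer in) {
        this.in = in;
    }

    // Test helper (assumption): encode floats as 4-byte little-endian values.
    static PlainReaderSketch of(float... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);
        }
        buf.flip();
        return new PlainReaderSketch(buf);
    }

    // Returns a fresh view over the next `length` bytes and advances the input.
    private ByteBuffer getBuffer(int length) {
        ByteBuffer view = in.slice().order(ByteOrder.LITTLE_ENDIAN);
        view.limit(length);
        in.position(in.position() + length);
        return view;
    }

    @Override
    public float readFloat() {
        return getBuffer(4).getFloat();
    }

    // Bulk override: fetch all source bytes once, then convert in a tight loop.
    @Override
    public void readFloatsAsDoubles(int total, double[] out, int offset) {
        ByteBuffer buf = getBuffer(total * 4);
        for (int i = 0; i < total; i++) {
            out[offset + i] = buf.getFloat(i * 4);
        }
    }
}
```

Both paths produce the same doubles in this sketch. The override only changes how many times the source buffer is sliced, and NaN inputs still come out as NaN.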

Why are the changes needed?

FloatToDoubleUpdater.readValues allocates a fresh ByteBuffer slice inside getBuffer(4) for every element on the legacy path, and that allocation dominates the loop. Collapsing N allocations into one is the same win SPARK-56791 delivered for the INT32 -> Long sibling.
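To make the allocation argument concrete, the sketch below counts how many buffer views each path creates. `SliceCountSketch`, its `getBuffer`, and the `legacy`/`bulk` method names are hypothetical and only model the pattern described above; they are not Spark classes, and the counter measures views created, not time.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical model of the two read paths; counts buffer views instead of timing them.
final class SliceCountSketch {
    private final ByteBuffer in;
    int slices = 0; // number of views handed out by getBuffer

    SliceCountSketch(ByteBuffer in) {
        this.in = in;
    }

    // Test helper (assumption): encode floats as 4-byte little-endian values.
    static SliceCountSketch of(float... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            buf.putFloat(v);
        }
        buf.flip();
        return new SliceCountSketch(buf);
    }

    // Stand-in for getBuffer(length): a new view per call.
    ByteBuffer getBuffer(int length) {
        ByteBuffer view = in.slice().order(ByteOrder.LITTLE_ENDIAN);
        view.limit(length);
        in.position(in.position() + length);
        slices++;
        return view;
    }

    // Legacy-style path: one view per element.
    void legacy(int total, double[] out) {
        for (int i = 0; i < total; i++) {
            out[i] = getBuffer(4).getFloat();
        }
    }

    // Bulk path: one view for the whole batch.
    void bulk(int total, double[] out) {
        ByteBuffer all = getBuffer(total * 4);
        for (int i = 0; i < total; i++) {
            out[i] = all.getFloat(i * 4);
        }
    }
}
```

For a batch of N values the legacy-style loop creates N views and the bulk loop creates one, which is the N-to-1 reduction the description refers to.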

Does this PR introduce any user-facing change?

No.

How was this patch tested?

(To be updated after the GHA benchmark and test runs complete.)

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

LuciferYang · 2026-05-12T18:13:24Z

-DowncastLongUpdater (INT64 -> Decimal(9,2))              2              2           0        455.0           2.2       0.4X
+IntegerToLongUpdater                                     1              1           0       1280.6           0.8       1.0X
+IntegerToDoubleUpdater                                   1              1           0       1537.9           0.7       1.2X
+FloatToDoubleUpdater                                     1              1           0       1418.8           0.7       1.1X


These rows show the effect of this optimization.

LuciferYang marked this pull request as draft May 12, 2026 05:55

[SPARK-56802][SQL] Add bulk read+widen path for FLOAT to Double Parqu…

051b94b

…et vector updater

LuciferYang force-pushed the SPARK-56802-float-to-double branch from 308150a to 051b94b Compare May 12, 2026 17:15

LuciferYang added 3 commits May 12, 2026 17:36

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

fcf4b24

…uet.ParquetVectorUpdaterBenchmark (JDK 17, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

68fc576

…uet.ParquetVectorUpdaterBenchmark (JDK 21, Scala 2.13, split 1 of 1)

Benchmark results for org.apache.spark.sql.execution.datasources.parq…

7b05162

…uet.ParquetVectorUpdaterBenchmark (JDK 25, Scala 2.13, split 1 of 1)

LuciferYang wants to merge 4 commits into apache:master from LuciferYang:SPARK-56802-float-to-double