-
Notifications
You must be signed in to change notification settings - Fork 298
Native scans cannot propagate JVM-side Spark accumulators #3867
Copy link
Copy link
Open
Labels
Description
Description
When using native Parquet scans (COMET_PARQUET_SCAN_IMPL=native_datafusion), Spark accumulators registered on the JVM side are not incremented because the native reader operates entirely in Rust via DataFusion and does not call back into JVM accumulator APIs.
This does not affect query correctness or user-visible behavior. Filter pushdown works correctly in native scans. The issue is that several Spark SQL tests use JVM-side accumulators (e.g. NumRowGroupsAcc) as a test mechanism to verify filter pushdown, and that mechanism does not work across the JNI boundary.
Affected Tests
The following Spark SQL tests are currently ignored with IgnoreCometNativeScan due to this limitation:
filter pushdown - StringPredicate(ParquetFilterSuite) - usesNumRowGroupsAccto verify string predicate pushdownFilters should be pushed down for vectorized Parquet reader at row group level(ParquetFilterSuite) - uses accumulator to verify row group level filteringSPARK-34562: Bloom filter push down(ParquetFilterSuite) - uses accumulator to verify bloom filter pushdown
These tests are skipped across all Spark versions (3.4, 3.5, 4.0).
Possible Solutions
- Implement accumulator propagation from native scans back to JVM
- Add equivalent native-side metrics that can be queried from JVM after execution
- Write alternative Comet-specific tests that verify the same pushdown behavior without relying on accumulators
Reactions are currently unavailable