Skip to content

Native scans cannot propagate JVM-side Spark accumulators #3867

@andygrove

Description

@andygrove

Description

When using native Parquet scans (COMET_PARQUET_SCAN_IMPL=native_datafusion), Spark accumulators registered on the JVM side are not incremented because the native reader operates entirely in Rust via DataFusion and does not call back into JVM accumulator APIs.

This does not affect query correctness or user-visible behavior. Filter pushdown works correctly in native scans. The issue is that several Spark SQL tests use JVM-side accumulators (e.g. NumRowGroupsAcc) as a test mechanism to verify filter pushdown, and that mechanism does not work across the JNI boundary.

Affected Tests

The following Spark SQL tests are currently ignored with IgnoreCometNativeScan due to this limitation:

  • filter pushdown - StringPredicate (ParquetFilterSuite) - uses NumRowGroupsAcc to verify string predicate pushdown
  • Filters should be pushed down for vectorized Parquet reader at row group level (ParquetFilterSuite) - uses accumulator to verify row group level filtering
  • SPARK-34562: Bloom filter push down (ParquetFilterSuite) - uses accumulator to verify bloom filter pushdown

These tests are skipped across all Spark versions (3.4, 3.5, 4.0).

Possible Solutions

  1. Implement accumulator propagation from native scans back to JVM
  2. Add equivalent native-side metrics that can be queried from JVM after execution
  3. Write alternative Comet-specific tests that verify the same pushdown behavior without relying on accumulators

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions