Is your feature request related to a problem or challenge?
DataFusion supports ParquetAccessPlan as a PartitionedFile extension, but that API is row-group-based. Callers must know the row group layout and split any file-level row selection themselves.
DataFusion already loads the row group metadata when opening the file, so it can split that selection internally.
Describe the solution you'd like
Add a ParquetRowSelection extension that wraps a file-level RowSelection.
When opening the file, DataFusion should convert it to the existing ParquetAccessPlan form by choosing a RowGroupAccess for each row group:
- all rows in the row group skipped: RowGroupAccess::Skip
- all rows in the row group selected: RowGroupAccess::Scan
- a mix of selected and skipped rows: RowGroupAccess::Selection
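A minimal sketch of this split, assuming simplified local stand-ins for parquet's RowSelector and DataFusion's RowGroupAccess (the real types live in the parquet and datafusion crates and are not used here). The helper split_selection and its row_group_sizes parameter are hypothetical; in DataFusion the per-row-group row counts would come from the Parquet metadata already loaded when the file is opened.

```rust
/// Simplified stand-in for parquet's `RowSelector`: a run of `row_count`
/// consecutive rows that are either skipped or selected.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Simplified stand-in for DataFusion's `RowGroupAccess`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum RowGroupAccess {
    Skip,
    Scan,
    Selection(Vec<RowSelector>),
}

/// Hypothetical helper: splits a file-level selection into one access entry
/// per row group. `row_group_sizes` lists each row group's row count in file
/// order. Assumes the total row count has already been validated.
fn split_selection(selection: &[RowSelector], row_group_sizes: &[usize]) -> Vec<RowGroupAccess> {
    // Ignore empty runs so they never produce zero-length pieces.
    let mut runs = selection.iter().copied().filter(|s| s.row_count > 0);
    let mut current = runs.next();
    let mut out = Vec::with_capacity(row_group_sizes.len());

    for &group_rows in row_group_sizes {
        let mut remaining = group_rows;
        let mut pieces: Vec<RowSelector> = Vec::new();

        while remaining > 0 {
            // Stop if the selection runs out early (rejected by validation).
            let Some(run) = current.as_mut() else { break };
            // Cut the current run at the row group boundary.
            let take = run.row_count.min(remaining);
            pieces.push(RowSelector { row_count: take, skip: run.skip });
            run.row_count -= take;
            remaining -= take;
            let exhausted = run.row_count == 0;
            if exhausted {
                current = runs.next();
            }
        }

        // An empty row group yields no pieces and is treated as skipped.
        let access = if pieces.iter().all(|p| p.skip) {
            RowGroupAccess::Skip
        } else if pieces.iter().all(|p| !p.skip) {
            RowGroupAccess::Scan
        } else {
            RowGroupAccess::Selection(pieces)
        };
        out.push(access);
    }
    out
}
```

For example, selecting 4 rows and then skipping 5 over three row groups of 3 rows each yields Scan, Selection (1 selected, 2 skipped), Skip: a run that crosses a row group boundary is cut there and contributes to more than one entry.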
The selection should be rejected if its total row count does not match the row count in the file's Parquet metadata.
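The row count check can be sketched as below, again with a simplified stand-in for parquet's RowSelector. The function validate_selection, its String error type, and the row_group_sizes parameter are hypothetical; in practice the per-row-group counts would come from the metadata (for example RowGroupMetaData::num_rows() in the parquet crate) and the error would be a DataFusion error.

```rust
/// Simplified stand-in for parquet's `RowSelector`.
#[derive(Debug, Clone, Copy)]
struct RowSelector {
    row_count: usize,
    skip: bool,
}

/// Hypothetical check: the selection must cover exactly as many rows as the
/// file's row groups report in the Parquet metadata.
fn validate_selection(selection: &[RowSelector], row_group_sizes: &[usize]) -> Result<(), String> {
    // Skipped and selected runs both count towards the total.
    let selection_rows: usize = selection.iter().map(|s| s.row_count).sum();
    let file_rows: usize = row_group_sizes.iter().sum();
    if selection_rows == file_rows {
        Ok(())
    } else {
        Err(format!(
            "row selection covers {selection_rows} rows, but the Parquet metadata reports {file_rows}"
        ))
    }
}
```

Counting skipped runs as well as selected ones matters: a selection that ends with a long skipped run still has to account for every row in the file.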
Describe alternatives you've considered
No response
Additional context
No response