Skip to content

[Bug] Dynamic filter leads to wrong results in some aggregation queries with >1 partitions #20267

@bharath-techie

Description

@bharath-techie

Describe the bug

When enable_dynamic_filter_pushdown and enable_aggregate_dynamic_filter_pushdown are enabled (the defaults), queries on parquet files produce incorrect results with target_partitions > 1.

In this case The dynamic filter derived from MAX("ResolutionHeight") is incorrectly pushed down on ResolutionWidth instead of ResolutionHeight [ just from what i understand ], causing valid file ranges/row groups to be pruned.

To Reproduce

Setup

  1. Use the ClickBench hits partitioned dataset

  2. use datafusion-cli

  3. CREATE EXTERNAL TABLE hits
    STORED AS PARQUET
    LOCATION '/path/to/partitioned/hits/';

  4. Query with default settings -- returns wrong results

>  SELECT COUNT(*) as cnt, AVG("Age") as avg_age, SUM("IsRefresh") as refreshes, AVG("ResolutionWidth") as avg_width, MAX("ResolutionHeight") as max_height 
FROM hits 
WHERE "CounterID" < 1000;
+---------+-------------------+-----------+--------------------+------------+
| cnt     | avg_age           | refreshes | avg_width          | max_height |
+---------+-------------------+-----------+--------------------+------------+
| 1119068 | 26.87201403310612 | 258245    | 1539.4615429982807 | 9575       |
+---------+-------------------+-----------+--------------------+------------+
1 row(s) fetched. 
Elapsed 0.038 seconds.
  1. Set the target partition to 1 --- same query returns right results
SELECT COUNT(*) as cnt, AVG("Age") as avg_age, SUM("IsRefresh") as refreshes, AVG("ResolutionWidth") as avg_width, MAX("ResolutionHeight") as max_height 
FROM hits 
WHERE "CounterID" < 1000;
0 row(s) fetched. 
Elapsed 0.000 seconds.

0 row(s) fetched. 
Elapsed 0.000 seconds.

+---------+-------------------+-----------+--------------------+------------+
| cnt     | avg_age           | refreshes | avg_width          | max_height |
+---------+-------------------+-----------+--------------------+------------+
| 1591782 | 26.14465108915668 | 266931    | 1330.2180763446252 | 9575       |
+---------+-------------------+-----------+--------------------+------------+
1 row(s) fetched. 
Elapsed 0.038 seconds.
  1. Actual workaround with partitions

Disabling dynamic filter pushdown produces correct results even with parallel execution:

SET datafusion.optimizer.enable_dynamic_filter_pushdown = false;
SET datafusion.optimizer.enable_aggregate_dynamic_filter_pushdown = false;

Expected behavior

The query should return right result with dynamic filter pushdown enabled / default settings with any number of partitions.

Additional context

check files_ranges_pruned_statistics
Wrong results - explain analyze output with dynamic filters enabled :

|                   |           DataSourceExec: file_groups={10 groups: [[home/ec2-user/clickdata/partitioned/hits/hits_0.parquet:0..122446530, home/ec2-user/clickdata/partitioned/hits/hits_1.parquet:0..174965044, home/ec2-user/clickdata/partitioned/hits/hits_10.parquet:0..101513258, home/ec2-user/clickdata/partitioned/hits/hits_11.parquet:0..118419888, home/ec2-user/clickdata/partitioned/hits/hits_12.parquet:0..149514164, ...], [home/ec2-user/clickdata/partitioned/hits/hits_19.parquet:55500514..103692598, home/ec2-user/clickdata/partitioned/hits/hits_2.parquet:0..230595491, home/ec2-user/clickdata/partitioned/hits/hits_20.parquet:0..85766533, home/ec2-user/clickdata/partitioned/hits/hits_21.parquet:0..113455196, home/ec2-user/clickdata/partitioned/hits/hits_22.parquet:0..79775901, ...], [home/ec2-user/clickdata/partitioned/hits/hits_3.parquet:49140764..192507052, home/ec2-user/clickdata/partitioned/hits/hits_30.parquet:0..124187913, home/ec2-user/clickdata/partitioned/hits/hits_31.parquet:0..123065410, home/ec2-user/clickdata/partitioned/hits/hits_32.parquet:0..94506004, home/ec2-user/clickdata/partitioned/hits/hits_33.parquet:0..78243765, ...], [home/ec2-user/clickdata/partitioned/hits/hits_40.parquet:60145803..142508647, home/ec2-user/clickdata/partitioned/hits/hits_41.parquet:0..290614269, home/ec2-user/clickdata/partitioned/hits/hits_42.parquet:0..288524057, home/ec2-user/clickdata/partitioned/hits/hits_43.parquet:0..299692947, home/ec2-user/clickdata/partitioned/hits/hits_44.parquet:0..242404750, ...], [home/ec2-user/clickdata/partitioned/hits/hits_47.parquet:29698740..34336875, home/ec2-user/clickdata/partitioned/hits/hits_48.parquet:0..33680419, home/ec2-user/clickdata/partitioned/hits/hits_49.parquet:0..99795554, home/ec2-user/clickdata/partitioned/hits/hits_5.parquet:0..122286439, home/ec2-user/clickdata/partitioned/hits/hits_50.parquet:0..245339079, ...], ...]}, 

projection=[CounterID, IsRefresh, ResolutionWidth, ResolutionHeight, Age], file_type=parquet, 
predicate=CounterID@6 < 1000 AND DynamicFilter [ ResolutionWidth@20 > 9575 ], 
pruning_predicate=CounterID_null_count@1 != row_count@2 AND CounterID_min@0 < 1000 AND ResolutionWidth_null_count@4 != row_count@2 AND ResolutionWidth_max@3 > 9575, 


required_guarantees=[], metrics=[output_rows=1.40 M, elapsed_compute=10ns, output_bytes=16.0 MB, output_batches=171, files_ranges_pruned_statistics=109 total → 1 matched, 

row_groups_pruned_statistics=8 total → 6 matched, row_groups_pruned_bloom_filter=6 total → 6 matched, page_index_rows_pruned=0 total → 0 matched, batches_split=0, bytes_scanned=1.21 M, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, bloom_filter_eval_time=59.03µs, metadata_load_time=926.80µs, page_index_eval_time=741ns, row_pushdown_eval_time=218ns, statistics_eval_time=46.58µs, time_elapsed_opening=2.49ms, time_elapsed_processing=49.95ms, time_elapsed_scanning_total=49.87ms, time_elapsed_scanning_until_data=10.93ms, scan_efficiency_ratio=1.2% (1.21 M/103.7 M)]

Correct results - explain analyze output - dynamic filters disabled :

|                   |           DataSourceExec: file_groups={10 groups: [[home/ec2-user/clickdata/partitioned/hits/hits_0.parquet:0..122446530, home/ec2-user/clickdata/partitioned/hits/hits_1.parquet:0..174965044, home/ec2-user/clickdata/partitioned/hits/hits_10.parquet:0..101513258, home/ec2-user/clickdata/partitioned/hits/hits_11.parquet:0..118419888, home/ec2-user/clickdata/partitioned/hits/hits_12.parquet:0..149514164, ...], [home/ec2-user/clickdata/partitioned/hits/hits_19.parquet:55500514..103692598, home/ec2-user/clickdata/partitioned/hits/hits_2.parquet:0..230595491, home/ec2-user/clickdata/partitioned/hits/hits_20.parquet:0..85766533, home/ec2-user/clickdata/partitioned/hits/hits_21.parquet:0..113455196, home/ec2-user/clickdata/partitioned/hits/hits_22.parquet:0..79775901, ...], [home/ec2-user/clickdata/partitioned/hits/hits_3.parquet:49140764..192507052, home/ec2-user/clickdata/partitioned/hits/hits_30.parquet:0..124187913, home/ec2-user/clickdata/partitioned/hits/hits_31.parquet:0..123065410, home/ec2-user/clickdata/partitioned/hits/hits_32.parquet:0..94506004, home/ec2-user/clickdata/partitioned/hits/hits_33.parquet:0..78243765, ...], [home/ec2-user/clickdata/partitioned/hits/hits_40.parquet:60145803..142508647, home/ec2-user/clickdata/partitioned/hits/hits_41.parquet:0..290614269, home/ec2-user/clickdata/partitioned/hits/hits_42.parquet:0..288524057, home/ec2-user/clickdata/partitioned/hits/hits_43.parquet:0..299692947, home/ec2-user/clickdata/partitioned/hits/hits_44.parquet:0..242404750, ...], [home/ec2-user/clickdata/partitioned/hits/hits_47.parquet:29698740..34336875, home/ec2-user/clickdata/partitioned/hits/hits_48.parquet:0..33680419, home/ec2-user/clickdata/partitioned/hits/hits_49.parquet:0..99795554, home/ec2-user/clickdata/partitioned/hits/hits_5.parquet:0..122286439, home/ec2-user/clickdata/partitioned/hits/hits_50.parquet:0..245339079, ...], ...]}, projection=[CounterID, IsRefresh, ResolutionWidth, ResolutionHeight, Age], file_type=parquet, 

predicate=CounterID@6 < 1000, pruning_predicate=CounterID_null_count@1 != row_count@2 AND CounterID_min@0 < 1000, required_guarantees=[],
 metrics=[output_rows=2.45 M, elapsed_compute=10ns, output_bytes=28.0 MB, output_batches=301, files_ranges_pruned_statistics=109 total → 20 matched, 

row_groups_pruned_statistics=51 total → 6 matched, row_groups_pruned_bloom_filter=6 total → 6 matched, page_index_rows_pruned=0 total → 0 matched, batches_split=0, bytes_scanned=1.97 M, file_open_errors=0, file_scan_errors=0, num_predicate_creation_errors=0, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, bloom_filter_eval_time=61.60µs, metadata_load_time=2.64ms, page_index_eval_time=2.00µs, row_pushdown_eval_time=218ns, statistics_eval_time=62.77µs, time_elapsed_opening=2.08ms, time_elapsed_processing=41.38ms, time_elapsed_scanning_total=44.99ms, time_elapsed_scanning_until_data=10.36ms, scan_efficiency_ratio=1.9% (1.97 M/103.7 M)] |

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions