Skip to content

Conversation

@KhaledR57
Copy link
Contributor

  • The Jira issue number for this PR is: MDEV-24943

Description

Aggregates lacked the SQL-standard FILTER clause, forcing CASE-based workarounds that reduced readability.

This update introduces the ability to specify a FILTER clause for aggregate functions, allowing for more granular control over which rows are included in the aggregation. Also, improves standards compliance and makes queries clearer and more readable.

<aggregate_function> ( <expression> ) [ FILTER ( WHERE <condition> ) ] [ OVER ( <window_spec> ) ]

The <condition> may contain any expression allowed in regular WHERE clauses, except subqueries, window functions, and outer references.

Example:

-- Using FILTER clause
SELECT AVG(value) FILTER (WHERE status = 'active') FROM sales;

-- Equivalent using CASE
SELECT AVG(CASE WHEN status = 'active' THEN value END) FROM sales;

How can this PR be tested?

Running the Test Suite

./mtr aggregates-filter

Validating CASE vs FILTER Equivalence

Use the compare.py script to verify CASE and FILTER produce identical results:

python3 compare.py aggregates-filter-case.result

Expected Result File Format:

The script expects result files with this structure:

#
# Test N: Description
#
# 1. CASE version
column1	column2	column3
value1	value2	value3
# 2. FILTER version
column1	column2	column3
value1	value2	value3

The script identifies test headers (# Test N: Description), CASE/FILTER markers (# 1. CASE version / # 2. FILTER version), and compares the data rows that follow each marker.

Basing the PR against the correct MariaDB version

  • This is a new feature or a refactoring, and the PR is based against the main branch.
  • This is a bug fix, and the PR is based against the earliest maintained branch in which the bug can be reproduced.

PR quality check

  • I checked the CODING_STANDARDS.md file and my PR conforms to this where appropriate.
  • For any trivial modifications to the PR, I am ok with the reviewer making the changes themselves.

@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch from 9f3c0ff to cb7d38d Compare November 14, 2025 07:38
@svoj svoj added the External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements. label Nov 15, 2025
@gkodinov
Copy link
Contributor

Thank for taking the effort to work on this. Indeed, the LIMIT clause is a part of (at least) SQL 2011 and, as such, is a very good feature to have.

However, the diff seems very incomplete. Can you please make sure that you:

  1. Have a working diff.
  2. Have a good functional description of the change (hopefully in a jira)

Please re-submit when you have the above.
This is not a trivial change. I would suggest reaching out to the mariaDB developers at zulip if you need to talk about it with somebody.

@gkodinov gkodinov closed this Dec 15, 2025
@vuvova
Copy link
Member

vuvova commented Dec 15, 2025

I don't see how the diff is incomplete. It applies fine. It compiles fine. It passes tests on buildbot, not every builder, but it passes on many builders, this doesn't look like a non-working diff.

@DaveGosselin-MariaDB
Copy link
Member

DaveGosselin-MariaDB commented Dec 17, 2025

Hi @KhaledR57 , I'm experimenting with the changes and I see a difference between what should otherwise be equivalent statements. What was once written using CASE as a workaround should now work with FILTER as in the following examples below. First, on Postgres 18.1, we see that these queries are equivalent:

postgres=# SELECT COUNT(*) AS unfiltered, SUM( CASE WHEN generate_series < 5 THEN 1 ELSE 0 END ) AS filtered FROM generate_series(1,10);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

postgres=# SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE generate_series < 5) AS filtered FROM generate_series(1,10);
 unfiltered | filtered
------------+----------
         10 |        4
(1 row)

However, on MariaDB, the FILTER analogue produces different results with your patch:

MariaDB [test]> SELECT COUNT(*) AS unfiltered, SUM( CASE WHEN seq < 5 THEN 1 ELSE 0 END ) AS filtered FROM seq_1_to_10;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |        4 |
+------------+----------+
1 row in set (0.005 sec)

MariaDB [test]> SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE seq < 5) AS filtered FROM seq_1_to_10;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |       10 |
+------------+----------+
1 row in set (0.001 sec)

Why is this the case?

@DaveGosselin-MariaDB
Copy link
Member

If we store the sequence engine result to an InnoDB-backed table and run the FILTER query against that instead we get the correct result:

create table t1 (seq int);
insert into t1 select seq from seq_1_to_10;
SELECT COUNT(*) AS unfiltered, COUNT(*) FILTER (WHERE seq < 5) AS filtered FROM t1;
+------------+----------+
| unfiltered | filtered |
+------------+----------+
|         10 |        4 |
+------------+----------+
1 row in set (0.001 sec)

Aggregates lacked the SQL-standard FILTER clause, forcing CASE-based workarounds that reduced readability across (sum, avg, count, …).

This update introduces the ability to specify a FILTER clause for aggregate functions,
allowing for more granular control over which rows are included in the aggregation.
Also, improves standards compliance and makes queries clearer and more readable.

The FILTER(WHERE ...) condition may contain any expression allowed in regular WHERE clauses,
except subqueries, window functions, and outer references.
@KhaledR57 KhaledR57 force-pushed the MDEV-24943-add-filter-clause branch from cb7d38d to aaea60a Compare December 19, 2025 04:13
@KhaledR57
Copy link
Contributor Author

Hi @DaveGosselin-MariaDB , Sorry for the delayed response!

The issue was with the sequence storage engine's group_by_handler optimization in storage/sequence/sequence.cc. It computes SUM() and COUNT() directly using formulas without scanning rows, but it wasn't checking for the FILTER clause, so the filter was completely bypassed.

I've added a has_filter() check so it falls back to normal aggregation when FILTER is present. Fixed now!

if (item->type() != Item::SUM_FUNC_ITEM ||
    (((Item_sum*) item)->sum_func() != Item_sum::SUM_FUNC &&
     ((Item_sum*) item)->sum_func() != Item_sum::COUNT_FUNC) ||
    ((Item_sum*) item)->has_filter())  // NEW CHECK
    return 0;  // Fall back to normal aggregation

Storing the sequence data to an InnoDB table gave the correct result because InnoDB goes through the standard opt_sum.cc optimization path, which I had already updated here to check has_filter() and skip the optimization when FILTER is present.

I'll add sequence-specific tests in the next commit after I finish the stored aggregates (almost done with those).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

External Contribution All PRs from entities outside of MariaDB Foundation, Corporation, Codership agreements.

Development

Successfully merging this pull request may close these issues.

5 participants