[feat](aggregate) support FIRST aggregate type #59017

Closed
AntiTopQuark wants to merge 1 commit into apache:master from AntiTopQuark:support_first

Conversation

@AntiTopQuark

What problem does this PR solve?

Issue Number: close #58204

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@AntiTopQuark
Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.83% (1730/2167)
Line Coverage 65.86% (30610/46476)
Region Coverage 66.59% (15271/22934)
Branch Coverage 56.93% (8121/14266)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 55.56% (5/9) 🎉
Increment coverage report
Complete coverage report

@doris-robot

TPC-H: Total hot run time: 35104 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit f27976fbf28bad8eb4a40ba7c6b511ae119c4f85, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17673	4217	4044	4044
q2	2021	366	243	243
q3	10484	1334	726	726
q4	10373	928	316	316
q5	9767	2195	1929	1929
q6	218	166	135	135
q7	997	862	712	712
q8	9379	1522	1137	1137
q9	7282	5410	5305	5305
q10	6898	2396	1960	1960
q11	534	320	305	305
q12	704	737	615	615
q13	17781	3705	3021	3021
q14	296	311	292	292
q15	610	529	508	508
q16	698	674	635	635
q17	697	801	556	556
q18	8125	7120	6965	6965
q19	1099	978	617	617
q20	398	362	244	244
q21	4344	3994	3879	3879
q22	1058	1020	960	960
Total cold run time: 111436 ms
Total hot run time: 35104 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4181	4383	4135	4135
q2	362	415	335	335
q3	2242	2715	2324	2324
q4	1401	1820	1355	1355
q5	4864	4820	4635	4635
q6	247	177	134	134
q7	2050	1963	1810	1810
q8	2785	2617	2638	2617
q9	7654	7598	7555	7555
q10	3079	3279	3054	3054
q11	621	510	495	495
q12	682	746	608	608
q13	3584	4164	3494	3494
q14	308	301	284	284
q15	566	515	520	515
q16	661	699	642	642
q17	1356	1546	1392	1392
q18	7857	7693	7596	7596
q19	901	867	870	867
q20	1920	1959	1813	1813
q21	4749	4311	4202	4202
q22	1058	1039	982	982
Total cold run time: 53128 ms
Total hot run time: 50844 ms

@doris-robot

TPC-DS: Total hot run time: 179139 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f27976fbf28bad8eb4a40ba7c6b511ae119c4f85, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4965	630	502	502
query6	331	234	212	212
query7	4203	464	275	275
query8	303	260	245	245
query9	8765	2543	2587	2543
query10	556	383	338	338
query11	15497	14742	14599	14599
query12	171	118	110	110
query13	1266	507	380	380
query14	5885	3291	3041	3041
query14_1	2918	2841	2880	2841
query15	211	196	187	187
query16	780	463	488	463
query17	1131	715	580	580
query18	2429	436	341	341
query19	225	225	201	201
query20	120	115	113	113
query21	216	137	117	117
query22	3897	4093	4073	4073
query23	16545	16312	16039	16039
query23_1	16006	16046	16177	16046
query24	7317	1669	1271	1271
query24_1	1263	1237	1233	1233
query25	567	484	439	439
query26	1249	264	158	158
query27	2780	482	308	308
query28	4485	2127	2118	2118
query29	802	543	437	437
query30	313	249	218	218
query31	818	730	656	656
query32	82	74	71	71
query33	541	353	289	289
query34	907	918	542	542
query35	789	818	746	746
query36	874	905	814	814
query37	131	96	75	75
query38	2864	2949	2850	2850
query39	789	760	739	739
query39_1	719	730	717	717
query40	230	137	119	119
query41	75	73	66	66
query42	111	113	111	111
query43	435	444	396	396
query44	1365	767	755	755
query45	195	196	189	189
query46	893	983	609	609
query47	1689	1723	1620	1620
query48	329	336	257	257
query49	648	463	368	368
query50	699	328	228	228
query51	3931	3834	3905	3834
query52	108	116	107	107
query53	336	354	301	301
query54	311	277	286	277
query55	81	79	74	74
query56	306	321	326	321
query57	1164	1131	1097	1097
query58	286	262	260	260
query59	2365	2530	2350	2350
query60	352	327	314	314
query61	197	186	190	186
query62	713	688	639	639
query63	334	303	303	303
query64	5231	1416	1131	1131
query65	4096	3979	3975	3975
query66	1486	475	349	349
query67	15392	15125	14844	14844
query68	8070	1029	732	732
query69	517	353	325	325
query70	1077	975	1009	975
query71	415	317	283	283
query72	6053	5013	4981	4981
query73	682	611	312	312
query74	8954	8745	8615	8615
query75	3209	3147	2774	2774
query76	3831	1139	748	748
query77	530	400	297	297
query78	9394	9673	8877	8877
query79	1647	885	628	628
query80	726	655	553	553
query81	527	271	234	234
query82	202	135	106	106
query83	270	259	238	238
query84	268	126	109	109
query85	908	521	463	463
query86	393	298	287	287
query87	3092	3067	2983	2983
query88	3776	2277	2263	2263
query89	471	434	402	402
query90	2203	164	162	162
query91	173	171	144	144
query92	87	69	78	69
query93	1851	926	559	559
query94	483	296	283	283
query95	573	385	317	317
query96	590	491	212	212
query97	2272	2280	2219	2219
query98	220	191	197	191
query99	1318	1315	1205	1205
Total cold run time: 261601 ms
Total hot run time: 179139 ms

@doris-robot

ClickBench: Total hot run time: 28.45 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f27976fbf28bad8eb4a40ba7c6b511ae119c4f85, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.05	0.04
query2	0.15	0.07	0.07
query3	0.33	0.09	0.08
query4	1.60	0.10	0.10
query5	0.26	0.25	0.24
query6	1.19	0.66	0.64
query7	0.04	0.02	0.02
query8	0.07	0.06	0.06
query9	0.58	0.52	0.51
query10	0.56	0.56	0.56
query11	0.26	0.14	0.13
query12	0.27	0.14	0.14
query13	0.64	0.65	0.63
query14	1.03	1.00	1.02
query15	0.89	0.83	0.83
query16	0.40	0.43	0.39
query17	1.01	1.06	1.03
query18	0.25	0.23	0.22
query19	1.94	1.88	1.77
query20	0.02	0.02	0.02
query21	15.39	0.30	0.24
query22	4.98	0.10	0.09
query23	15.42	0.39	0.22
query24	2.46	0.47	0.32
query25	0.10	0.09	0.10
query26	0.19	0.18	0.18
query27	0.09	0.10	0.09
query28	3.78	1.32	1.16
query29	12.56	4.04	3.33
query30	0.33	0.13	0.12
query31	2.81	0.66	0.42
query32	3.23	0.61	0.50
query33	3.02	2.96	3.03
query34	16.85	5.22	4.60
query35	4.69	4.67	4.65
query36	0.62	0.50	0.49
query37	0.25	0.09	0.09
query38	0.20	0.07	0.06
query39	0.07	0.05	0.05
query40	0.22	0.18	0.16
query41	0.13	0.07	0.06
query42	0.08	0.04	0.04
query43	0.06	0.06	0.05
Total cold run time: 99.07 s
Total hot run time: 28.45 s

Signed-off-by: AntiTopQuark <AntiTopQuark1350@outlook.com>
@AntiTopQuark
Author

run buildall

@doris-robot

Cloud UT Coverage Report

Increment line coverage 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.83% (1730/2167)
Line Coverage 65.88% (30618/46476)
Region Coverage 66.62% (15279/22934)
Branch Coverage 56.93% (8121/14266)

@doris-robot

TPC-H: Total hot run time: 35773 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 57a351fef3392c03d0f873edf845fbbc4999f99e, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	16487	4176	4059	4059
q2	2011	348	242	242
q3	9695	1323	725	725
q4	9824	851	319	319
q5	7444	2052	2011	2011
q6	185	164	136	136
q7	988	854	713	713
q8	9134	1386	1181	1181
q9	7007	5355	5351	5351
q10	6805	2404	1971	1971
q11	524	327	318	318
q12	648	717	571	571
q13	17368	3660	3019	3019
q14	288	290	280	280
q15	575	525	514	514
q16	676	693	641	641
q17	685	778	567	567
q18	7580	7004	8132	7004
q19	1089	1015	653	653
q20	433	393	268	268
q21	4564	4203	4287	4203
q22	1147	1088	1027	1027
Total cold run time: 105157 ms
Total hot run time: 35773 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4360	4263	4347	4263
q2	330	386	355	355
q3	2441	2870	2529	2529
q4	1394	1871	1458	1458
q5	4497	4486	4609	4486
q6	212	162	129	129
q7	1983	1852	1811	1811
q8	2715	2655	2520	2520
q9	7536	7513	7588	7513
q10	3113	3101	2648	2648
q11	569	509	470	470
q12	634	710	568	568
q13	3211	3626	3004	3004
q14	287	287	280	280
q15	541	494	486	486
q16	612	644	606	606
q17	1109	1425	1405	1405
q18	7329	7173	7036	7036
q19	847	822	834	822
q20	1888	1967	1902	1902
q21	4592	4347	4116	4116
q22	1110	1003	1011	1003
Total cold run time: 51310 ms
Total hot run time: 49410 ms

@doris-robot

TPC-DS: Total hot run time: 178241 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 57a351fef3392c03d0f873edf845fbbc4999f99e, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	5125	637	483	483
query6	330	226	215	215
query7	4206	460	275	275
query8	304	252	236	236
query9	8752	2551	2558	2551
query10	511	367	321	321
query11	15488	14982	14648	14648
query12	179	116	115	115
query13	1263	496	381	381
query14	5957	3270	3018	3018
query14_1	2918	2918	2912	2912
query15	220	197	182	182
query16	889	496	495	495
query17	1115	715	606	606
query18	2596	446	360	360
query19	244	232	215	215
query20	122	118	118	118
query21	221	138	118	118
query22	4048	4010	4004	4004
query23	16640	16307	15827	15827
query23_1	16000	16011	16093	16011
query24	7065	1738	1241	1241
query24_1	1236	1247	1265	1247
query25	581	503	470	470
query26	1252	266	159	159
query27	2757	487	310	310
query28	4320	2164	2144	2144
query29	808	580	484	484
query30	316	234	219	219
query31	818	703	596	596
query32	81	70	71	70
query33	550	344	302	302
query34	897	921	551	551
query35	800	825	728	728
query36	877	900	839	839
query37	137	93	126	93
query38	2857	2840	2828	2828
query39	756	761	717	717
query39_1	880	710	701	701
query40	228	137	118	118
query41	66	61	60	60
query42	107	103	104	103
query43	429	438	409	409
query44	1321	742	742	742
query45	195	189	185	185
query46	881	983	615	615
query47	1667	1704	1604	1604
query48	314	318	245	245
query49	639	439	360	360
query50	673	307	217	217
query51	3870	3875	3823	3823
query52	113	112	100	100
query53	325	351	298	298
query54	283	254	251	251
query55	87	76	74	74
query56	287	310	301	301
query57	1157	1133	1084	1084
query58	267	255	251	251
query59	2388	2445	2342	2342
query60	306	314	291	291
query61	166	193	161	161
query62	698	675	616	616
query63	328	292	296	292
query64	4730	1302	991	991
query65	4023	4019	3926	3926
query66	1429	457	318	318
query67	14947	14904	14665	14665
query68	8438	995	729	729
query69	498	339	308	308
query70	1083	1019	1003	1003
query71	389	321	292	292
query72	6069	4891	4998	4891
query73	667	563	308	308
query74	8565	8828	8685	8685
query75	3227	3150	2779	2779
query76	4192	1139	756	756
query77	772	399	295	295
query78	9409	9569	8885	8885
query79	1447	876	626	626
query80	695	655	554	554
query81	528	273	235	235
query82	208	136	105	105
query83	266	254	240	240
query84	264	114	96	96
query85	907	519	462	462
query86	382	295	275	275
query87	3066	3049	2926	2926
query88	4303	2291	2291	2291
query89	464	431	402	402
query90	2192	156	157	156
query91	175	168	149	149
query92	80	69	64	64
query93	1755	914	557	557
query94	467	303	275	275
query95	577	376	296	296
query96	591	484	212	212
query97	2243	2322	2259	2259
query98	209	194	190	190
query99	1300	1303	1227	1227
Total cold run time: 261675 ms
Total hot run time: 178241 ms

@doris-robot

ClickBench: Total hot run time: 28.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 57a351fef3392c03d0f873edf845fbbc4999f99e, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.05	0.04	0.04
query2	0.13	0.06	0.06
query3	0.31	0.08	0.08
query4	1.61	0.10	0.10
query5	0.26	0.24	0.24
query6	1.17	0.66	0.66
query7	0.03	0.02	0.03
query8	0.07	0.06	0.05
query9	0.59	0.50	0.50
query10	0.58	0.56	0.56
query11	0.25	0.14	0.14
query12	0.26	0.15	0.14
query13	0.63	0.62	0.62
query14	1.00	1.00	1.01
query15	0.89	0.85	0.81
query16	0.39	0.39	0.37
query17	1.05	0.98	0.98
query18	0.24	0.22	0.23
query19	1.96	1.84	1.76
query20	0.02	0.01	0.02
query21	15.43	0.27	0.24
query22	4.96	0.10	0.09
query23	15.38	0.39	0.22
query24	2.42	0.48	0.31
query25	0.10	0.09	0.10
query26	0.19	0.18	0.17
query27	0.10	0.10	0.08
query28	3.76	1.35	1.17
query29	12.58	4.14	3.29
query30	0.32	0.13	0.12
query31	2.83	0.66	0.43
query32	3.24	0.62	0.49
query33	3.03	3.06	3.12
query34	16.56	5.26	4.56
query35	4.66	4.62	4.66
query36	0.63	0.51	0.49
query37	0.23	0.09	0.09
query38	0.20	0.05	0.05
query39	0.07	0.05	0.05
query40	0.20	0.18	0.17
query41	0.12	0.06	0.06
query42	0.08	0.05	0.05
query43	0.06	0.06	0.05
Total cold run time: 98.64 s
Total hot run time: 28.31 s

@doris-robot

BE UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.47% (18846/35244)
Line Coverage 39.21% (174288/444463)
Region Coverage 33.84% (134922/398716)
Branch Coverage 34.80% (58080/166912)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (21/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.41% (25010/34541)
Line Coverage 59.19% (262777/443958)
Region Coverage 54.19% (218614/403432)
Branch Coverage 55.69% (93444/167785)

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (6/9) 🎉
Increment coverage report
Complete coverage report


Copilot AI left a comment

Pull request overview

This PR adds support for the FIRST aggregate type to Apache Doris. The FIRST aggregate function keeps the first value inserted for a column, complementing the existing REPLACE aggregate type which keeps the last value. This is implemented as part of the "replace family" of aggregation types.

Key Changes:

  • Added FIRST as a new aggregate type across all system layers (Thrift definitions, Java enums, C++ enums)
  • Implemented FIRST aggregate function with separate reader and load phases following existing patterns
  • Added FIRST to the "replace family" alongside REPLACE and REPLACE_IF_NOT_NULL for consistent handling in schema operations
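The difference between the two merge policies described in the overview can be illustrated with a small, self-contained sketch. This is not the Doris implementation: the `Policy` enum, the `Row` class, and the `aggregate` method are hypothetical names invented for illustration, and NULL handling (the subject of the later FIRST_IF_NOT_NULL discussion) is deliberately not modeled.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FirstVsReplaceSketch {
    /** Hypothetical stand-in for the two merge policies discussed in this PR. */
    enum Policy { FIRST, REPLACE }

    /** One loaded row: an aggregate key plus the value column being merged. */
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Merges rows that share a key, visiting them in load order. */
    static Map<String, String> aggregate(List<Row> rowsInLoadOrder, Policy policy) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Row row : rowsInLoadOrder) {
            if (policy == Policy.FIRST) {
                // FIRST: the earliest value for a key wins; later rows are ignored.
                if (!result.containsKey(row.key)) {
                    result.put(row.key, row.value);
                }
            } else {
                // REPLACE: the latest value for a key wins.
                result.put(row.key, row.value);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("k1", "a"), new Row("k1", "b"), new Row("k2", "c"));
        System.out.println(aggregate(rows, Policy.FIRST));   // {k1=a, k2=c}
        System.out.println(aggregate(rows, Policy.REPLACE)); // {k1=b, k2=c}
    }
}
```

The sketch assumes a single, well-defined load order; how the real engine orders rows across loads and compaction is outside what this page states.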

Reviewed changes

Copilot reviewed 17 out of 17 changed files in this pull request and generated 3 comments.

File	Description
gensrc/thrift/Types.thrift	Added FIRST = 9 to TAggregationType enum
fe/fe-core/src/main/java/org/apache/doris/catalog/AggregateType.java	Added FIRST enum value, registered in aggTypeMap, added to isReplaceFamily(), compatibility map, and toThrift() conversion
fe/fe-core/src/main/antlr4/org/apache/doris/nereids/DorisParser.g4	Added FIRST to aggTypeDef grammar rule
fe/fe-core/src/main/java/org/apache/doris/alter/SchemaChangeHandler.java	Added FIRST checks to prevent dropping key columns when FIRST value columns exist
be/src/olap/olap_common.h	Added OLAP_FIELD_AGGREGATION_FIRST = 11 to FieldAggregationMethod enum
be/src/olap/tablet_schema.cpp	Added string conversion support for FIRST aggregation type
be/src/olap/field.h	Added FIRST case to field factory aggregation method handling
be/src/olap/memtable.cpp	Added logic to select first_load function for FIRST aggregation in memtable
be/src/vec/aggregate_functions/aggregate_function_reader.h	Added function declaration for register_aggregate_function_first_reader_load
be/src/vec/aggregate_functions/aggregate_function_reader.cpp	Implemented FIRST aggregate function registration for both reader and load phases
be/src/vec/aggregate_functions/aggregate_function_simple_factory.cpp	Registered FIRST reader/load functions in factory
regression-test/suites/query_p0/aggregate/aggregate.groovy	Added end-to-end test validating FIRST keeps first value while REPLACE keeps last value
fe/fe-core/src/test/java/org/apache/doris/catalog/AggregateTypeTest.java	Added unit tests for FIRST type conversions and replace family membership
fe/fe-core/src/test/java/org/apache/doris/nereids/trees/plans/CreateTableCommandTest.java	Added test for creating aggregate table with FIRST column
fe/fe-core/src/test/java/org/apache/doris/alter/SchemaChangeHandlerTest.java	Added test verifying key column drop is forbidden when FIRST columns exist
be/test/vec/aggregate_functions/aggregate_function_first_test.cpp	Added unit tests for FIRST aggregate function factory and load semantics
be/test/olap/tablet_schema_test.cpp	Added tests for FIRST aggregation type string conversions and thrift serialization

Comment on lines 349 to +351
  } else if (AggregateType.REPLACE == column.getAggregationType()
-         || AggregateType.REPLACE_IF_NOT_NULL == column.getAggregationType()) {
+         || AggregateType.REPLACE_IF_NOT_NULL == column.getAggregationType()
+         || AggregateType.FIRST == column.getAggregationType()) {

Copilot AI Dec 15, 2025

Consider using the isReplaceFamily() method instead of explicitly checking each aggregate type. This would make the code more maintainable and consistent with how the replace family is identified elsewhere. Replace the condition with: column.getAggregationType().isReplaceFamily()
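A minimal sketch of the suggested refactor, assuming a simplified stand-in for `AggregateType`. The enum below lists only a few members, and the two `blocksKeyDrop...` helpers are hypothetical names; the real check lives inline in SchemaChangeHandler.java and may differ in detail.

```java
public class ReplaceFamilySketch {
    /** Simplified stand-in for Doris's AggregateType; only a few members are shown. */
    enum AggregateType {
        SUM, MIN, MAX, REPLACE, REPLACE_IF_NOT_NULL, FIRST;

        /** Single place that defines which types belong to the replace family. */
        boolean isReplaceFamily() {
            return this == REPLACE || this == REPLACE_IF_NOT_NULL || this == FIRST;
        }
    }

    /** Before: every call site enumerates the family members itself. */
    static boolean blocksKeyDropExplicit(AggregateType type) {
        return AggregateType.REPLACE == type
                || AggregateType.REPLACE_IF_NOT_NULL == type
                || AggregateType.FIRST == type;
    }

    /** After: the call site delegates to the enum, as the review comment suggests. */
    static boolean blocksKeyDropViaFamily(AggregateType type) {
        return type.isReplaceFamily();
    }

    public static void main(String[] args) {
        // The two forms must agree for every aggregate type.
        for (AggregateType type : AggregateType.values()) {
            if (blocksKeyDropExplicit(type) != blocksKeyDropViaFamily(type)) {
                throw new IllegalStateException("mismatch for " + type);
            }
        }
        System.out.println("both checks agree for all types");
    }
}
```

With the delegating form, adding another member to the family later only requires touching `isReplaceFamily()`, not each call site.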

Comment on lines 369 to +371
  } else if (AggregateType.REPLACE == column.getAggregationType()
-         || AggregateType.REPLACE_IF_NOT_NULL == column.getAggregationType()) {
+         || AggregateType.REPLACE_IF_NOT_NULL == column.getAggregationType()
+         || AggregateType.FIRST == column.getAggregationType()) {

Copilot AI Dec 15, 2025

Consider using the isReplaceFamily() method instead of explicitly checking each aggregate type. This would make the code more maintainable and consistent with how the replace family is identified elsewhere. Replace the condition with: column.getAggregationType().isReplaceFamily()

Contributor

Adding a new aggregate type needs comprehensive test coverage: schema change, insert, stream load, and so on. You may want to refer to the existing test cases for aggregate tables, then make sure FIRST is well covered.

@AntiTopQuark
Author

AntiTopQuark commented Dec 18, 2025

Hi, @zclllyybb

Naming the syntax directly as FIRST conflicts with the existing MySQL-style syntax
ALTER TABLE ${partition_table_name} MODIFY COLUMN name STRING FIRST,
which moves a column to the first position in the table.

Would it be possible to implement the keyword as FIRST_IF_NOT_NULL instead?

@zclllyybb
Contributor

It looks fine to me.

@zclllyybb
Contributor

@zclllyybb closed this on Dec 23, 2025

Development

Successfully merging this pull request may close these issues.

[Feature] The “First” aggregation type of the aggregation table

4 participants