[SPARK-56781][PYTHON] Refactor SQL_GROUPED_AGG_PANDAS_UDF by Yicong-Huang · Pull Request #55808 · apache/spark

Yicong-Huang · 2026-05-11T21:03:13Z

What changes were proposed in this pull request?

Refactor SQL_GROUPED_AGG_PANDAS_UDF to use ArrowStreamGroupSerializer as a pure I/O layer, moving the per-group pandas conversion and UDF invocation into read_udfs() in worker.py. The custom ArrowStreamAggPandasUDFSerializer is no longer used for this eval type (still used by SQL_GROUPED_AGG_PANDAS_ITER_UDF and SQL_WINDOW_AGG_PANDAS_UDF).
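As a rough illustration of the split described above, here is a minimal stdlib-only sketch, not the actual PySpark code. Plain Python lists stand in for Arrow record batches and pandas Series, and every name in it (`read_groups`, `concat_columns`, `eval_grouped_agg`, `mean_udf`) is hypothetical. It only shows the shape of the idea: the reader yields each group's batches untouched, and a separate step does the per-group conversion and calls the UDF.

```python
from typing import Callable, Iterable, Iterator, List, Sequence

# Hypothetical stand-in for an Arrow record batch: a list of columns,
# where each column is a list of values.
Batch = List[List[float]]


def read_groups(stream: Iterable[List[Batch]]) -> Iterator[List[Batch]]:
    """Pure I/O layer: yield each group's batches as-is (no conversion, no UDF call)."""
    for group_batches in stream:
        yield group_batches


def concat_columns(batches: Sequence[Batch]) -> List[List[float]]:
    """Per-group conversion: merge a group's batches into one list per column."""
    num_cols = len(batches[0])
    return [[v for batch in batches for v in batch[i]] for i in range(num_cols)]


def eval_grouped_agg(
    stream: Iterable[List[Batch]], udf: Callable[..., float]
) -> Iterator[float]:
    """Processing layer: convert each group, then invoke the aggregate UDF once per group."""
    for group_batches in read_groups(stream):
        columns = concat_columns(group_batches)
        yield udf(*columns)


def mean_udf(values: List[float]) -> float:
    """Example aggregate: reduce one column to a single scalar."""
    return sum(values) / len(values)


# Two groups; the first arrives split across two single-column batches.
stream = [
    [[[1.0, 2.0]], [[3.0]]],
    [[[10.0, 20.0]]],
]
results = list(eval_grouped_agg(stream, mean_udf))
```

In this sketch the reader has no knowledge of the UDF or of the converted representation, so the conversion and invocation logic can change without touching the I/O code.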

Why are the changes needed?

Part of SPARK-55388.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. No behavior change.

ASV benchmark comparison (master vs. this branch; GroupedAggPandasUDFTimeBench / GroupedAggPandasUDFPeakmemBench, -a repeat=3): results pending.

Was this patch authored or co-authored using generative AI tooling?

No.

refactor: move SQL_GROUPED_AGG_PANDAS_UDF logic into read_udfs

45e9c97


Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56781
