
[SPARK-56781][PYTHON] Refactor SQL_GROUPED_AGG_PANDAS_UDF #55808

Draft
Yicong-Huang wants to merge 1 commit into apache:master from Yicong-Huang:SPARK-56781

Yicong-Huang (Contributor) commented May 11, 2026

What changes were proposed in this pull request?

Refactor SQL_GROUPED_AGG_PANDAS_UDF to use ArrowStreamGroupSerializer as a pure I/O layer, moving the per-group pandas conversion and UDF invocation into read_udfs() in worker.py. The custom ArrowStreamAggPandasUDFSerializer is no longer used for this eval type (still used by SQL_GROUPED_AGG_PANDAS_ITER_UDF and SQL_WINDOW_AGG_PANDAS_UDF).
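
To illustrate the split described above, here is a minimal standard-library sketch, not the actual PySpark code. Plain dicts of lists stand in for Arrow record batches and pandas Series, and every name in it (`read_groups`, `to_columns`, `run_grouped_agg`) is hypothetical. It only shows the shape of the design: an I/O layer that yields each group's raw batches untouched, and a worker-side loop that does the per-group conversion and calls the UDF.

```python
from statistics import mean
from typing import Callable, Dict, Iterable, Iterator, List

# Hypothetical stand-in for an Arrow record batch: column name -> values.
Batch = Dict[str, List[float]]


def read_groups(stream: Iterable[List[Batch]]) -> Iterator[List[Batch]]:
    """Pure I/O layer: yield each group's raw batches without inspecting them."""
    for group_batches in stream:
        yield group_batches


def to_columns(batches: List[Batch]) -> Dict[str, List[float]]:
    """Per-group conversion: concatenate a group's batches column by column."""
    columns: Dict[str, List[float]] = {}
    for batch in batches:
        for name, values in batch.items():
            columns.setdefault(name, []).extend(values)
    return columns


def run_grouped_agg(
    stream: Iterable[List[Batch]],
    udf: Callable[[List[float]], float],
    column: str,
) -> Iterator[float]:
    """Worker-side loop: convert each group, then invoke the UDF once per group."""
    for batches in read_groups(stream):
        columns = to_columns(batches)
        yield udf(columns[column])


stream = [
    [{"v": [1.0, 2.0]}, {"v": [3.0]}],  # first group arrives as two batches
    [{"v": [10.0]}],                    # second group arrives as one batch
]
results = list(run_grouped_agg(stream, mean, "v"))
print(results)
```

Because the reader never interprets batch contents in this sketch, the conversion and invocation logic can change without touching the I/O layer, which is the separation this PR aims for.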

Why are the changes needed?

Part of SPARK-55388.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests. No behavior change.

ASV benchmark comparison (master vs. this branch; GroupedAggPandasUDFTimeBench / GroupedAggPandasUDFPeakmemBench, -a repeat=3): results pending.

Was this patch authored or co-authored using generative AI tooling?

No.
