
[SPARK-56819][SQL] Add option to trim CHAR trailing spaces on read#55820

Open
llphxd wants to merge 1 commit into apache:master from llphxd:SPARK-56819-char-trim-on-read

Conversation

@llphxd
Contributor

@llphxd llphxd commented May 12, 2026

What changes were proposed in this pull request?

This PR adds a new SQL configuration, spark.sql.charTrimTrailingSpacesOnRead, to trim trailing spaces from CHAR(N) columns and fields when reading table data.
The new configuration is disabled by default, so the existing Spark behavior is preserved. When it is enabled, it takes precedence over spark.sql.readSideCharPadding.
This is intended to provide an opt-in compatibility mode for systems such as MySQL, where CHAR values are commonly returned without trailing spaces unless PAD_CHAR_TO_FULL_LENGTH is enabled.
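The interaction between the two configurations can be sketched with a small plain-Python model. This is illustrative only, not Spark code: `read_char` is a hypothetical helper, and the boolean parameters stand in for the two SQL configurations described above.

```python
# Illustrative model of the proposed read path for CHAR(n) values.
# Spark pads CHAR(n) values on write; what a reader sees then depends
# on the two configurations modeled by the boolean flags below.
def read_char(stored: str, n: int,
              trim_on_read: bool,
              read_side_padding: bool) -> str:
    if trim_on_read:                 # proposed option takes precedence
        return stored.rstrip(" ")
    if read_side_padding:            # models spark.sql.readSideCharPadding
        return stored.ljust(n)
    return stored                    # neither option: raw stored value

print(read_char("12  ", 4, trim_on_read=True,  read_side_padding=True))   # "12"
print(read_char("12  ", 4, trim_on_read=False, read_side_padding=True))   # "12  "
```

The key point the model captures is the precedence rule: when both flags are on, trimming wins, so enabling the new option is a strict behavioral override rather than a combination of the two paddings.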

Why are the changes needed?

Spark currently enforces fixed-length CHAR(N) semantics by padding CHAR values on write, and by applying read-side padding when spark.sql.readSideCharPadding is enabled.
I tested this behavior across several Spark versions with MySQL tables. In Spark 3.3.1 and 3.4.4, MySQL CHAR and VARCHAR columns were simply treated as Spark STRING, so trailing-space handling matched the old string-based behavior. In Spark 3.5.2 and 4.0.1, Spark maps MySQL character types to the stricter, more standard Spark CHAR type, which can surface behavioral differences for CHAR columns compared with older Spark versions.
This makes migration or upgrade harder for workloads that rely on the previous string-like behavior or on MySQL's default CHAR retrieval behavior, where trailing spaces are removed on read. Users may otherwise need to wrap many CHAR columns with rtrim() manually in queries.
This PR provides an opt-in configuration to make this behavior easier to control without changing Spark's default semantics.

Does this PR introduce any user-facing change?

Yes.
This PR adds a new SQL configuration:

spark.sql.charTrimTrailingSpacesOnRead

The default value is false, so existing behavior is unchanged.

When set to true, Spark trims trailing spaces from CHAR(N) columns and fields when reading table data. The option does not affect VARCHAR or STRING, and it does not change write-side CHAR/VARCHAR length checks.

Example:

SET spark.sql.charTrimTrailingSpacesOnRead=true;

CREATE TABLE t (c CHAR(4), v VARCHAR(4), s STRING) USING parquet;
INSERT INTO t VALUES ('12', '12 ', '12 ');

SELECT c, length(c), v, length(v), s, length(s) FROM t;

With the new configuration enabled, the CHAR(4) value is returned without trailing spaces, while VARCHAR and STRING remain unchanged.
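As a quick sanity check of those semantics, a plain-Python model (illustrative only, not Spark internals) gives the lengths the query above would be expected to return with the option enabled:

```python
# Expected values for the example query, assuming the proposed trimming
# applies only to CHAR: 'c' is padded to 4 on write, then trimmed on read.
c_stored = "12".ljust(4)         # write-side CHAR(4) padding -> "12  "
c_read   = c_stored.rstrip(" ")  # proposed trim on read      -> "12"
v_read   = "12 "                 # VARCHAR(4): stored and read as-is
s_read   = "12 "                 # STRING: unaffected by the option
print(len(c_read), len(v_read), len(s_read))  # 2 3 3
```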

How was this patch tested?

Added test coverage in CharVarcharTestSuite for trimming trailing spaces from CHAR columns and nested CHAR fields on read, while keeping VARCHAR and STRING unchanged.

Tested with:
./dev/scalastyle
build/sbt "sql/testOnly *CharVarcharTestSuite"

Was this patch authored or co-authored using generative AI tooling?

Assisted by ChatGPT-5.5

@llphxd
Contributor Author

llphxd commented May 12, 2026

The JIRA is ready: SPARK-56819

@llphxd
Contributor Author

llphxd commented May 12, 2026

One possible question is why this new option is needed when spark.sql.legacy.charVarcharAsString already exists.

I think the two options serve different purposes. spark.sql.legacy.charVarcharAsString disables Spark's CHAR/VARCHAR type semantics broadly by treating CHAR/VARCHAR as STRING. This restores older Spark behavior, but it also disables length checks and CHAR padding semantics, so it is a coarse-grained legacy compatibility switch.

The proposed option is narrower. It only changes the read-side representation of CHAR values by trimming trailing spaces when explicitly enabled. It does not affect VARCHAR or STRING, and it does not disable write-side CHAR/VARCHAR length checks. This allows users to keep Spark's stricter CHAR/VARCHAR type handling while opting into MySQL-compatible CHAR retrieval behavior.
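The contrast on the write side can also be sketched in plain Python. Again this is an illustrative model, not Spark internals: `write_char` and its `char_as_string` flag are hypothetical stand-ins for the write path under spark.sql.legacy.charVarcharAsString versus the proposed option.

```python
# Rough contrast between the two switches on the write path:
# the legacy flag drops CHAR semantics entirely, while the proposed
# option keeps write-side length checks and padding untouched.
def write_char(value: str, n: int, char_as_string: bool) -> str:
    if char_as_string:               # legacy: CHAR treated as STRING
        return value                 # no length check, no padding
    if len(value) > n:               # proposed option keeps the check
        raise ValueError(f"value exceeds CHAR({n})")
    return value.ljust(n)            # and keeps write-side padding

print(write_char("12", 4, char_as_string=False))     # "12  "
print(write_char("12345", 4, char_as_string=True))   # legacy allows overflow
```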

This is useful for migration/upgrade scenarios where users want to preserve standard CHAR/VARCHAR validation in Spark, but need the returned CHAR values to match MySQL's default behavior or previous string-like query results more closely.
