Validate coerce int96 config 17498#20253
Open
AlyAbdelmoneim wants to merge 5 commits intoapache:mainfrom
Open
Conversation
- Introduce DFTimeUnit enum for INT96 timestamp coercion - Implement FromStr, Display, and ConfigField for validation and SET support - Add conversions to/from arrow::TimeUnit for engine integration - Replace old parse_coerce_int96_string usage with DFTimeUnit - Ensures invalid config values are rejected early
Contributor
Author
Jefffrey
reviewed
Feb 10, 2026
| let time_units_and_expected = vec![ | ||
| ( | ||
| None, // Same as "ns" time_unit | ||
| None, // default: None = "ns" |
Contributor
There was a problem hiding this comment.
If we treat None as defaulting to ns then perhaps we should remove the Option and just make it directly a DFTimeUnit?
Contributor
Author
There was a problem hiding this comment.
yes I will do that
Contributor
Author
There was a problem hiding this comment.
should I change CoerceInt96Opt too ?
Contributor
There was a problem hiding this comment.
Hmm perhaps I was mislead by these test comments; I'm not certain that None does default to ns, especially considering theres some test failures now 🤔
Might need more investigation before proceeding with this change (of removing the Option)
Contributor
|
@mbutrovich FYI |
The `datafusion.execution.parquet.coerce_int96` configuration option is now a `DFTimeUnit` directly, instead of `Option<DFTimeUnit>`. Its default value is `DFTimeUnit::NS` (Nanosecond). The `DFTimeUnit` enum was simplified by removing redundant variants (e.g., `Nanosecond`, `Microsecond`), retaining only the abbreviated forms (`NS`, `US`, `MS`, `S`). This standardizes `coerce_int96` to always have a value and cleans up the underlying `DFTimeUnit` enum. Related code, including protobuf serialization/deserialization and accessor methods, has been updated to reflect this non-optional behavior and simplified enum structure.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Rationale for this change
Currently, invalid
coerce_int96values (e.g., "invalid") are accepted atSETtime and only fail later when reading Parquet files. This PR adds early validation so invalid values are rejected immediately with clear error messages, following the same pattern asDFWriterVersion.What changes are included in this PR?
DFTimeUnitenum withFromStr,Display,ConfigFieldimplementationsParquetOptions.coerce_int96fromString/Option<String>toOption<DFTimeUnit>parse_coerce_int96_stringas a separate runtime parser; validation now happens at config timeParquetFormat::with_coerce_int96and related usage to acceptOption<DFTimeUnit>Are these changes tested?
Yes. Added
test_parquet_coerce_int96_validationthat verifies:Are there any user-facing changes?
Not exactly. Invalid
coerce_int96values now error immediately when set viaSETcommand or proto deserialization, instead of failing later during Parquet file reading. This provides better error messages and earlier feedback.