[SPARK-56813][DOCS] Refine the documentation for geospatial types and functions#55790
[SPARK-56813][DOCS] Refine the documentation for geospatial types and functions#55790uros-db wants to merge 4 commits into
Conversation
uros-db
left a comment
There was a problem hiding this comment.
@cloud-fan Please review.
cloud-fan
left a comment
There was a problem hiding this comment.
Summary
This PR is a documentation refresh that aligns the geospatial reference pages with the code now that spark.sql.geospatial.enabled defaults to true in 4.2. It does four things:
- Adds a 4.1 → 4.2 bullet in the migration guide noting geospatial types and
ST_*are enabled (see comment below — I'd suggest dropping this one). - Drops the unparameterized type syntax (
GeometryType()/GEOMETRY) from the PySpark row and the SQL aliases row ofsql-ref-datatypes.md. Verified againstpython/pyspark/sql/types.py:660(Python ctor requiressrid) andSqlBaseParser.g4:1482-1483(SQL grammar requires(srid | ANY)). Replaces the old, misleading "Default SRID 4326" line on the overview rows ofsql-ref-geospatial-types.mdwith type-specific accuracy (GEOMETRY accepts any registry SRID including 0; GEOGRAPHY only geographic SRIDs). - Refreshes the ST function summary table in
sql-ref-geospatial-types.mdto use[, optional]shorthand forST_AsBinaryandST_GeomFromWKB(matching the_FUNC_(geo[, endianness])/_FUNC_(wkb[, srid])shapes in theExpressionDescriptions), names the concrete error classes (GEO_ENCODER_SRID_MISMATCH_ERROR,ST_INVALID_SRID_VALUE), and adds an XDR example. - Surfaces
ST_*in the generated function index by adding a### Geospatial ST Functionssection tosql-ref-functions-builtin.mdand adding"st_funcs"to the doc generator's groups set ingen-sql-functions-docs.py. Section structure follows the existing### Foo\n{% include_api_gen ... %}pattern, and the doc gen change is the matching half.
Facts I verified against the code
ST_AsBinary(geo[, endianness]),'NDR'default —stExpressions.scala:55, :86-94(DEFAULT_WKB_ENDIANNESS = "NDR"plus the secondarydef this(geo)ctor).ST_GeomFromWKB(wkb)returns SRID 0 —stExpressions.scala:166-200("Default is 0").ST_GeogFromWKBreturns SRID 4326 —stExpressions.scala:143-156.GEO_ENCODER_SRID_MISMATCH_ERRORon mismatched insert —GeometryType.scala:137,GeographyType.scala:140.ST_INVALID_SRID_VALUEon invalid SRID — same files.ST_SetSrid: "new SRID must be valid for the value's type" —STUtils.java:169-191, both Geography and Geometry branches call the type-specificisSridSupported.- Unparameterized
GEOMETRY/GEOGRAPHYrejected in SQL —SqlBaseParser.g4:1482-1483requiresLEFT_PAREN (srid | ANY) RIGHT_PAREN.GEOMETRY/GEOGRAPHYare also innonReserved/ansiNonReserved(:2048-2049, :2456-2457), so they don't claim any identifier namespace. - Five ST functions registered, matching the table exactly —
FunctionRegistry.scala:985-989(st_asbinary,st_geogfromwkb,st_geomfromwkb,st_srid,st_setsrid). - "Enabled since 4.2" claim — SPARK-56771 (commit
c6d289d5423) flippedspark.sql.geospatial.enableddefault totrue. Theversion("4.1.0")on the config refers to when the gate was introduced, not when it defaulted on.
General comments
Migration-guide bullet may not belong. See the inline comment on sql-migration-guide.md. Briefly: this is a purely additive feature (non-reserved keywords, new registry entries), and the PR description itself answers "Does this PR introduce any user-facing change? No." — which is the right read and is inconsistent with adding a migration entry. The other 4.1→4.2 bullets in that section all describe behavior changes for existing workloads (default-on order-independent checksums, CustomTaskMetric.mergeWith connector contract, Derby deprecation). Suggest moving this to release notes instead.
Scope question on sql-ref-datatypes.md Scala row. The PR title says "drop unparameterized type syntax from the SQL and PySpark" and the diff does that for those two rows. The Scala row at line 182 (GeometryType or GeometryType(*srid*)) is untouched. Technically accurate — bare GeometryType is the companion object, which is a SpatialType/DataType and resolves to the mixed-SRID default — but now reads inconsistently with the new Python row's "srid is required" note. Probably out of scope as you've drawn it; flagging in case you'd like a one-line clarification on the Scala row (e.g. "GeometryType resolves to the mixed-SRID default; use GeometryType(srid) for a fixed SRID").
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
|
@cloud-fan Thanks, all comments addressed - PTAL |
|
thanks, merging to master/4.x/4.2! |
… functions ### What changes were proposed in this pull request? Tighten the geospatial documentation: - `sql-ref-geospatial-types.md`: update the documentation for supported ST functions. - `sql-ref-datatypes.md`: drop unparameterized type syntax from the SQL and PySpark. - `sql-ref-functions-builtin.md`: surface `st_funcs` group as **Geospatial ST Functions**. - `sql-migration-guide.md`: note that geospatial types and ST functions are enabled since 4.2. - etc. ### Why are the changes needed? Fix gaps and accuracy for geospatial documentation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests suffice for docs only changes. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7. Closes #55790 from uros-db/geo-docs-refine. Authored-by: Uros Bojanic <uros.bojanic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e648859) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
… functions ### What changes were proposed in this pull request? Tighten the geospatial documentation: - `sql-ref-geospatial-types.md`: update the documentation for supported ST functions. - `sql-ref-datatypes.md`: drop unparameterized type syntax from the SQL and PySpark. - `sql-ref-functions-builtin.md`: surface `st_funcs` group as **Geospatial ST Functions**. - `sql-migration-guide.md`: note that geospatial types and ST functions are enabled since 4.2. - etc. ### Why are the changes needed? Fix gaps and accuracy for geospatial documentation. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Existing tests suffice for docs only changes. ### Was this patch authored or co-authored using generative AI tooling? Generated-by: Claude Opus 4.7. Closes #55790 from uros-db/geo-docs-refine. Authored-by: Uros Bojanic <uros.bojanic@databricks.com> Signed-off-by: Wenchen Fan <wenchen@databricks.com> (cherry picked from commit e648859) Signed-off-by: Wenchen Fan <wenchen@databricks.com>
What changes were proposed in this pull request?
Tighten the geospatial documentation:
sql-ref-geospatial-types.md: update the documentation for supported ST functions.sql-ref-datatypes.md: drop unparameterized type syntax from the SQL and PySpark.sql-ref-functions-builtin.md: surfacest_funcsgroup as Geospatial ST Functions.sql-migration-guide.md: note that geospatial types and ST functions are enabled since 4.2.Why are the changes needed?
Fix gaps and accuracy for geospatial documentation.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Existing tests suffice for docs only changes.
Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7.