Skip to content

[SPARK-56813][DOCS] Refine the documentation for geospatial types and functions#55790

Closed
uros-db wants to merge 4 commits into
apache:masterfrom
uros-db:geo-docs-refine
Closed

[SPARK-56813][DOCS] Refine the documentation for geospatial types and functions#55790
uros-db wants to merge 4 commits into
apache:masterfrom
uros-db:geo-docs-refine

Conversation

@uros-db
Copy link
Copy Markdown
Contributor

@uros-db uros-db commented May 10, 2026

What changes were proposed in this pull request?

Tighten the geospatial documentation:

  • sql-ref-geospatial-types.md: update the documentation for supported ST functions.
  • sql-ref-datatypes.md: drop unparameterized type syntax from the SQL and PySpark.
  • sql-ref-functions-builtin.md: surface st_funcs group as Geospatial ST Functions.
  • sql-migration-guide.md: note that geospatial types and ST functions are enabled since 4.2.
  • etc.

Why are the changes needed?

Fix gaps and accuracy for geospatial documentation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing tests suffice for docs only changes.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.7.

Copy link
Copy Markdown
Contributor Author

@uros-db uros-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cloud-fan Please review.

Copy link
Copy Markdown
Contributor

@cloud-fan cloud-fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR is a documentation refresh that aligns the geospatial reference pages with the code now that spark.sql.geospatial.enabled defaults to true in 4.2. It does four things:

  • Adds a 4.1 → 4.2 bullet in the migration guide noting geospatial types and ST_* are enabled (see comment below — I'd suggest dropping this one).
  • Drops the unparameterized type syntax (GeometryType() / GEOMETRY) from the PySpark row and the SQL aliases row of sql-ref-datatypes.md. Verified against python/pyspark/sql/types.py:660 (Python ctor requires srid) and SqlBaseParser.g4:1482-1483 (SQL grammar requires (srid | ANY)). Replaces the old, misleading "Default SRID 4326" line on the overview rows of sql-ref-geospatial-types.md with type-specific accuracy (GEOMETRY accepts any registry SRID including 0; GEOGRAPHY only geographic SRIDs).
  • Refreshes the ST function summary table in sql-ref-geospatial-types.md to use [, optional] shorthand for ST_AsBinary and ST_GeomFromWKB (matching the _FUNC_(geo[, endianness]) / _FUNC_(wkb[, srid]) shapes in the ExpressionDescriptions), names the concrete error classes (GEO_ENCODER_SRID_MISMATCH_ERROR, ST_INVALID_SRID_VALUE), and adds an XDR example.
  • Surfaces ST_* in the generated function index by adding a ### Geospatial ST Functions section to sql-ref-functions-builtin.md and adding "st_funcs" to the doc generator's groups set in gen-sql-functions-docs.py. Section structure follows the existing ### Foo\n{% include_api_gen ... %} pattern, and the doc gen change is the matching half.

Facts I verified against the code

  • ST_AsBinary(geo[, endianness]), 'NDR' default — stExpressions.scala:55, :86-94 (DEFAULT_WKB_ENDIANNESS = "NDR" plus the secondary def this(geo) ctor).
  • ST_GeomFromWKB(wkb) returns SRID 0 — stExpressions.scala:166-200 ("Default is 0").
  • ST_GeogFromWKB returns SRID 4326 — stExpressions.scala:143-156.
  • GEO_ENCODER_SRID_MISMATCH_ERROR on mismatched insert — GeometryType.scala:137, GeographyType.scala:140.
  • ST_INVALID_SRID_VALUE on invalid SRID — same files.
  • ST_SetSrid: "new SRID must be valid for the value's type" — STUtils.java:169-191, both Geography and Geometry branches call the type-specific isSridSupported.
  • Unparameterized GEOMETRY / GEOGRAPHY rejected in SQL — SqlBaseParser.g4:1482-1483 requires LEFT_PAREN (srid | ANY) RIGHT_PAREN. GEOMETRY/GEOGRAPHY are also in nonReserved / ansiNonReserved (:2048-2049, :2456-2457), so they don't claim any identifier namespace.
  • Five ST functions registered, matching the table exactly — FunctionRegistry.scala:985-989 (st_asbinary, st_geogfromwkb, st_geomfromwkb, st_srid, st_setsrid).
  • "Enabled since 4.2" claim — SPARK-56771 (commit c6d289d5423) flipped spark.sql.geospatial.enabled default to true. The version("4.1.0") on the config refers to when the gate was introduced, not when it defaulted on.

General comments

Migration-guide bullet may not belong. See the inline comment on sql-migration-guide.md. Briefly: this is a purely additive feature (non-reserved keywords, new registry entries), and the PR description itself answers "Does this PR introduce any user-facing change? No." — which is the right read and is inconsistent with adding a migration entry. The other 4.1→4.2 bullets in that section all describe behavior changes for existing workloads (default-on order-independent checksums, CustomTaskMetric.mergeWith connector contract, Derby deprecation). Suggest moving this to release notes instead.

Scope question on sql-ref-datatypes.md Scala row. The PR title says "drop unparameterized type syntax from the SQL and PySpark" and the diff does that for those two rows. The Scala row at line 182 (GeometryType or GeometryType(*srid*)) is untouched. Technically accurate — bare GeometryType is the companion object, which is a SpatialType/DataType and resolves to the mixed-SRID default — but now reads inconsistently with the new Python row's "srid is required" note. Probably out of scope as you've drawn it; flagging in case you'd like a one-line clarification on the Scala row (e.g. "GeometryType resolves to the mixed-SRID default; use GeometryType(srid) for a fixed SRID").

Comment thread docs/sql-migration-guide.md Outdated
Comment thread docs/sql-ref-datatypes.md Outdated
uros-db and others added 3 commits May 12, 2026 15:30
@uros-db uros-db requested a review from cloud-fan May 12, 2026 19:38
@uros-db
Copy link
Copy Markdown
Contributor Author

uros-db commented May 12, 2026

@cloud-fan Thanks, all comments addressed - PTAL

@cloud-fan
Copy link
Copy Markdown
Contributor

thanks, merging to master/4.x/4.2!

@cloud-fan cloud-fan closed this in e648859 May 13, 2026
cloud-fan pushed a commit that referenced this pull request May 13, 2026
… functions

### What changes were proposed in this pull request?
Tighten the geospatial documentation:
- `sql-ref-geospatial-types.md`: update the documentation for supported ST functions.
- `sql-ref-datatypes.md`: drop unparameterized type syntax from the SQL and PySpark.
- `sql-ref-functions-builtin.md`: surface `st_funcs` group as **Geospatial ST Functions**.
- `sql-migration-guide.md`: note that geospatial types and ST functions are enabled since 4.2.
- etc.

### Why are the changes needed?
Fix gaps and accuracy for geospatial documentation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests suffice for docs only changes.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7.

Closes #55790 from uros-db/geo-docs-refine.

Authored-by: Uros Bojanic <uros.bojanic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e648859)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request May 13, 2026
… functions

### What changes were proposed in this pull request?
Tighten the geospatial documentation:
- `sql-ref-geospatial-types.md`: update the documentation for supported ST functions.
- `sql-ref-datatypes.md`: drop unparameterized type syntax from the SQL and PySpark.
- `sql-ref-functions-builtin.md`: surface `st_funcs` group as **Geospatial ST Functions**.
- `sql-migration-guide.md`: note that geospatial types and ST functions are enabled since 4.2.
- etc.

### Why are the changes needed?
Fix gaps and accuracy for geospatial documentation.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Existing tests suffice for docs only changes.

### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Opus 4.7.

Closes #55790 from uros-db/geo-docs-refine.

Authored-by: Uros Bojanic <uros.bojanic@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit e648859)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants