Tracking Issue: Geospatial Vector types #7686

@a10y

Description

Support for geospatial vector data is growing in the analytics community. For example:

  • GeoParquet standard, a standard way for Parquet files to store vector data (MULTIPOLYGON, LINESTRING, etc.) and other feature components, and to identify them as such
  • GeoArrow standard, which formalizes a set of extension types on top of Apache Arrow
  • OGC Simple Feature Access, which provides a vast catalog of geospatial functions
  • Iceberg v3 adds first-party support for the GEOMETRY type

Goals

  • Any data stored in a GeoParquet file should be convertible to a Vortex file
  • We should support exposing geospatial schemas to the query engines that can consume them. Currently those are DuckDB and DataFusion. Iceberg v3 also supports them, but that integration lives in a separate repo.
  • Not an explicit goal for this epic, but we should perform at least on par with other formats in common geospatial benchmarks. The only one I've found so far is Apache Sedona's SpatialBench.
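
For context on the first goal: GeoParquet describes its geometry columns in a JSON document stored under the `geo` key of the Parquet file metadata, so a converter has to read that document before it can map columns to geospatial types. The following is a minimal sketch using only the Python standard library. The sample metadata is hand-written, and the helper name `geometry_columns` is hypothetical; nothing here is an existing Vortex API.

```python
import json

# Hand-written sample of the JSON stored under the "geo" key of the Parquet
# file metadata (GeoParquet 1.0 layout). The column name is illustrative.
SAMPLE_GEO_METADATA = json.dumps({
    "version": "1.0.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon", "MultiPolygon"],
        }
    },
})


def geometry_columns(geo_json: str) -> dict[str, str]:
    """Map each geometry column name to its declared encoding (hypothetical helper)."""
    meta = json.loads(geo_json)
    return {name: col["encoding"] for name, col in meta["columns"].items()}


print(geometry_columns(SAMPLE_GEO_METADATA))
```

A real converter would also need to carry over the optional fields (for example the CRS), which this sketch ignores.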

Components

Completing this will require making Arrow execution pluggable.

Out of scope for now

  • Support for union types and a single unified geometry extension type
  • Raster data support
  • Interop with SedonaDB (?)

Questions

  • Most systems seem to use WKB. Should we just use that, and then have compression codecs that shred it?
  • E.g., DuckDB vectors must receive binary WKB data for execution
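
To make the WKB question concrete, here is a minimal sketch of one possible reading of "shredding": decode each WKB value into flat coordinate columns plus offsets (which compress better than opaque blobs), then re-encode to WKB when an engine requires it. It handles only 2D LineStrings, uses only the Python standard library, and all function names are hypothetical. The issue does not specify what the actual codec would look like.

```python
import struct

WKB_LINESTRING = 2  # WKB geometry type code for LineString


def encode_linestring_wkb(points):
    """Encode (x, y) pairs as a little-endian 2D WKB LineString."""
    # Header: byte order (1 = little-endian), geometry type, point count.
    out = struct.pack("<BII", 1, WKB_LINESTRING, len(points))
    for x, y in points:
        out += struct.pack("<dd", x, y)
    return out


def decode_linestring_wkb(buf):
    """Decode a 2D WKB LineString into a list of (x, y) pairs."""
    fmt = "<" if buf[0] == 1 else ">"
    geom_type, n = struct.unpack_from(fmt + "II", buf, 1)
    if geom_type != WKB_LINESTRING:
        raise ValueError(f"unsupported WKB geometry type: {geom_type}")
    coords = struct.unpack_from(f"{fmt}{2 * n}d", buf, 9)
    return list(zip(coords[0::2], coords[1::2]))


def shred(wkb_values):
    """Split WKB LineStrings into offsets plus flat x and y columns."""
    offsets, xs, ys = [0], [], []
    for buf in wkb_values:
        for x, y in decode_linestring_wkb(buf):
            xs.append(x)
            ys.append(y)
        offsets.append(len(xs))
    return offsets, xs, ys


def unshred(offsets, xs, ys):
    """Rebuild WKB values from the shredded columns, e.g. for an engine that needs WKB."""
    return [
        encode_linestring_wkb(list(zip(xs[a:b], ys[a:b])))
        for a, b in zip(offsets, offsets[1:])
    ]


lines = [[(0.0, 0.0), (1.0, 2.0)], [(3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]]
wkb = [encode_linestring_wkb(line) for line in lines]
offsets, xs, ys = shred(wkb)
assert unshred(offsets, xs, ys) == wkb  # round trip is lossless for this input
```

The offsets-plus-coordinates layout resembles the native encodings in GeoArrow, so the same shredded form could plausibly serve both WKB-consuming and Arrow-native engines; whether that holds for mixed geometry types is an open question this sketch does not address.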

Metadata

Labels

tracking-issue: Shared implementation context for work likely to span multiple PRs.
