Description
Support for geospatial vector data is growing in the analytics community. For example:
- The GeoParquet standard, a standard way for Parquet files to store vector data (MULTIPOLYGON, LINESTRING, etc.) as well as other feature components, and to store them as such
- GeoArrow standard, which formalizes a set of extension types on top of Apache Arrow
- OGC Simple Feature Access provides a vast catalog of geospatial function types
- Iceberg V3 adds first-party support for the GEOMETRY type
Goals
- Any data stored in a GeoParquet file should be convertible to a Vortex file
- We should expose geospatial schemas to the query engines that support them. Currently those are DuckDB and DataFusion. Iceberg v3 also supports them, but that is in a separate repo.
- Not an explicit goal for this epic, but we should perform at least on par with other formats in common geospatial benchmarks. The only one I've found so far is Apache Sedona's SpatialBench.
Components
Completing this will require making Arrow execution pluggable.
Out of scope for now
- Support for union types and a singular unified geometry extension type
- Raster data support
- Interop with SedonaDB (?)
Questions
- Most systems seem to use WKB. Should we just use that, and then have compression codecs that shred it?
  - E.g. DuckDB vectors require receiving binary WKB data for execution
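
To make the WKB question concrete, here is a minimal sketch of what "shredding" could look like for the simplest case. It assumes 2D points only and little-endian output, and it uses nothing beyond Python's standard `struct` module. The function names (`encode_point`, `decode_point`, `shred_points`, `unshred_points`) are hypothetical and are not part of Vortex, DuckDB, or any existing library.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the WKB layout


def encode_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte order flag (1 = little endian), uint32 geometry type,
    # then two float64 coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def decode_point(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D WKB point, honoring the byte order flag."""
    fmt = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(fmt + "I", wkb, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a Point, got geometry type {geom_type}")
    x, y = struct.unpack_from(fmt + "dd", wkb, 5)
    return x, y


def shred_points(wkbs: list[bytes]) -> tuple[list[float], list[float]]:
    """Hypothetical 'shredding': split WKB points into x and y columns."""
    xs, ys = [], []
    for wkb in wkbs:
        x, y = decode_point(wkb)
        xs.append(x)
        ys.append(y)
    return xs, ys


def unshred_points(xs: list[float], ys: list[float]) -> list[bytes]:
    """Rebuild WKB values from the columns, e.g. to hand to an engine."""
    return [encode_point(x, y) for x, y in zip(xs, ys)]


points = [encode_point(1.0, 2.0), encode_point(-3.5, 4.25)]
xs, ys = shred_points(points)
restored = unshred_points(xs, ys)
```

In this sketch the coordinate columns are plain float lists that a columnar codec could compress independently, while an engine that requires binary WKB would receive the output of `unshred_points`. Real geometries (LINESTRING, MULTIPOLYGON, mixed byte orders, Z/M dimensions) would need a richer layout than two flat columns.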