Geometry Encodings for Parquet Files

This section compares three geometry encodings.

All encodings mentioned in this section use Parquet column statistics to form bounding boxes of geometries, and thus support row-group and page-level data skipping for spatial range queries.
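The skipping decision can be sketched as follows. This is a minimal illustration, not the code of any particular reader: it assumes each row group exposes (min, max) statistics for four bounding-box columns, and the names `RowGroupStats`, `may_intersect`, and `row_groups_to_read` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group (min, max) statistics for each bbox column."""
    min_x: Tuple[float, float]
    min_y: Tuple[float, float]
    max_x: Tuple[float, float]
    max_y: Tuple[float, float]


def may_intersect(stats: RowGroupStats,
                  query: Tuple[float, float, float, float]) -> bool:
    """Return False only when no geometry in the row group can touch the query window."""
    q_min_x, q_min_y, q_max_x, q_max_y = query
    # The smallest min_* and the largest max_* bound every geometry in the row group.
    lo_x, hi_x = stats.min_x[0], stats.max_x[1]
    lo_y, hi_y = stats.min_y[0], stats.max_y[1]
    return not (hi_x < q_min_x or lo_x > q_max_x or
                hi_y < q_min_y or lo_y > q_max_y)


def row_groups_to_read(all_stats: List[RowGroupStats],
                       query: Tuple[float, float, float, float]) -> List[int]:
    """Indices of row groups that cannot be skipped for the given query window."""
    return [i for i, s in enumerate(all_stats) if may_intersect(s, query)]
```

The same test applies at page level when page statistics are available. The check is conservative: a row group that passes may still contain no matching geometry, so rows must be filtered again after reading.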

Geometry Encodings Design

WKB with Bounding Box (a variant of Phase 1 WKB)

GeoLake implements the WKB with bounding box encoding. See this for the definition of the geometry group type.

<repetition> group GeometryWithBBOX {
  required binary wkb;
  optional double min_x;
  optional double min_y;
  optional double max_x;
  optional double max_y;
}

The min_x, min_y, max_x, and max_y fields are defined as optional because these values may be absent when wkb represents an empty geometry.
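To illustrate how such a record might be populated, here is a minimal sketch that derives the bounding box fields from a WKB value. It is an assumption-laden example, not GeoLake's implementation: the function name `bbox_record` is hypothetical, only 2D Point and LineString are handled, and an empty point is assumed to be encoded with NaN coordinates, as is conventional.

```python
import math
import struct
from typing import Dict


def bbox_record(wkb: bytes) -> Dict[str, object]:
    """Build a GeometryWithBBOX-like record from 2D Point or LineString WKB."""
    order = "<" if wkb[0] == 1 else ">"  # 1 = little endian, 0 = big endian
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    if geom_type == 1:  # Point: one (x, y) pair after the 5-byte header
        coords = [struct.unpack_from(order + "2d", wkb, 5)]
    elif geom_type == 2:  # LineString: point count, then (x, y) pairs
        (n,) = struct.unpack_from(order + "I", wkb, 5)
        coords = [struct.unpack_from(order + "2d", wkb, 9 + 16 * i)
                  for i in range(n)]
    else:
        raise ValueError("this sketch only handles 2D Point and LineString")
    # Assumption: POINT EMPTY is written with NaN coordinates.
    coords = [(x, y) for x, y in coords
              if not (math.isnan(x) or math.isnan(y))]
    record = {"wkb": wkb, "min_x": None, "min_y": None,
              "max_x": None, "max_y": None}
    if coords:  # leave the bbox fields null for empty geometries
        xs, ys = zip(*coords)
        record.update(min_x=min(xs), min_y=min(ys),
                      max_x=max(xs), max_y=max(ys))
    return record
```

Leaving the four fields null for empty geometries keeps them out of the column min/max statistics, so they do not widen the bounding box of a row group or page.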

The GeoParquet spec added a similar encoding; the difference is that the bbox columns are not stored together with the geometry column.

Pros

Cons

SpatialParquet (Phase 2 candidate)

SpatialParquet is described in this paper. The definition of geometry group type is as follows.