This section compares 3 geometry encodings:
All of the encodings discussed in this section use Parquet column statistics to derive bounding boxes for geometries, and therefore support row-group-level and page-level data skipping for spatial range queries.
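To make the skipping idea concrete, here is a minimal sketch of the row-group pruning test. It assumes the reader has already extracted, for each row group, the minimum statistic of the min_x and min_y columns and the maximum statistic of the max_x and max_y columns. The `RowGroupStats` class and `may_contain_matches` function are hypothetical names used only for illustration; they are not part of GeoLake, GeoParquet, or any Parquet library.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


@dataclass
class RowGroupStats:
    """Hypothetical container for one row group's column statistics."""
    min_of_min_x: Optional[float]  # min statistic of the min_x column
    min_of_min_y: Optional[float]  # min statistic of the min_y column
    max_of_max_x: Optional[float]  # max statistic of the max_x column
    max_of_max_y: Optional[float]  # max statistic of the max_y column


def may_contain_matches(stats: RowGroupStats, query: BBox) -> bool:
    """Return False only when the row group provably cannot match the query."""
    bounds = (stats.min_of_min_x, stats.min_of_min_y,
              stats.max_of_max_x, stats.max_of_max_y)
    # Assumption: when statistics are unavailable we stay conservative and read.
    if any(b is None for b in bounds):
        return True
    q_min_x, q_min_y, q_max_x, q_max_y = query
    # Two axis-aligned boxes are disjoint if they are separated on either axis.
    disjoint = (
        stats.max_of_max_x < q_min_x
        or stats.min_of_min_x > q_max_x
        or stats.max_of_max_y < q_min_y
        or stats.min_of_min_y > q_max_y
    )
    return not disjoint


# Usage: only the first row group overlaps the query window.
groups = [RowGroupStats(0.0, 0.0, 10.0, 10.0), RowGroupStats(20.0, 20.0, 30.0, 30.0)]
query = (5.0, 5.0, 8.0, 8.0)
to_read = [i for i, g in enumerate(groups) if may_contain_matches(g, query)]
```

The same test applies at page level when page statistics are available. The check can only rule out row groups; rows in the remaining groups still need an exact geometry test.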
GeoLake implements a WKB encoding with a bounding box. The geometry group type is defined as follows.
<repetition> group GeometryWithBBOX {
  required binary wkb;
  optional double min_x;
  optional double min_y;
  optional double max_x;
  optional double max_y;
}
The min_x, min_y, max_x, and max_y fields are defined as optional; their values may be absent when wkb represents an empty geometry.
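As an illustration of the optional bounding-box fields, the following sketch builds one GeometryWithBBOX record as a plain dictionary. It assumes the geometry's coordinates have already been extracted as a flat sequence of (x, y) pairs. The function name `make_geometry_with_bbox` and the placeholder WKB bytes are hypothetical; this is not GeoLake's actual writer code.

```python
from typing import Dict, Sequence, Tuple


def make_geometry_with_bbox(
    wkb: bytes, coords: Sequence[Tuple[float, float]]
) -> Dict[str, object]:
    """Build a record shaped like the GeometryWithBBOX group."""
    if not coords:
        # Empty geometry: there is no extent, so the optional fields stay unset.
        return {"wkb": wkb, "min_x": None, "min_y": None,
                "max_x": None, "max_y": None}
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"wkb": wkb, "min_x": min(xs), "min_y": min(ys),
            "max_x": max(xs), "max_y": max(ys)}


# Usage: the bytes are a stand-in, not valid WKB.
placeholder_wkb = b"\x00"
line_record = make_geometry_with_bbox(placeholder_wkb, [(1.0, 2.0), (3.0, -1.0)])
empty_record = make_geometry_with_bbox(placeholder_wkb, [])
```

Leaving the fields null for empty geometries keeps them out of the column min/max statistics, so they do not widen the bounding box of a row group or page.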
The GeoParquet spec added a similar encoding; the difference is that the bbox columns are not stored together with the geometry column.
Pros
Cons
SpatialParquet is described in this paper. The definition of geometry group type is as follows.