CoursesGIS Basics — A Complete Introduction6.6 Cloud-Native Vector Formats — FlatGeobuf, GeoParquet, PMTiles
Module 6: Spatial Data Formats

6.6 Cloud-Native Vector Formats — FlatGeobuf, GeoParquet, PMTiles

The next-generation vector formats built for object storage, range reads, and analytics at scale.

Lesson 32 of 100·20 min read

Key takeaways

  • FlatGeobuf is a streamable binary format that replaces GeoJSON for large datasets.
  • GeoParquet is columnar, cloud-native, and integrates with the modern analytics stack.
  • PMTiles packages a tile pyramid into a single HTTP-range-readable file — no tile server required.

Introduction

Web-era GIS created demand for formats that stream from object storage without a running server. This lesson covers the three most important: FlatGeobuf, GeoParquet, and PMTiles. Together they cover vector data at every scale from web delivery to analytic warehouses.

FlatGeobuf

FlatGeobuf (.fgb) is a binary vector format built on Google's FlatBuffers serialisation library. Key properties:

  • Streamable — read features one at a time without loading the whole file.
  • Spatially indexed — Packed Hilbert R-tree at the front; clients can find features in a bounding box quickly.
  • Single file — no sidecars.
  • HTTP range reads — browsers can query remote FlatGeobuf files efficiently.
  • Mutable, but append-only is the typical workflow.

Reading in Python:

Python
1import geopandas as gpd
2[object Object]
3[object Object]
4

JavaScript:

JavaScript
1import { deserialize } from 'flatgeobuf/dist/geojson.js';
2const url = 'https://example.com/buildings.fgb';
3for await (const f of deserialize(url, bbox)) {
4  // f is a GeoJSON Feature
5}

When to use: medium-to-large vector data, especially when serving from object storage without a tile pipeline.

GeoParquet

Parquet is the columnar data format dominant in modern analytics (Spark, DuckDB, Snowflake, BigQuery). GeoParquet (v1.0 in 2024) adds geometry encoding via WKB inside a geometry column plus schema metadata per column.

Advantages:

  • Columnar — read only the columns you need.
  • Strong compression — typically 5–10× smaller than GeoJSON.
  • Partitioned — split by any column (e.g., by year or country) for parallel queries.
  • Integrates with DuckDB, pandas, Spark, BigQuery, Snowflake, etc.
Python
import geopandas as gpd

SQL
1-- DuckDB
2INSTALL spatial; LOAD spatial;
3SELECT COUNT(*) FROM read_parquet('s3://bucket/parcels.parquet')
4WHERE ST_Within(geometry, ST_GeomFromText('POLYGON((...))'));

GeoParquet is now the default for data lakes holding spatial datasets. Overture Maps and Microsoft Global Building Footprints publish in GeoParquet.

PMTiles

A tile pyramid is millions of tiny files. PMTiles (Protomaps, 2022) packages an entire tile pyramid into one file that browsers can read via HTTP range requests — no tile server required.

Supports:

  • Vector tiles (MVT — Mapbox Vector Tile format) for dynamic styling.
  • Raster tiles (PNG/JPEG/WEBP).

Workflow:

Shell
1# Convert MBTiles to PMTiles
2pmtiles convert world.mbtiles world.pmtiles
3[object Object]
4

Client:

JavaScript
1import { PMTilesProtocol } from 'pmtiles';
2import maplibregl from 'maplibre-gl';
3[object Object]
4

Zero infrastructure, global scale.

FlatGeobuf vs GeoParquet vs PMTiles

Use caseBest format
Serve vector features to a web map on demandFlatGeobuf or PMTiles
Analytic queries on a big vector datasetGeoParquet
Pre-rendered styled map tiles for an appPMTiles (vector tiles)
Offline basemap in a mobile appMBTiles or PMTiles
Archive / exchange with GIS toolsGeoPackage

When to not use them

  • Small datasets (< 1 MB) — GeoJSON is still easier.
  • Interchange with legacy tools — shapefile / GeoPackage still beats binary-only formats.
  • Editing workflows that need row-level mutation — PostGIS or GeoPackage are better.

Cloud-native principles

The common design pattern across these formats:

  1. Deterministic layout — client knows where bytes live.
  2. Front-loaded index — one HTTP request (or few) finds the relevant data.
  3. Range reads — transfer only the needed bytes.
  4. Standard static hosting — no custom server process.

A growing ecosystem applies the same principle to other domains: STAC (metadata), Zarr (N-d arrays), COG (raster), Iceberg + Delta (tables).

Self-check exercises

1. You have 500 GB of building footprints globally. You want analysts to run aggregations without downloading everything. Which format?

GeoParquet partitioned by region (e.g., country or H3 level 2). Stored on S3 or GCS, an analyst can run DuckDB / Spark queries that push predicates down and only read relevant columns and partitions. Typical queries run in seconds despite the dataset size.

2. Difference between FlatGeobuf and PMTiles for serving a web map?

FlatGeobuf serves features — the client still has to render them. PMTiles serves pre-rendered tiles (vector or raster), which the client draws immediately. Use FlatGeobuf for custom client-side analysis; PMTiles for production-grade global map display.

3. A colleague insists on using shapefile for a new 1 GB dataset. Make the case for an alternative.

Shapefile's 2 GB ceiling is near; field names get truncated; multi-layer isn't supported. GeoPackage or GeoParquet gives richer types, multi-layer support, better compression, and modern tool support. GeoPackage is a drop-in replacement for desktop GIS; GeoParquet is better for analytical workloads. Transition is a one-line ogr2ogr conversion.

Summary

  • Cloud-native = front-loaded index + range reads + deterministic layout + static hosting.
  • FlatGeobuf: streamable binary for features.
  • GeoParquet: columnar analytics.
  • PMTiles: zero-infrastructure tile delivery.
  • Use alongside COG (rasters) and STAC (catalogues) for full modern pipelines.

Further reading

  • flatgeobuf.org — specification and clients.
  • geoparquet.org — format spec, v1.1 roadmap.
  • github.com/protomaps/PMTiles — reference implementation.
  • Overture Maps — exemplar GeoParquet publisher at planetary scale.
Module test

Module 6: Spatial Data Formats

Answer these quick multiple-choice questions to check your understanding before moving on.

1. Why is GeoPackage often preferred over shapefile for new projects?
2. What is GeoJSON especially useful for?
3. Which format is designed for cloud-native tiled vector delivery?