gdal_polygonize
What is gdal_polygonize?
gdal_polygonize.py walks a raster band and creates one vector polygon per connected region of pixels that share the same value. It is the inverse of gdal_rasterize and the go-to tool for extracting vector features from a classified raster, a mask, or a label image. The polygon value is stored as an attribute (default DN — digital number) on the output layer.
gdal_polygonize.py [options] <raster_file> [-b <band>] [-mask <filename>] <out_file> [<layer>] [<fieldname>]
Commonly used options:
-b <band>: source band (default 1)
-mask <filename>: use a mask raster (or default for the band mask) to restrict polygonisation
-nomask: ignore the source band mask
-8: use 8-connectedness (diagonal neighbours) instead of the default 4-connectedness
-f <format>: output OGR format (ESRI Shapefile, GPKG, GeoJSON, …)
-o <NAME>=<VALUE>: special argument passed to the polygonize algorithm
-lco <NAME>=<VALUE>: layer creation option for the output driver
-ovr <level>: polygonize an overview level instead of full resolution
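When the tool is driven from a script, it can help to assemble the argument list in the same order as the synopsis above. The following is a minimal Python sketch, not part of GDAL: build_polygonize_cmd and its parameter names are hypothetical, it only uses flags listed above, and it assumes gdal_polygonize.py is on PATH when the command is eventually run.

```python
import shlex


def build_polygonize_cmd(raster, out_file, layer=None, field=None,
                         band=1, mask=None, eight_connected=False,
                         out_format=None):
    """Return an argv list that follows the synopsis order.

    Hypothetical helper for illustration; it does not execute anything.
    """
    cmd = ["gdal_polygonize.py"]
    if eight_connected:
        cmd.append("-8")
    if out_format:
        cmd += ["-f", out_format]
    cmd += [raster, "-b", str(band)]
    if mask:
        cmd += ["-mask", mask]
    cmd.append(out_file)
    # <layer> and <fieldname> are positional, so a field name needs a layer name.
    if field and not layer:
        raise ValueError("a field name requires a layer name before it")
    if layer:
        cmd.append(layer)
    if field:
        cmd.append(field)
    return cmd


cmd = build_polygonize_cmd("landcover.tif", "landcover.gpkg",
                           layer="landcover", field="class",
                           out_format="GPKG")
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run(cmd, check=True); keeping it as a list avoids shell quoting problems with file names that contain spaces.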
When would you use gdal_polygonize?
Reach for gdal_polygonize.py whenever a raster contains discrete classes or masks that you want as polygons for cartography, topological analysis, or attribute joins. Typical jobs: converting a land-cover classification into polygon features with one polygon per class patch (gdal_polygonize.py landcover.tif landcover.gpkg landcover class), extracting cloud or water masks from a remote-sensing product, or outlining connected components of a thresholded analysis raster (e.g. burn scars above an index value).
For a clean pipeline from classifier output to vector, run gdal_sieve.py first to absorb connected components below a minimum pixel count into their largest neighbour, then gdal_polygonize.py to vectorise: gdal_sieve.py -st 50 class.tif class_sieved.tif && gdal_polygonize.py class_sieved.tif class.gpkg. Without sieving, noisy classifier output can explode into millions of tiny polygons.
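The same two-step pipeline can be scripted so that polygonisation only starts if sieving succeeded, mirroring the && in the shell form. This is a sketch under assumptions: sieve_then_polygonize is a hypothetical wrapper, the file names are the ones from the example above, and both GDAL scripts are assumed to be on PATH for a real run.

```python
import subprocess


def sieve_then_polygonize(src, sieved, out_file, threshold=50,
                          run=subprocess.run):
    """Run gdal_sieve.py, then gdal_polygonize.py, stopping on the first failure.

    `run` is injectable so the sequence can be inspected without GDAL installed.
    """
    commands = [
        ["gdal_sieve.py", "-st", str(threshold), src, sieved],
        ["gdal_polygonize.py", sieved, out_file],
    ]
    for cmd in commands:
        # With subprocess.run, check=True raises CalledProcessError on a
        # non-zero exit code, so the second step never runs after a failure.
        run(cmd, check=True)
    return commands


# Dry run: record the commands instead of executing them.
recorded = []
sieve_then_polygonize("class.tif", "class_sieved.tif", "class.gpkg",
                      run=lambda cmd, check: recorded.append(cmd))
for cmd in recorded:
    print(" ".join(cmd))
```

Dropping the run argument executes the commands for real through subprocess.run.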
FAQs
Why did I get millions of polygons from my raster?
gdal_polygonize.py creates one polygon per connected component of identical values. On a raster with pixel-level noise, such as a thresholded continuous-value raster or raw classifier output, every isolated pixel becomes its own polygon and the feature count explodes. Pre-process with gdal_sieve.py to eliminate small components, or threshold more aggressively, before polygonising.
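To see how quickly noise inflates the polygon count, the rule "one polygon per connected component of identical values" can be imitated on a toy grid. This is a pure-Python illustration with a hypothetical count_components helper using 4-connectedness; it is not GDAL's implementation, and the 20 x 20 grid and noise pattern are made up for the example.

```python
from collections import deque


def count_components(grid):
    """Count 4-connected regions of equal value (one polygon each)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            count += 1
            value = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:  # flood fill over pixels that share this value
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


# Two clean classes: left half is 1, right half is 2.
clean = [[1 if c < 10 else 2 for c in range(20)] for r in range(20)]

# Same raster with 25 isolated single-pixel "noise" values.
noisy = [row[:] for row in clean]
for r in range(1, 20, 4):
    for c in range(1, 20, 4):
        noisy[r][c] = 3

print(count_components(clean))  # 2
print(count_components(noisy))  # 27
```

Each noise pixel adds one polygon, so the count grows with the number of noisy pixels rather than with the number of classes.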
4-connected or 8-connected — which should I use?
The default 4-connectedness (rook neighbours) is correct for strictly topological classifications where diagonal-only adjacency should not count as connected. Use -8 (queen neighbours) when pixels that touch only at corners should merge into one feature, which is common for cadastre-like classified rasters or when diagonals clearly belong together. Use the same rule that gdal_sieve.py applied upstream (it accepts -4 and -8 as well).
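The difference between the two rules is easiest to see on a diagonal line of pixels. The sketch below is a pure-Python illustration, not GDAL code: count_regions is a hypothetical helper that treats 0 as background and counts regions of the remaining pixels under either rule.

```python
from collections import deque

NEIGHBOURS = {
    4: [(-1, 0), (1, 0), (0, -1), (0, 1)],                 # rook moves
    8: [(-1, -1), (-1, 0), (-1, 1), (0, -1),
        (0, 1), (1, -1), (1, 0), (1, 1)],                  # queen moves
}


def count_regions(grid, connectivity=4, background=0):
    """Count connected regions of equal, non-background value."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or grid[r][c] == background:
                continue
            count += 1
            value = grid[r][c]
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS[connectivity]:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == value):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
    return count


diagonal = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]

print(count_regions(diagonal, connectivity=4))  # 3
print(count_regions(diagonal, connectivity=8))  # 1
```

Under the rook rule the three pixels are separate features; under the queen rule they form one.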
How do I exclude NoData or background from the output?
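Declare a NoData value or attach a mask band to the source raster so that those pixels are excluded (gdal_polygonize.py honours the band's default validity mask unless -nomask is given), or pass -mask <filename> pointing at a raster where 0 means excluded and non-zero means valid. After polygonising, you can also run ogr2ogr -where "DN <> 0" to drop the background class. The mask approach avoids creating those polygons in the first place and is faster.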
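The two routes described above can be written out as argument lists for comparison. This is only a sketch: the helper names and the file names (valid_mask.tif, class_nobg.gpkg) are hypothetical, and the field name DN assumes the default attribute was kept.

```python
import shlex


def polygonize_with_mask(raster, mask, out_file):
    # 0 in the mask means "skip this pixel"; non-zero means "polygonize it".
    return ["gdal_polygonize.py", raster, "-mask", mask, out_file]


def drop_background(src_vector, dst_vector, field="DN", background=0):
    # ogr2ogr takes the destination dataset before the source dataset.
    return ["ogr2ogr", "-where", f"{field} <> {background}",
            dst_vector, src_vector]


# Route 1: never create the background polygons.
print(shlex.join(polygonize_with_mask("class.tif", "valid_mask.tif",
                                      "class.gpkg")))

# Route 2: create everything, then filter the background class out.
print(shlex.join(drop_background("class.gpkg", "class_nobg.gpkg")))
```

Route 2 writes a second vector file, which is part of why masking up front tends to be the cheaper option.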
Why are polygon edges so jagged?
Polygonisation emits orthogonal edges along pixel boundaries by construction, so expect stair-steps wherever a region boundary runs diagonally. Post-smooth with ogr2ogr -simplify <tolerance> or a dedicated generalisation tool if the jaggies harm cartography. Do not simplify before downstream spatial joins: ogr2ogr simplifies each feature independently, so boundaries shared by neighbouring polygons stop matching and gaps or overlaps follow.
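How the tolerance interacts with the stair-steps can be explored on a toy boundary. The sketch below uses a plain Douglas-Peucker simplifier as a stand-in; it is not the algorithm ogr2ogr runs internally, and staircase and douglas_peucker are hypothetical helpers working in pixel units.

```python
import math


def staircase(steps):
    """Boundary of a diagonal edge as polygonisation would emit it."""
    points = [(0, 0)]
    for k in range(steps):
        points.append((k + 1, k))      # one pixel to the right
        points.append((k + 1, k + 1))  # one pixel up
    return points


def perpendicular_distance(p, a, b):
    """Distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def douglas_peucker(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    a, b = points[0], points[-1]
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], a, b)
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [a, b]
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right  # avoid duplicating the split vertex


stairs = staircase(5)
print(len(stairs))                        # 11 vertices
print(douglas_peucker(stairs, 1.0))       # [(0, 0), (5, 5)]
print(len(douglas_peucker(stairs, 0.5)))  # more than 2: steps survive
```

On a 45-degree edge the step corners sit about 0.71 pixels from the diagonal, so in this toy setting a tolerance below that leaves the staircase in place while a tolerance of one pixel flattens it.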