GDAL Raster Processing

gdal_sieve

What is gdal_sieve?

gdal_sieve.py filters a classified raster by identifying connected components (contiguous groups of pixels with the same value) that contain fewer pixels than a threshold and reassigning them the value of their largest neighbouring component. The result is a cleaner classification in which noisy singletons and tiny patches are absorbed into their dominant neighbours. It is a standard post-processing step after pixel-based land-cover classification, change detection, or thresholding.
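The merge rule described above can be illustrated with a small pure-Python sketch. This is not GDAL's implementation: the function names `label_components` and `sieve` are made up for illustration, the raster is a plain list of lists, and the sketch does a single pass in which a small component is relabelled only when its largest edge-sharing neighbour is itself at or above the threshold.

```python
from collections import deque


def label_components(grid, connectivity=4):
    """Label same-valued connected regions; return (labels, sizes, values)."""
    rows, cols = len(grid), len(grid[0])
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        steps += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    labels = [[-1] * cols for _ in range(rows)]
    sizes, values = [], []
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue
            comp = len(sizes)  # id of the component being grown
            labels[r][c] = comp
            queue, count = deque([(r, c)]), 0
            while queue:
                y, x = queue.popleft()
                count += 1
                for dy, dx in steps:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and grid[ny][nx] == grid[r][c]):
                        labels[ny][nx] = comp
                        queue.append((ny, nx))
            sizes.append(count)
            values.append(grid[r][c])
    return labels, sizes, values


def sieve(grid, threshold, connectivity=4):
    """Relabel components smaller than `threshold` to their largest neighbour."""
    labels, sizes, values = label_components(grid, connectivity)
    rows, cols = len(grid), len(grid[0])
    best = {}  # component id -> id of its largest edge-sharing neighbour
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            for dy, dx in ((0, 1), (1, 0)):  # look right and down only
                ny, nx = r + dy, c + dx
                if ny < rows and nx < cols and labels[ny][nx] != a:
                    b = labels[ny][nx]
                    for small, other in ((a, b), (b, a)):
                        if small not in best or sizes[other] > sizes[best[small]]:
                            best[small] = other
    out = [row[:] for row in grid]  # leave the input untouched
    for r in range(rows):
        for c in range(cols):
            comp = labels[r][c]
            target = best.get(comp)
            if (sizes[comp] < threshold and target is not None
                    and sizes[target] >= threshold):
                out[r][c] = values[target]
    return out


grid = [
    [1, 1, 1, 1],
    [1, 2, 1, 1],
    [1, 1, 3, 3],
    [1, 1, 3, 3],
]
for row in sieve(grid, threshold=2):
    print(row)
```

With a threshold of 2 only the single class-2 pixel is absorbed into class 1; raising the threshold to 5 would also absorb the four-pixel class-3 patch.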

Shell
gdal_sieve.py [options] <srcfile> [<dstfile>]

Commonly used options:

  • -st <threshold>: size threshold in pixels; components with fewer pixels than this are merged
  • -4 or -8: 4- or 8-connectedness (default 4)
  • -mask <filename>: use the first band of this file as the validity mask
  • -nomask: ignore the band mask and treat every pixel as valid
  • -of <format>: output driver, used when <dstfile> is created

gdal_sieve.py always processes the first band of <srcfile>; the script has no band-selection or creation-option flag.

If <dstfile> is omitted, the input raster is modified in place.

When would you use gdal_sieve?

Sieve any classified raster that exhibits salt-and-pepper noise before you polygonise, publish, or analyse it. Typical jobs: cleaning a land-cover classification so that one-pixel noise does not generate a million tiny polygons when vectorised (gdal_sieve.py -st 20 lc.tif lc_sieved.tif), removing isolated misclassified pixels from a change-detection raster, or simplifying a thresholded analytical output before extracting vector features with gdal_polygonize.py.

Pick -st based on the minimum mapping unit you want to preserve. Each pixel of a 10-metre classification covers 100 m², so a 20-pixel threshold means "keep features of at least 2,000 m²". Choose this number deliberately: sieving does not invent data, but it does discard legitimate small features along with noise, so tune the threshold against known reference sites.
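The arithmetic above can be wrapped in a small helper. This is a sketch under the assumption of square pixels whose size is given in metres; `sieve_threshold` is a hypothetical name, not part of GDAL.

```python
import math


def sieve_threshold(mmu_m2, pixel_size_m):
    """Smallest -st value that removes every patch below the minimum mapping unit."""
    pixel_area = pixel_size_m ** 2  # area of one square pixel in m²
    # Components with fewer pixels than the threshold are merged, so round up.
    return math.ceil(mmu_m2 / pixel_area)


print(sieve_threshold(2000, 10))  # 20, the example from the text
print(sieve_threshold(5000, 30))  # 6, because 5000 / 900 is about 5.56
```

Rounding up means a patch of exactly the threshold size is kept, which matches the "fewer pixels than" rule for -st.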

FAQs

Does sieve change pixel values beyond removing small components?

Yes — the removed components are not set to NoData, they are relabelled to the value of their largest neighbour. This is important: sieving a binary mask with -st 50 removes small foreground patches by flipping them to background, which may be exactly what you want or may silently change your totals. Inspect before and after.
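One way to "inspect before and after" is to compare per-class pixel counts. The sketch below assumes both rasters have already been read into plain lists of lists; the `before` and `after` grids are hand-written illustrations, not real sieve output, and `count_changes` is a hypothetical helper.

```python
from collections import Counter


def class_counts(grid):
    """Count pixels per class value."""
    return Counter(value for row in grid for value in row)


def count_changes(before, after):
    """Return {class: pixel count delta} for every class whose total changed."""
    b, a = class_counts(before), class_counts(after)
    return {cls: a[cls] - b[cls]
            for cls in sorted(set(b) | set(a)) if a[cls] != b[cls]}


# A binary mask with one isolated foreground pixel, and the same mask
# after that pixel has been flipped to background.
before = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
after = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
print(count_changes(before, after))  # {0: 1, 1: -1}
```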

4 vs 8 connectedness — which to use?

4-connectedness (default) treats only orthogonal neighbours as connected, so two pixels that touch only at a corner form two separate components. 8-connectedness joins diagonal neighbours into one component. Because 8-connectedness produces larger components, fewer of them fall below the threshold: use 8 when thin diagonal or slanted features are being split into fragments and sieved away, and use 4 when you want more aggressive noise removal or a strict edge-sharing definition of connectedness. Match the choice you use in gdal_polygonize.py downstream.

How does sieve interact with NoData?

NoData pixels are excluded from sieving: they never form part of a component, they cannot absorb small components, and small components cannot absorb them. If a small patch is surrounded entirely by NoData, sieve leaves it alone. Provide -mask if you want a finer validity rule than the source band mask.

When should I sieve before polygonising?

Always, for any raster that still contains speckle or patches below your minimum mapping unit. Without sieving, gdal_polygonize.py on a noisy classification produces an unmanageable number of tiny polygons. A typical pipeline is classify -> sieve -> polygonize -> simplify, where sieve controls the polygon count and simplify controls the vertex density.
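The last three stages of that pipeline could be assembled as command lines like this. It is a sketch only: the file names are placeholders, the use of ogr2ogr -simplify for the final stage is an assumption (the text does not name a simplification tool), and the commands are printed rather than executed.

```python
import shlex


def pipeline_commands(classified, sieved, polygons, simplified,
                      threshold=20, eight_connected=False, tolerance=5.0):
    """Build sieve -> polygonize -> simplify commands as argument lists."""
    sieve_conn = ["-8"] if eight_connected else ["-4"]
    # gdal_polygonize.py defaults to 4-connectedness; -8 switches it.
    poly_conn = ["-8"] if eight_connected else []
    sieve = ["gdal_sieve.py", "-st", str(threshold), *sieve_conn,
             classified, sieved]
    polygonize = ["gdal_polygonize.py", *poly_conn, sieved, polygons]
    # ogr2ogr takes the destination first; tolerance is in layer CRS units.
    simplify = ["ogr2ogr", "-simplify", str(tolerance), simplified, polygons]
    return [sieve, polygonize, simplify]


for cmd in pipeline_commands("lc.tif", "lc_sieved.tif",
                             "lc.gpkg", "lc_simplified.gpkg"):
    print(shlex.join(cmd))
```

Each list can be passed to subprocess.run(cmd, check=True) once the paths point at real data. Deriving both connectedness flags from one argument keeps the sieve and polygonize steps consistent.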