
ST_ClusterWithinWin

What is ST_ClusterWithinWin?

ST_ClusterWithinWin is a PostGIS window function that returns a cluster id for each input geometry based on single-linkage clustering by distance. Any two geometries within the supplied distance are in the same cluster, and clusters merge transitively.

SQL
ST_ClusterWithinWin(geometry winset geom, float8 distance) OVER () → integer

It is the window-function companion to the aggregate ST_ClusterWithin, returning a per-row cluster id instead of collections.
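To make the transitive merging concrete, here is a minimal self-contained sketch. The points, the labels, and the alias `pts` are made up for illustration, and the coordinates are in an arbitrary planar system.

```sql
-- Four made-up points on a line. With distance 1.5, A-B and B-C are each
-- within range, so A, B and C chain into one cluster even though A and C
-- are 2 units apart. D is far from all of them.
SELECT label,
       ST_ClusterWithinWin(geom, 1.5) OVER () AS cluster_id
FROM (VALUES
        ('A', ST_MakePoint(0, 0)),
        ('B', ST_MakePoint(1, 0)),
        ('C', ST_MakePoint(2, 0)),
        ('D', ST_MakePoint(10, 0))
     ) AS pts(label, geom);
```

A, B and C should share one cluster_id, and D should receive a different one.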

When would you use ST_ClusterWithinWin?

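Use ST_ClusterWithinWin when you need each row in your input to carry a cluster id, for example for joins, GROUP BY aggregations, or filtering. Because it is a window function, grouping or filtering on the cluster id has to happen in an outer query or CTE. It is preferred over the aggregate when you want the cluster id alongside other attributes on the same row.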

SQL
SELECT id, name,
       ST_ClusterWithinWin(geom, 100) OVER () AS cluster_id
FROM stores;
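As a sketch of the GROUP BY use case, the query below labels each row in a CTE and then aggregates per cluster. It assumes the same `stores` table as above; the CTE name `labelled` and the output column names are illustrative.

```sql
-- Label each store first, then aggregate in the outer query: a window
-- function result cannot be referenced in GROUP BY of the same SELECT.
WITH labelled AS (
    SELECT id, name, geom,
           ST_ClusterWithinWin(geom, 100) OVER () AS cluster_id
    FROM stores
)
SELECT cluster_id,
       count(*)                      AS store_count,
       ST_Centroid(ST_Collect(geom)) AS cluster_centre
FROM labelled
GROUP BY cluster_id
ORDER BY store_count DESC;
```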

FAQs

When should I use the window form vs the aggregate?

Use the window form (ST_ClusterWithinWin) when you need each input row labelled with a cluster id for further per-row operations. Use the aggregate (ST_ClusterWithin) when you want the clusters themselves returned as an array of GeometryCollections, one collection per cluster.
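For comparison, a minimal sketch of the aggregate form on the same assumed `stores` table. The aliases `cluster_geom` and `clusters` are illustrative.

```sql
-- ST_ClusterWithin returns geometry[]; unnest() yields one row per cluster,
-- each a GeometryCollection holding that cluster's member geometries.
SELECT ST_NumGeometries(cluster_geom) AS member_count,
       cluster_geom
FROM (
    SELECT unnest(ST_ClusterWithin(geom, 100)) AS cluster_geom
    FROM stores
) AS clusters;
```

Note that the original row attributes (id, name) are not carried through here, which is the main reason to reach for the window form.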

What units is the distance in?

The distance is in the units of the input geometry's spatial reference system: typically metres (sometimes feet) for projected CRSs, and degrees for geographic ones. For metre-based thresholds on EPSG:4326 data, reproject to a suitable projected CRS first.
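A sketch of reprojecting before clustering is shown below. It assumes `geom` is stored in EPSG:4326 and that the data falls within UTM zone 33N (EPSG:32633); that target CRS is only an example, so substitute one appropriate for your area.

```sql
-- Assumption: stores.geom is EPSG:4326 and the data lies in UTM zone 33N.
-- After ST_Transform the coordinates are in metres, so 100 means 100 m.
SELECT id, name,
       ST_ClusterWithinWin(ST_Transform(geom, 32633), 100) OVER () AS cluster_id
FROM stores;
```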

How is it different from ST_ClusterDBSCAN?

ST_ClusterWithinWin is equivalent to ST_ClusterDBSCAN with minpoints = 1: every group of connected features, no matter how small, becomes a cluster, including isolated single geometries. With minpoints > 1, ST_ClusterDBSCAN treats geometries in sparse areas as noise and gives them a NULL cluster id; use it when you want to reject outliers.
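The two functions can be run side by side to see which rows DBSCAN would drop as noise. This is a sketch on the assumed `stores` table; the value minpoints := 3 is arbitrary, and the column aliases are illustrative.

```sql
-- within_id is always populated; dbscan_id is NULL for rows that
-- ST_ClusterDBSCAN classifies as noise at the chosen minpoints.
SELECT id, name,
       ST_ClusterWithinWin(geom, 100) OVER ()                     AS within_id,
       ST_ClusterDBSCAN(geom, eps := 100, minpoints := 3) OVER () AS dbscan_id
FROM stores;
```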

Does it handle very large datasets efficiently?

Yes — PostGIS builds a spatial index for the window partition and uses it to find neighbours within distance. Use PARTITION BY to scope clustering within regions or categories and keep each window small.
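A minimal sketch of scoping with PARTITION BY follows. The `region` column is hypothetical and is not part of the `stores` table shown earlier.

```sql
-- Assumption: stores has a region column. Clustering runs independently
-- inside each region, so geometries in different regions never share a cluster.
SELECT id, name, region,
       ST_ClusterWithinWin(geom, 100) OVER (PARTITION BY region) AS cluster_id
FROM stores;
```

Since each partition is clustered on its own, it is safest to treat the pair (region, cluster_id) as the cluster's identity rather than cluster_id alone.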