
ST_ClusterIntersecting

What is ST_ClusterIntersecting?

ST_ClusterIntersecting is a PostGIS aggregate function that groups input geometries into the connected components of the "intersects" relation. Any two geometries that share at least one point, directly or through a chain of intersecting geometries, end up in the same cluster; the result is an array of GeometryCollections, one per cluster.

SQL
ST_ClusterIntersecting(geometry set g) → geometry[]
ST_ClusterIntersecting(geometry[] g) → geometry[]

Each element of the returned array is a GeometryCollection containing the geometries of one cluster.
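To make the return shape concrete, here is a minimal sketch on three inline geometries. The `shapes` CTE and its values are made up for illustration: two lines that touch at (1 1) and one isolated point. Under those assumptions the query should return two rows, one collection holding both lines and one holding the point; the order of clusters in the array should not be relied on.

```sql
-- Hypothetical sample data: lines 1 and 2 share the point (1 1); the point is isolated.
WITH shapes(id, geom) AS (
  VALUES
    (1, 'LINESTRING(0 0, 1 1)'::geometry),
    (2, 'LINESTRING(1 1, 2 2)'::geometry),
    (3, 'POINT(10 10)'::geometry)
)
SELECT ST_AsText(cluster)        AS cluster_wkt,   -- a GEOMETRYCOLLECTION per cluster
       ST_NumGeometries(cluster) AS member_count   -- how many inputs landed in it
FROM (
  -- The aggregate returns geometry[]; unnest turns it into one row per cluster.
  SELECT unnest(ST_ClusterIntersecting(geom)) AS cluster
  FROM shapes
) t;
```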

When would you use ST_ClusterIntersecting?

Use ST_ClusterIntersecting to group overlapping or touching geometries — merging overlapping buffer zones, identifying connected networks, or grouping related feature fragments into wholes. It is the go-to function for "find all features that touch each other directly or transitively".

SQL
-- One row per original geometry; each unnested array element is one cluster
SELECT (ST_Dump(cluster)).geom AS member_geom
FROM (
  SELECT unnest(ST_ClusterIntersecting(geom)) AS cluster
  FROM parcels
) t;

FAQs

How is this different from ST_ClusterWithin?

ST_ClusterIntersecting joins geometries that share at least one point (distance = 0). ST_ClusterWithin joins those within a distance threshold you specify — effectively the same clustering semantics but for nearby, not just touching, geometries.
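The difference is easiest to see side by side. The sketch below uses three made-up points (the `pts` CTE is hypothetical) spaced 1 and 9 units apart, with an arbitrary threshold of 1.5. None of the points intersect, so ST_ClusterIntersecting should yield three clusters, while ST_ClusterWithin should merge the two nearby points and yield two.

```sql
-- Hypothetical sample data: points at x = 0, 1 and 10 on the same line.
WITH pts(geom) AS (
  VALUES
    ('POINT(0 0)'::geometry),
    ('POINT(1 0)'::geometry),
    ('POINT(10 0)'::geometry)
)
SELECT
  -- Distinct points never share a point, so each is its own cluster.
  array_length(ST_ClusterIntersecting(geom), 1) AS intersecting_clusters,
  -- With a 1.5 unit threshold the first two points join one cluster.
  array_length(ST_ClusterWithin(geom, 1.5), 1)  AS within_clusters
FROM pts;
```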

How is it different from ST_ClusterDBSCAN?

ST_ClusterIntersecting is an aggregate that returns cluster geometries; ST_ClusterDBSCAN is a window function that returns a cluster id per input row. DBSCAN also supports a minimum cluster size and noise labelling, while ST_ClusterIntersecting puts every input into some cluster.
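As an illustration of the window-function form and of noise labelling, the sketch below runs ST_ClusterDBSCAN over three made-up geometries (the `shapes` CTE is hypothetical). With `eps := 0` and `minpoints := 2`, the two touching lines should share a cluster id, while the isolated point has no neighbour and should come back with a NULL cluster id.

```sql
-- Hypothetical sample data: lines 1 and 2 touch; point 3 is isolated.
WITH shapes(id, geom) AS (
  VALUES
    (1, 'LINESTRING(0 0, 1 1)'::geometry),
    (2, 'LINESTRING(1 1, 2 2)'::geometry),
    (3, 'POINT(10 10)'::geometry)
)
SELECT id,
       -- Window function: one cluster id per input row, NULL for noise.
       ST_ClusterDBSCAN(geom, eps := 0, minpoints := 2) OVER () AS cluster_id
FROM shapes;
```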

Why is the output an array of GeometryCollections?

Each cluster is returned as a GeometryCollection that preserves the original geometries (they are not dissolved). Use ST_UnaryUnion on each collection if you want a single merged geometry per cluster; use unnest and ST_Dump if you want one row per original feature, adding WITH ORDINALITY to the unnest call to get a cluster id.
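Both post-processing options can be sketched as follows, assuming the same `parcels` table with a `geom` column used earlier. The aliases `agg`, `arr`, `c` and `d` are illustrative, and the `cluster_id` here is simply the position of the cluster in the returned array, so it is not stable across runs.

```sql
-- Option 1: one dissolved geometry per cluster.
SELECT c.cluster_id,
       ST_UnaryUnion(c.cluster) AS merged_geom
FROM (SELECT ST_ClusterIntersecting(geom) AS arr FROM parcels) agg
CROSS JOIN LATERAL unnest(agg.arr) WITH ORDINALITY AS c(cluster, cluster_id);

-- Option 2: one row per original geometry, tagged with its cluster id.
SELECT c.cluster_id,
       d.geom AS member_geom
FROM (SELECT ST_ClusterIntersecting(geom) AS arr FROM parcels) agg
CROSS JOIN LATERAL unnest(agg.arr) WITH ORDINALITY AS c(cluster, cluster_id)
CROSS JOIN LATERAL ST_Dump(c.cluster) AS d;  -- explode the collection into members
```

Note that ST_Dump returns only the geometries, so other columns of `parcels` are not carried through; if those are needed, the window-function approach is usually more convenient.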

What about very large inputs?

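The algorithm is O(n²) in the worst case (for example, when most geometries intersect one another), and the aggregate has to hold every input geometry in memory, so it can be slow for huge datasets. For large inputs consider ST_ClusterDBSCAN with eps = 0 and minpoints = 1, which returns a cluster id per row instead of building large GeometryCollections, or cluster a spatially indexed subset first and iterate.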
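A minimal sketch of the ST_ClusterDBSCAN alternative, again assuming a `parcels` table with a `geom` column (the `labelled` CTE name is illustrative). With `minpoints := 1` every row is its own core geometry, so no row should be labelled as noise, and `eps := 0` restricts neighbours to geometries at distance zero.

```sql
WITH labelled AS (
  SELECT geom,
         -- One integer cluster id per row instead of an array of collections.
         ST_ClusterDBSCAN(geom, eps := 0, minpoints := 1) OVER () AS cluster_id
  FROM parcels
)
SELECT cluster_id,
       count(*)         AS member_count,
       ST_Collect(geom) AS cluster_geom   -- rebuild a collection only if needed
FROM labelled
GROUP BY cluster_id;
```

Keeping the cluster id as a plain column also makes it straightforward to join the result back to other attributes or to process one cluster at a time.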