
ST_ClusterDBSCAN

What is ST_ClusterDBSCAN?

ST_ClusterDBSCAN is a PostGIS window function that groups geometries into clusters using the DBSCAN algorithm. Each input geometry is assigned an integer cluster id, or NULL if it is noise, based on a distance threshold eps and a minimum neighbourhood count minpoints.

SQL
ST_ClusterDBSCAN(geometry winset geom, float8 eps, integer minpoints) OVER () → integer

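A core point has at least minpoints geometries within eps, counting itself. Clusters grow transitively from core points: a non-core (border) point that lies within eps of a core point joins that core's cluster. Points that are neither core points nor within eps of one are labelled noise (NULL).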
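A minimal sketch of the core, border and noise rules, using five hypothetical points on a line (SRID 0, so distances are plain coordinate units) with eps = 1.5 and minpoints = 3. The table name pts and its rows are illustrative assumptions, not part of PostGIS.

```sql
-- Hypothetical toy data: four evenly spaced points plus one far-away point.
WITH pts(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(2, 0)),
         (4, ST_MakePoint(3, 0)),
         (5, ST_MakePoint(10, 0))
)
SELECT id,
       ST_ClusterDBSCAN(geom, 1.5, 3) OVER () AS cluster_id  -- eps = 1.5, minpoints = 3
FROM pts
ORDER BY id;
```

Points 2 and 3 each have three geometries within 1.5 (themselves plus both neighbours), so they are core points. Points 1 and 4 have only two, but each lies within eps of a core point, so they join the same cluster as border points. Point 5 has nothing nearby and comes back as NULL.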

When would you use ST_ClusterDBSCAN?

Use ST_ClusterDBSCAN to discover clusters of arbitrary shape in point data: finding crime hot spots, identifying communities of closely spaced businesses, grouping GPS pings into stops, or flagging outliers as noise. Unlike k-means, DBSCAN does not need the number of clusters up front, and it labels outliers as noise instead of forcing them into a cluster.

SQL
SELECT id,
       -- eps = 100 is in the units of geom's SRID (e.g. metres for a projected SRID); minpoints = 5
       ST_ClusterDBSCAN(geom, 100, 5) OVER () AS cluster_id
FROM incidents;
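Once rows carry a cluster_id, a common next step is to summarise each cluster (for example its size and centre) and drop the noise rows. The sketch below substitutes a few hypothetical rows for the incidents table and uses toy parameters (eps = 1.5, minpoints = 3) so that it runs on its own; taking ST_Centroid of the collected points is just one reasonable choice of "centre".

```sql
-- Hypothetical stand-in for the incidents table: two tight groups and one stray point.
WITH incidents(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(0, 1)),
         (4, ST_MakePoint(10, 10)),
         (5, ST_MakePoint(11, 10)),
         (6, ST_MakePoint(10, 11)),
         (7, ST_MakePoint(50, 50))
),
labelled AS (
  SELECT id, geom,
         ST_ClusterDBSCAN(geom, 1.5, 3) OVER () AS cluster_id
  FROM incidents
)
SELECT cluster_id,
       count(*)                                 AS n_points,
       ST_AsText(ST_Centroid(ST_Collect(geom))) AS centre
FROM labelled
WHERE cluster_id IS NOT NULL   -- noise rows (NULL cluster_id) are excluded
GROUP BY cluster_id
ORDER BY cluster_id;
```

Each triangle of points is mutually within 1.5, so both groups become clusters of three, and the stray point at (50, 50) is filtered out as noise.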

FAQs

How should I choose eps and minpoints?

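eps is the neighbourhood radius, in the units of the input's SRID (metres for most projected systems, degrees for EPSG:4326). A common heuristic is to compute, for every point, the distance to its k-th nearest neighbour, where k = minpoints - 1 because PostGIS counts the point itself; sort these distances, plot them, and pick eps at the "elbow" where the curve bends sharply upward. minpoints is commonly set to 4 for 2D data and raised for larger or noisier datasets. Raising it makes the core-point test stricter, so you get fewer, denser clusters and more noise.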
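The k-distance heuristic can be computed directly in SQL. This is a sketch rather than a built-in PostGIS feature: for each point, a LATERAL subquery orders the other points by distance with the <-> operator and keeps the k-th one. The toy data and the choice k = 3 (that is, minpoints = 4) are assumptions for illustration.

```sql
-- Hypothetical toy data: five points one unit apart and one distant outlier.
WITH pts(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(2, 0)),
         (4, ST_MakePoint(3, 0)),
         (5, ST_MakePoint(4, 0)),
         (6, ST_MakePoint(20, 0))
)
SELECT p.id,
       knn.dist AS k_dist
FROM pts AS p
CROSS JOIN LATERAL (
  SELECT ST_Distance(p.geom, q.geom) AS dist
  FROM pts AS q
  WHERE q.id <> p.id           -- exclude the point itself
  ORDER BY p.geom <-> q.geom   -- nearest neighbours first
  OFFSET 2 LIMIT 1             -- skip two neighbours: keep the 3rd nearest (k = 3)
) AS knn
ORDER BY k_dist;
```

Sorted, the k-distances are 2, 2, 2, 3, 3, 18. The jump after 3 is the elbow, so an eps of about 3 keeps the five evenly spaced points together and leaves point 6 as noise. On a real table with a GiST index on geom, the <-> ordering inside the LATERAL subquery can use the index.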

How is this different from ST_ClusterKMeans?

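DBSCAN finds clusters of arbitrary shape based on density and labels isolated points as noise. K-Means partitions every point into one of k clusters, with k fixed up front; its clusters tend to be compact and roughly convex, and outliers are never marked as noise but forced into some cluster. Use DBSCAN when the cluster count is unknown or shapes are irregular; use K-Means when you need a fixed partition.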
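A small side-by-side sketch on hypothetical data (two tight groups plus one outlier) shows the practical difference. The data and parameters are illustrative assumptions; ST_ClusterKMeans(geom, 2) asks for two clusters.

```sql
-- Hypothetical toy data: two tight groups plus one outlier.
WITH pts(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(0, 1)),
         (4, ST_MakePoint(10, 10)),
         (5, ST_MakePoint(11, 10)),
         (6, ST_MakePoint(10, 11)),
         (7, ST_MakePoint(50, 50))
)
SELECT id,
       ST_ClusterDBSCAN(geom, 1.5, 3) OVER () AS dbscan_id,  -- the outlier gets NULL
       ST_ClusterKMeans(geom, 2)      OVER () AS kmeans_id   -- every row gets a cluster id
FROM pts
ORDER BY id;
```

DBSCAN returns two clusters and a NULL for point 7. K-Means has no notion of noise, so point 7 must land in one of the two clusters; it may even end up in a cluster of its own, forcing the two tight groups to share the other.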

How does it relate to ST_ClusterWithin?

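ST_ClusterWithin is a simpler single-linkage clustering: any two geometries within the distance threshold end up in the same cluster, directly or through a chain of such neighbours. That is equivalent to DBSCAN with minpoints = 1, where every point counts as a core point. Unlike ST_ClusterDBSCAN, ST_ClusterWithin is an aggregate that returns an array of GeometryCollections (one per cluster) rather than a per-row id. DBSCAN's minpoints parameter allows rejecting sparse clusters as noise, which single-linkage cannot do.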
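A quick way to check the equivalence is to count clusters both ways on the same hypothetical points. Because ST_ClusterWithin returns an array, the sketch compares its array length with the number of distinct DBSCAN ids; the data and eps = 1.5 are illustrative assumptions.

```sql
-- Hypothetical toy data: a chain of four points plus one isolated point.
WITH pts(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(2, 0)),
         (4, ST_MakePoint(3, 0)),
         (5, ST_MakePoint(10, 0))
),
dbscan AS (
  SELECT id, ST_ClusterDBSCAN(geom, 1.5, 1) OVER () AS cid  -- minpoints = 1
  FROM pts
)
SELECT (SELECT count(DISTINCT cid) FROM dbscan)          AS dbscan_clusters,
       (SELECT count(*) FROM dbscan WHERE cid IS NULL)   AS dbscan_noise,
       (SELECT array_length(ST_ClusterWithin(geom, 1.5), 1)
          FROM pts)                                      AS within_clusters;
```

Both report two clusters, {1, 2, 3, 4} and {5}, with no noise. Rerunning the DBSCAN call with minpoints = 3 would instead return one cluster and mark point 5 as noise.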

Why is my cluster id NULL for some rows?

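Those points are DBSCAN noise: neither a core point nor within eps of any core point. Noise is intentional and useful because it isolates outliers. If you want every point clustered, set minpoints to 1 (every point is then a core point, so nothing is labelled noise) or fall back to ST_ClusterWithin. Increasing eps or lowering minpoints less drastically reduces noise but does not guarantee it disappears.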
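A sketch of both options on the same hypothetical five-point line: COALESCE gives noise rows an explicit label for reporting, and a second call with minpoints = 1 clusters every row. The column names strict_label and all_clustered are illustrative.

```sql
-- Hypothetical toy data: four evenly spaced points plus one far-away point.
WITH pts(id, geom) AS (
  VALUES (1, ST_MakePoint(0, 0)),
         (2, ST_MakePoint(1, 0)),
         (3, ST_MakePoint(2, 0)),
         (4, ST_MakePoint(3, 0)),
         (5, ST_MakePoint(10, 0))
)
SELECT id,
       -- Label noise explicitly instead of leaving NULL.
       COALESCE((ST_ClusterDBSCAN(geom, 1.5, 3) OVER ())::text, 'noise') AS strict_label,
       -- minpoints = 1: every point is a core point, so no row is NULL.
       ST_ClusterDBSCAN(geom, 1.5, 1) OVER () AS all_clustered
FROM pts
ORDER BY id;
```

Point 5 is reported as 'noise' under the strict settings but receives its own cluster id when minpoints = 1.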