9.4 Proximity and Nearest-Neighbour Analysis
Distance-based questions and the algorithms that answer them efficiently.
Key takeaways
- Proximity analysis asks "how close is X to Y?"; nearest-neighbour asks "what's the closest Y to X?"
- Both rely on spatial indexes to scale.
- Alternatives include drive-time (network) and great-circle (geodesic) distances — pick based on realism.
Introduction
Many GIS questions are fundamentally about distance: what's my nearest hospital? how many homes are within 1 km of a park? which locations are farthest from transit? This lesson covers the operations and the algorithms behind them.
The two core queries
Nearest neighbour (1-NN)
Given a reference point, find the closest feature in another layer.
1SELECT s.id AS school_id, h.id AS hospital_id,
2 ST_Distance(s.geom::geography, h.geom::geography) AS metres
3FROM schools s
4CROSS JOIN LATERAL (
5 SELECT id, geom
6 FROM hospitals
7 ORDER BY s.geom <-> geom -- KNN operator
8 LIMIT 1
9) h;<-> returns distance between bounding boxes — it's index-accelerated. ST_Distance returns true geometric distance but is slower and not index-accelerated directly.
K nearest neighbours (KNN)
1...
2ORDER BY s.geom <-> geom
3LIMIT 5GeoPandas
import geopandas as gpdVoronoi / Thiessen polygons
A Voronoi polygon around a point contains every location closer to that point than to any other. Useful for:
- Service-area proxies — "nearest fire station" (first-order approximation).
- Density analysis — large polygons indicate sparse coverage.
- Geocoding fallbacks.
1from shapely.ops import voronoi_diagram
2vd = voronoi_diagram(points.unary_union, envelope=study_area)Or use QGIS: Voronoi Polygons tool.
Variants: weighted Voronoi (each point has influence proportional to a weight), multiplicatively weighted (for retail trade areas).
All-pairs distances
"Distance from every point in A to every point in B". Scales as O(|A|×|B|). For 10 000 × 10 000 that's 100 million pairs — still manageable. For 1 M × 1 M — 10¹² — you need either approximation or different tooling.
Reduce with:
- Max distance — drop pairs beyond a threshold.
- Clustering — precompute distances to cluster centroids.
- H3 / geohash bucketing — only compare points in the same or neighbouring cells.
Drive-time vs Euclidean vs geodesic
Three increasingly realistic distance measures:
- Euclidean — straight line on a projected plane. Fast but assumes you can fly.
- Geodesic — great-circle distance on the ellipsoid. Fast, respects curvature.
- Network — shortest path along a graph (roads, transit). Realistic for ground transport; Module 13 covers it.
Rule of thumb:
- Birds, ships, abstract proximity → Euclidean / geodesic.
- Delivery, commute, ambulance response → network.
Reverse nearest neighbours
"Every hospital's nearest customer" and "every customer's nearest hospital" produce different graphs — a customer may not appear in any hospital's NN list if they're always second-closest. Reverse-NN queries are useful for identifying coverage gaps and over-supply.
Density and clustering
Related proximity analyses:
- Point density — features per unit area.
- Kernel density estimation — smoothed density surface.
- Spatial autocorrelation (Moran's I) — do nearby features have similar values?
- Ripley's K — clustering vs dispersion at multiple scales.
Module 16 covers density methods.
Performance
- Always have spatial indexes. KNN queries without them are O(N²).
- CROSS JOIN LATERAL +
<->is the fastest pattern in PostGIS. - Max distance (
ST_DWithinpre-filter) bounds the search. - Parallelise — nearest-neighbour queries chunked by region parallelise trivially.
Worked example: access to parks
"For each residential building, compute straight-line distance to the nearest park":
1SELECT b.id,
2 p.name AS nearest_park,
3 ROUND(ST_Distance(b.geom::geography, p.geom::geography)::numeric, 0) AS metres
4FROM buildings b
5CROSS JOIN LATERAL (
6 SELECT name, geom
7 FROM parks
8 ORDER BY b.geom <-> geom
9 LIMIT 1
10) p
11WHERE b.use = 'residential';Next step: classify into tiers (<250 m, 250–500 m, >500 m) and make a choropleth of the city at block level.
Self-check exercises
1. Why do KNN queries in PostGIS use the `<->` operator?
<-> is the KNN distance operator; it enables the GiST index to sort features by bounding-box distance extremely fast. ST_Distance calculates actual geometric distance for every row pair and can't use the index for ordering. Typical speedup: 100–1000×.
2. When is straight-line distance the wrong proxy for access?
Whenever real movement follows a network — roads, transit, footpaths, rivers. Two locations separated by a river or a highway wall can be 100 m apart straight-line but 1 km by road. For decision-making about accessibility (ambulance response, shopping trips), use network distance (Module 13).
3. Voronoi polygons assume a single, uniform transport medium. When is this assumption wrong?
In real cities: roads are faster than off-road; rivers and walls block some paths; trip costs vary. Voronoi approximates "the nearest facility by straight-line" — fine for birds, rough for customers. Use weighted Voronoi when facilities differ in capacity or attractiveness, and network-based service areas when travel mode matters.
Summary
- Nearest-neighbour and proximity are the workhorses of distance-based GIS.
- Spatial indexes + KNN operators turn naive O(N²) queries into practical workloads.
- Voronoi polygons are a quick service-area approximation.
- Switch to network distances when movement is constrained.
Further reading
- Okabe et al. — Spatial Tessellations: Concepts and Applications of Voronoi Diagrams.
- PostGIS
<->operator documentation. - GeoPandas
sjoin_nearestreference. - "How to do nearest-neighbour in PostGIS" — Paul Ramsey blog posts.