Module 9: Vector Spatial Analysis

9.4 Proximity and Nearest-Neighbour Analysis

Distance-based questions and the algorithms that answer them efficiently.

Lesson 47 of 100·17 min read

Key takeaways

Proximity analysis asks "how close is X to Y?"; nearest-neighbour asks "what's the closest Y to X?"

Both rely on spatial indexes to scale.

Alternatives include drive-time (network) and great-circle (geodesic) distances — pick based on realism.

Introduction

Many GIS questions are fundamentally about distance: what's my nearest hospital? how many homes are within 1 km of a park? which locations are farthest from transit? This lesson covers the operations and the algorithms behind them.

The two core queries

Nearest neighbour (1-NN)

Given a reference point, find the closest feature in another layer.

SQL

1SELECT s.id AS school_id, h.id AS hospital_id,
2       ST_Distance(s.geom::geography, h.geom::geography) AS metres
3FROM schools s
4CROSS JOIN LATERAL (
5  SELECT id, geom
6  FROM hospitals
7  ORDER BY s.geom <-> geom    -- KNN operator
8  LIMIT 1
9) h;

<-> returns distance between bounding boxes — it's index-accelerated. ST_Distance returns true geometric distance but is slower and not index-accelerated directly.

K nearest neighbours (KNN)

SQL

1...
2ORDER BY s.geom <-> geom
3LIMIT 5

GeoPandas

Python

import geopandas as gpd

Voronoi / Thiessen polygons

A Voronoi polygon around a point contains every location closer to that point than to any other. Useful for:

Service-area proxies — "nearest fire station" (first-order approximation).
Density analysis — large polygons indicate sparse coverage.
Geocoding fallbacks.

Python

1from shapely.ops import voronoi_diagram
2vd = voronoi_diagram(points.unary_union, envelope=study_area)

Or use QGIS: Voronoi Polygons tool.

Variants: weighted Voronoi (each point has influence proportional to a weight), multiplicatively weighted (for retail trade areas).

All-pairs distances

"Distance from every point in A to every point in B". Scales as O(|A|×|B|). For 10 000 × 10 000 that's 100 million pairs — still manageable. For 1 M × 1 M — 10¹² — you need either approximation or different tooling.

Reduce with:

Max distance — drop pairs beyond a threshold.
Clustering — precompute distances to cluster centroids.
H3 / geohash bucketing — only compare points in the same or neighbouring cells.

Drive-time vs Euclidean vs geodesic

Three increasingly realistic distance measures:

Euclidean — straight line on a projected plane. Fast but assumes you can fly.
Geodesic — great-circle distance on the ellipsoid. Fast, respects curvature.
Network — shortest path along a graph (roads, transit). Realistic for ground transport; Module 13 covers it.

Rule of thumb:

Birds, ships, abstract proximity → Euclidean / geodesic.
Delivery, commute, ambulance response → network.

Reverse nearest neighbours

"Every hospital's nearest customer" and "every customer's nearest hospital" produce different graphs — a customer may not appear in any hospital's NN list if they're always second-closest. Reverse-NN queries are useful for identifying coverage gaps and over-supply.

Density and clustering

Related proximity analyses:

Point density — features per unit area.
Kernel density estimation — smoothed density surface.
Spatial autocorrelation (Moran's I) — do nearby features have similar values?
Ripley's K — clustering vs dispersion at multiple scales.

Module 16 covers density methods.

Performance

Always have spatial indexes. KNN queries without them are O(N²).
CROSS JOIN LATERAL + <-> is the fastest pattern in PostGIS.
Max distance (ST_DWithin pre-filter) bounds the search.
Parallelise — nearest-neighbour queries chunked by region parallelise trivially.

Worked example: access to parks

"For each residential building, compute straight-line distance to the nearest park":

SQL

1SELECT b.id,
2       p.name AS nearest_park,
3       ROUND(ST_Distance(b.geom::geography, p.geom::geography)::numeric, 0) AS metres
4FROM buildings b
5CROSS JOIN LATERAL (
6  SELECT name, geom
7  FROM parks
8  ORDER BY b.geom <-> geom
9  LIMIT 1
10) p
11WHERE b.use = 'residential';

Next step: classify into tiers (<250 m, 250–500 m, >500 m) and make a choropleth of the city at block level.

Self-check exercises

1. Why do KNN queries in PostGIS use the `<->` operator?

<-> is the KNN distance operator; it enables the GiST index to sort features by bounding-box distance extremely fast. ST_Distance calculates actual geometric distance for every row pair and can't use the index for ordering. Typical speedup: 100–1000×.

2. When is straight-line distance the wrong proxy for access?

Whenever real movement follows a network — roads, transit, footpaths, rivers. Two locations separated by a river or a highway wall can be 100 m apart straight-line but 1 km by road. For decision-making about accessibility (ambulance response, shopping trips), use network distance (Module 13).

3. Voronoi polygons assume a single, uniform transport medium. When is this assumption wrong?

In real cities: roads are faster than off-road; rivers and walls block some paths; trip costs vary. Voronoi approximates "the nearest facility by straight-line" — fine for birds, rough for customers. Use weighted Voronoi when facilities differ in capacity or attractiveness, and network-based service areas when travel mode matters.

Summary

Nearest-neighbour and proximity are the workhorses of distance-based GIS.
Spatial indexes + KNN operators turn naive O(N²) queries into practical workloads.
Voronoi polygons are a quick service-area approximation.
Switch to network distances when movement is constrained.