Module 16: Thematic Mapping & Visualization

16.4 Heatmaps and Kernel Density Estimation

Smoothed density surfaces from point data — beautiful and powerful, often misused.

Lesson 80 of 100·13 min read

Key takeaways

Heatmaps visualise the density of events or features as smoothed surfaces.

Kernel Density Estimation (KDE) is the underlying math: a kernel around each point, summed.

Bandwidth selection drives everything — too small is noisy, too large is over-smoothed.

Introduction

"Crime heatmap of the city", "hotspot of migratory bird sightings" — these are kernel density visualisations. They take discrete point data and produce a continuous surface showing where events cluster. This short lesson covers the math, tools, and common misuses.

The math

Given N points x_i in the plane, a kernel function K (usually Gaussian), and a bandwidth h, the estimated density at a location x is:

$$\hat{f}(x) = \frac{1}{Nh^2} \sum_{i=1}^{N} K\left(\frac{x - x_i}{h}\right)$$

Each point contributes a small "bump" around itself; bumps accumulate in dense areas.
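The formula can be evaluated directly with NumPy. The sketch below is a minimal illustration rather than a production implementation: the function names `gaussian_kernel_2d` and `kde_2d` and the three sample points are made up for this example, and a standard bivariate Gaussian kernel is assumed.

```python
import numpy as np

def gaussian_kernel_2d(u):
    """Standard bivariate Gaussian kernel; u has shape (..., 2)."""
    return np.exp(-0.5 * np.sum(u**2, axis=-1)) / (2.0 * np.pi)

def kde_2d(query, points, h):
    """Evaluate the density estimate at each query location.

    query: (M, 2) array, points: (N, 2) array, h: bandwidth in map units.
    """
    n = len(points)
    # Scaled offsets (x - x_i) / h for every query/point pair, shape (M, N, 2)
    u = (query[:, None, :] - points[None, :, :]) / h
    # Sum the bumps from all points, then apply the 1 / (N h^2) factor
    return gaussian_kernel_2d(u).sum(axis=1) / (n * h**2)

# Hypothetical sample: two points close together, one far away
pts = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
q = np.array([[0.05, 0.0], [5.0, 5.0], [2.5, 2.5]])
print(kde_2d(q, pts, h=0.5))
```

The first query location sits between two nearby points and receives two overlapping bumps, so its density is the highest of the three; the last location is far from every point and receives almost nothing.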

Kernel choice

Gaussian — smooth, infinite support. Common default.
Epanechnikov — smooth, compact support. Slightly faster.
Uniform (boxcar) — every point within h contributes equally; results are blocky.
Quartic, triangular — intermediate.

Choice matters less than bandwidth.
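To compare the kernel shapes listed above, the sketch below writes each one as a function of u = distance / h. These are unnormalised profiles (each equals 1 at u = 0), which is an assumption made here to keep the comparison simple; a real density estimate needs each kernel scaled so that it integrates to 1.

```python
import numpy as np

def gaussian(u):
    return np.exp(-0.5 * u**2)                            # never reaches zero

def epanechnikov(u):
    return np.where(np.abs(u) <= 1, 1 - u**2, 0.0)        # zero beyond h

def quartic(u):
    return np.where(np.abs(u) <= 1, (1 - u**2)**2, 0.0)   # zero beyond h

def triangular(u):
    return np.where(np.abs(u) <= 1, 1 - np.abs(u), 0.0)   # zero beyond h

def uniform(u):
    return np.where(np.abs(u) <= 1, 1.0, 0.0)             # flat, then zero

u = np.array([0.0, 0.5, 1.0, 1.5])   # distance expressed in bandwidths
for k in (gaussian, epanechnikov, quartic, triangular, uniform):
    print(f"{k.__name__:>13}: {np.round(k(u), 3)}")
```

Only the Gaussian still contributes at 1.5 bandwidths; the compact kernels ignore points beyond h, which is where their speed advantage comes from.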

Bandwidth (h)

The single most important parameter.

Small h → surface resembles individual points (noise).
Large h → surface over-smoothed; true clusters disappear.

Selection methods:

Rule-of-thumb — h = 1.06 * σ * N^(-1/5) (Silverman, 1986). This form is for one-dimensional data; the 2D analogue is h = σ * N^(-1/6), applied per axis.
Cross-validation — optimise predictive likelihood.
Trial and error — try a few, pick what makes sense.
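Cross-validation can be sketched in a few lines. The example below scores each candidate bandwidth by leave-one-out log-likelihood: every point is predicted from a density built on all the other points. It assumes an isotropic 2D Gaussian kernel; the function name `loo_log_likelihood`, the synthetic sample, and the candidate list are illustrative choices, not part of any library.

```python
import numpy as np

def loo_log_likelihood(points, h):
    """Leave-one-out log-likelihood of an isotropic Gaussian KDE."""
    n = len(points)
    # Squared distances between every pair of points, shape (N, N)
    d2 = np.sum((points[:, None, :] - points[None, :, :])**2, axis=-1)
    k = np.exp(-0.5 * d2 / h**2) / (2.0 * np.pi * h**2)
    np.fill_diagonal(k, 0.0)              # leave each point out of its own estimate
    f_loo = k.sum(axis=1) / (n - 1)
    return np.sum(np.log(f_loo + 1e-300))  # tiny offset guards against log(0)

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))           # synthetic sample, for illustration only
candidates = np.array([0.05, 0.1, 0.2, 0.4, 0.8, 1.6, 3.2])
scores = np.array([loo_log_likelihood(pts, h) for h in candidates])
best_h = candidates[np.argmax(scores)]
print(best_h)
```

Very small bandwidths score badly because isolated points get almost no density from their neighbours; very large ones score badly because the surface is too flat everywhere. The pairwise distance matrix makes this approach practical only for modest N.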

A reasonable starting point: h ≈ 3 × the mean nearest-neighbour distance. Treat it as a first guess to refine with the methods above.
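The nearest-neighbour heuristic is easy to compute with a KD-tree. This is a minimal sketch assuming projected coordinates in metres; `nn_bandwidth` is a hypothetical helper name and the regular test grid is made up so the result is easy to verify by eye.

```python
import numpy as np
from scipy.spatial import cKDTree

def nn_bandwidth(points, factor=3.0):
    """Return factor x the mean distance from each point to its nearest neighbour."""
    tree = cKDTree(points)
    # k=2 because the closest match to each point is the point itself (distance 0)
    dist, _ = tree.query(points, k=2)
    return factor * dist[:, 1].mean()

# Regular 10 x 10 grid of points with 100 m spacing
gx, gy = np.meshgrid(np.arange(10) * 100.0, np.arange(10) * 100.0)
pts = np.column_stack([gx.ravel(), gy.ravel()])
h0 = nn_bandwidth(pts)
print(h0)
```

Every point on this grid has its nearest neighbour exactly 100 m away, so the suggested bandwidth is 300 m. With real data, exact duplicate locations give a nearest-neighbour distance of zero and pull the estimate down, so it may be worth de-duplicating first.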

Implementation

Python

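import numpy as np
from scipy.stats import gaussian_kde

# Assumes `points` is a (Geo)DataFrame with 'lon' and 'lat' columns
x, y = points['lon'].values, points['lat'].values
kde = gaussian_kde(np.vstack([x, y]))

# Evaluate on a grid (200 x 200 cells, an arbitrary resolution)
gx, gy = np.mgrid[x.min():x.max():200j, y.min():y.max():200j]
density = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)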

QGIS

Processing → Interpolation → Heatmap (Kernel Density Estimation). Parameters: radius, pixel size, kernel. Output: GeoTIFF density.

Common misuses

Not normalising by population

"Crime heatmap" over a city shows where crimes happen — which often just shows where people are. For meaningful hotspot detection, compute crime rate (crimes per person) rather than raw crime density.

Choosing bandwidth by aesthetics

A bandwidth picked because "the map looks dramatic" may hide or invent features. Cross-validate or use rule-of-thumb selection.

Over-extending the surface

Kernel density is reliable only in areas with data. Evaluating it in empty areas produces smooth ramps that look like real, gradually declining values even though nothing was observed there. Mask to a study-area boundary or to areas with non-zero observations.
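One simple way to mask is by distance to the nearest observation. This is a sketch of that idea, not the only option: clipping to a study-area polygon is equally valid. The helper name `mask_far_cells`, the `max_dist` threshold, and the density values below are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import cKDTree

def mask_far_cells(density, grid_xy, points, max_dist):
    """Set density to NaN where the nearest observation is farther than max_dist.

    density: (M,) values, grid_xy: (M, 2) cell centres, points: (N, 2) observations.
    """
    dist, _ = cKDTree(points).query(grid_xy)   # distance to nearest observation
    masked = density.astype(float).copy()
    masked[dist > max_dist] = np.nan
    return masked

pts = np.array([[0.0, 0.0], [1.0, 0.0]])
grid_xy = np.array([[0.5, 0.0], [10.0, 10.0]])
density = np.array([0.3, 0.001])               # made-up values for illustration
masked = mask_far_cells(density, grid_xy, pts, max_dist=2.0)
print(masked)
```

The cell between the two observations keeps its value; the cell far from any data becomes NaN and will render as "no data" rather than as a low but apparently measured density.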

Inappropriate colour ramps

Bright red hotspots on otherwise cool colours are visually compelling but can exaggerate small differences in density. Use a perceptually uniform ramp (viridis, magma) and state the density units and scale in the legend.

Network-constrained KDE

For events that occur along networks (crimes on streets, seal sightings along coastlines), conventional 2D KDE misleadingly spreads density across the spaces between roads and across barriers. Network KDE restricts kernels to the network; it is implemented in GRASS's v.kernel and in specialised packages.

Space-time KDE

Extend to 3D — time as the third dimension. Useful for monitoring outbreak spread. Implementations less common; 2D snapshots by time period often substitute.
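As a rough illustration, `scipy.stats.gaussian_kde` accepts data of any dimension, so stacking x, y, and t gives a basic space-time estimate. Note the assumption this carries: SciPy applies one bandwidth factor to the full data covariance, so space and time are not smoothed independently, whereas dedicated space-time methods usually take separate spatial and temporal bandwidths. The drifting cluster below is synthetic.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(2)
n = 300
t = rng.uniform(0, 10, n)                  # event time in days
x = 0.5 * t + rng.normal(0, 0.3, n)        # cluster drifts east over time
y = rng.normal(0, 0.3, n)

# Three rows: x, y, t. Each column is one event.
stkde = gaussian_kde(np.vstack([x, y, t]))

# Same location, two different days
on_track = stkde(np.array([[0.5], [0.0], [1.0]]))[0]   # cluster is here on day 1
too_late = stkde(np.array([[0.5], [0.0], [9.0]]))[0]   # cluster has moved on by day 9
print(on_track, too_late)
```

The density at (0.5, 0) is high on day 1 and negligible on day 9, which a single 2D surface pooled over all days would not reveal.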

Hotspot statistics

Beyond visualisation, quantitative hotspot detection:

Moran's I — global spatial autocorrelation.
Getis-Ord Gi* — local statistic; identifies cells significantly higher than expected.
Ripley's K — distance-dependent clustering.

pysal implements all three: Moran's I and Getis-Ord Gi* in its esda package, Ripley's K in pointpats.
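To see what a global statistic measures, Moran's I can be computed by hand on a small raster. The sketch below is a plain NumPy illustration and not pysal's API; it assumes binary rook weights (cells sharing an edge are neighbours), and `morans_i_grid` is a hypothetical helper name. It gives the statistic only, with no significance test.

```python
import numpy as np

def morans_i_grid(values):
    """Global Moran's I for a 2D array with binary rook-contiguity weights."""
    z = values - values.mean()
    # Products of deviations for horizontally and vertically adjacent cells;
    # doubled because each pair appears twice in a symmetric weights matrix
    cross = 2.0 * (np.sum(z[:, :-1] * z[:, 1:]) + np.sum(z[:-1, :] * z[1:, :]))
    rows, cols = values.shape
    s0 = 2.0 * (rows * (cols - 1) + (rows - 1) * cols)   # sum of all weights
    return (values.size / s0) * cross / np.sum(z**2)

clustered = np.zeros((8, 8))
clustered[:4, :] = 1.0                          # high values grouped in one block
checker = np.indices((8, 8)).sum(axis=0) % 2    # high and low values alternate

print(morans_i_grid(clustered))   # strongly positive
print(morans_i_grid(checker))     # -1.0
```

Clustered values give an index near +1, a perfect checkerboard gives -1, and spatially random values would fall near zero.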

Self-check exercises

1. Why can a "crime heatmap" be misleading without normalisation?

It shows where crimes happen, which correlates strongly with where people are. Downtown has many crimes per km² but also many people; a sparse suburb with high per-capita crime looks safer than it is. For meaningful comparison, normalise — crimes per 1 000 residents, or per employee, or by time window.

2. What's the effect of doubling the KDE bandwidth?

Doubling the bandwidth roughly quadruples the area influenced by each point. Small clusters merge into larger smooth blobs; local variation disappears; broad patterns emerge. Too-large bandwidth erases real hotspots. Always try multiple bandwidths and see how the map changes — stable patterns are real; patterns that appear only at one bandwidth are likely artefacts.

3. When should you use network-constrained KDE instead of plain 2D KDE?

When events occur along a network and movement is constrained to that network — crimes on streets, wildlife sightings along coastlines, incidents on transit lines. 2D KDE spreads density across buildings and non-relevant spaces, producing misleading smoothness. Network KDE keeps the kernel constrained to the line network, giving more realistic hotspot maps.

Summary

KDE converts points to a smoothed density surface via kernels.
Bandwidth is the dominant parameter — choose carefully.
Normalise by population or base rate to avoid misleading hotspots.
Network-constrained KDE is the right tool for network events.