Module 15: Spatial Interpolation & Geostatistics

15.1 Spatial Interpolation — Overview

Estimating values where you have no observations — the family of methods and when to use each.

Lesson 74 of 100 · 14 min read

Key takeaways

Interpolation estimates values at unsampled locations from nearby sampled points.

Methods split into deterministic (IDW, spline) and geostatistical (kriging).

Choice depends on sample density, spatial autocorrelation, and whether you need uncertainty.

Introduction

Real-world measurements are always sparse: a dozen rain gauges cover a county, a thousand sampled trees stand in for a whole forest. Interpolation estimates the value of a variable at unobserved locations from nearby observations. This lesson surveys the method families; 15.2 and 15.3 dig into the specific algorithms.

The problem setup

Given N sampled points (x_i, y_i, z_i) and a target location (x, y), estimate z(x, y). Under Tobler's First Law, nearby observations should influence the estimate more than far ones.

Deterministic vs geostatistical

Deterministic methods

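Use a fixed formula that depends only on the geometry of the sample points (typically their distances to the target), with no fitted statistical model. Produce a single prediction per location and no uncertainty estimate.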

Examples:

Inverse Distance Weighting (IDW).
Natural neighbour.
Spline / thin-plate spline.
Nearest neighbour.
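
As an illustration of the "fixed formula" idea, here is a minimal IDW sketch in NumPy. The function name `idw_predict`, the `power` and `eps` parameters, and the toy data are assumptions made for this example, not part of any particular library.

```python
import numpy as np

def idw_predict(xy_obs, z_obs, xy_new, power=2.0, eps=1e-12):
    """Inverse Distance Weighting: weighted mean with weights 1 / d**power."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    z_obs = np.asarray(z_obs, dtype=float)
    xy_new = np.atleast_2d(np.asarray(xy_new, dtype=float))
    # Pairwise distances, shape (n_new, n_obs)
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    # Clamp tiny distances so a target on top of a sample does not divide by zero;
    # that sample then dominates the weights, so the sample value is reproduced.
    w = 1.0 / np.maximum(d, eps) ** power
    return (w @ z_obs) / w.sum(axis=1)

# Hypothetical toy data: four samples at the corners of a unit square
xy = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
z = np.array([10.0, 20.0, 30.0, 40.0])
centre = idw_predict(xy, z, [[0.5, 0.5]])     # equidistant from all four samples
at_sample = idw_predict(xy, z, [[0.0, 0.0]])  # coincides with the first sample
```

At the centre all four weights are equal, so the estimate is the plain mean of the samples; at a sample location the estimate collapses to that sample's value. Raising `power` makes the surface more local.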

Geostatistical methods

Use a statistical model of spatial autocorrelation (the variogram). Produce predictions and uncertainty estimates.

Examples:

Ordinary Kriging.
Simple Kriging.
Universal Kriging.
Co-kriging.
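
The variogram that all of these methods rely on starts from an empirical estimate: the average of half the squared difference between sample pairs, grouped by separation distance. A minimal sketch follows; the function name `empirical_variogram`, the bin edges, and the transect data are illustrative assumptions.

```python
import numpy as np

def empirical_variogram(xy, z, bin_edges):
    """Semivariance per distance bin: mean of 0.5 * (z_i - z_j)**2 over pairs."""
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)           # each unordered pair once
    dist = np.linalg.norm(xy[i] - xy[j], axis=1)
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2
    which_bin = np.digitize(dist, bin_edges) - 1  # bin index for every pair
    n_bins = len(bin_edges) - 1
    gamma = np.full(n_bins, np.nan)               # NaN marks an empty bin
    counts = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = which_bin == b
        counts[b] = in_bin.sum()
        if counts[b] > 0:
            gamma[b] = half_sq_diff[in_bin].mean()
    return gamma, counts

# Hypothetical transect: ten samples 1 unit apart with a linear trend z = x
xy = np.column_stack([np.arange(10.0), np.zeros(10)])
z = np.arange(10.0)
gamma, counts = empirical_variogram(xy, z, bin_edges=[0.5, 1.5, 2.5, 3.5])
```

Semivariance that rises with distance, as it does here, is the signature of spatial autocorrelation. A kriging package then fits a model curve (spherical, exponential, and so on) to these binned values.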

Choosing a method

Situation	Suggested method
Sparse points, rough estimate	IDW
Smooth physical phenomenon (elevation)	Spline
Need uncertainty quantification	Kriging
Multiple related variables	Co-kriging
Categorical / class data	Nearest neighbour, indicator kriging
Dense grid (e.g., satellite)	Bilinear / bicubic resampling (not "interpolation" in the sparse sense)

Validation

Interpolation accuracy depends on sample density and the true spatial structure. Validate by:

Hold-out — remove some points, predict them from the rest.
Cross-validation — remove each point in turn and predict it.
RMSE / MAE — numerical error metrics.
Visual inspection — does the surface look plausible?

Every serious interpolation analysis reports cross-validated errors. "The surface looked right" is not enough.
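
A leave-one-out check takes only a few lines. The sketch below is an assumption-laden example, not a prescribed workflow: it uses a bare-bones IDW predictor and synthetic data, and `loo_errors` is a hypothetical helper that accepts any predictor with the same call signature.

```python
import numpy as np

def idw(xy_obs, z_obs, xy_new, power=2.0):
    """Bare-bones IDW predictor used only to demonstrate the validation loop."""
    d = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    w = 1.0 / np.maximum(d, 1e-12) ** power
    return (w @ z_obs) / w.sum(axis=1)

def loo_errors(xy, z, predict):
    """Leave-one-out: predict each sample from all the others."""
    residuals = np.empty(len(z))
    for k in range(len(z)):
        keep = np.arange(len(z)) != k
        residuals[k] = predict(xy[keep], z[keep], xy[k:k + 1])[0] - z[k]
    rmse = np.sqrt(np.mean(residuals ** 2))
    mae = np.mean(np.abs(residuals))
    return rmse, mae

# Synthetic samples (assumed data, for illustration only)
rng = np.random.default_rng(0)
xy = rng.uniform(0, 10, size=(40, 2))
z = np.sin(xy[:, 0]) + 0.5 * xy[:, 1]
rmse, mae = loo_errors(xy, z, idw)
```

Running the same loop with different predictors or parameters (for example several IDW powers) gives a like-for-like comparison. RMSE is never smaller than MAE, and a large gap between them points to a few badly predicted samples.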

Barriers and anisotropy

Barriers — features (roads, ridges) that break spatial autocorrelation. Some methods can incorporate them.
Anisotropy — spatial autocorrelation that's stronger in one direction than another (common with wind-driven phenomena).

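Kriging models anisotropy directly through a directional variogram; IDW needs a workaround such as an elliptical search neighbourhood. Barriers are not part of either method's basic formulation, and support depends on the tool (some IDW and spline implementations accept barrier lines).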
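
Geometric anisotropy is commonly handled by rotating and rescaling the coordinates so that ordinary distances already reflect the directional effect. The sketch below shows that transformation; the function name `to_isotropic` and its `angle_deg` and `ratio` parameters are assumptions for illustration.

```python
import numpy as np

def to_isotropic(xy, angle_deg, ratio):
    """Rotate so the major axis lies along x, then stretch the minor axis.

    angle_deg: direction of strongest continuity, counter-clockwise from +x.
    ratio: major range divided by minor range (>= 1).
    """
    t = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(t), np.sin(t)],
                    [-np.sin(t), np.cos(t)]])  # rotation by -angle
    out = np.asarray(xy, dtype=float) @ rot.T
    out[:, 1] *= ratio  # points across the major axis now look further away
    return out

# Two points one unit from the origin: along x and along y
pts = np.array([[1.0, 0.0], [0.0, 1.0]])
dist_east_west = np.linalg.norm(to_isotropic(pts, angle_deg=0, ratio=3), axis=1)
dist_north_south = np.linalg.norm(to_isotropic(pts, angle_deg=90, ratio=3), axis=1)
```

With the major axis pointing east-west, the point to the north ends up three times further away than the point to the east, so it receives less weight in any distance-based method applied afterwards.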

Smoothing vs exact

Exact interpolators pass through every sample point exactly (honouring measurements).
Smoothing interpolators approximate — useful when samples are noisy.

Choose based on whether samples are precise measurements (survey elevations: use exact) or noisy observations (rain gauges: smoothing may be appropriate).
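
The difference is easy to see with a thin-plate spline. This sketch assumes SciPy 1.7 or later (for `scipy.interpolate.RBFInterpolator`) and uses synthetic noisy samples; the smoothing value of 1.0 is arbitrary and would normally be tuned by cross-validation.

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

# Hypothetical noisy samples of a smooth signal
rng = np.random.default_rng(1)
xy = rng.uniform(0, 1, size=(30, 2))
z = np.sin(4 * xy[:, 0]) + rng.normal(0, 0.1, size=30)

# smoothing=0 forces the surface through every sample (exact interpolator)
exact = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=0.0)
# smoothing>0 trades fidelity at the samples for a smoother surface
smooth = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=1.0)

# Largest misfit at the sample locations themselves
resid_exact = np.abs(exact(xy) - z).max()
resid_smooth = np.abs(smooth(xy) - z).max()
```

`resid_exact` is zero up to floating-point error, while `resid_smooth` is clearly positive: the smoothed surface no longer honours each measurement, which is the intended behaviour when the measurements carry noise.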

Output resolution

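Output pixel size should reflect sample density. A useful yardstick is the average spacing between samples, √(area / N):

100 samples over 100 km² → spacing ≈ 1 km, so a 1 km pixel is reasonable.
1 000 samples over 1 km² → spacing ≈ 32 m, so a pixel of about 30 m is reasonable.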

Predicting at finer resolution than your data supports is false precision.
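
One rule of thumb (an assumption here, and not the only convention in use) compares pixel size with the average spacing the samples would have on a regular grid. The helper name `mean_spacing_m` is hypothetical.

```python
import math

def mean_spacing_m(area_km2, n_samples):
    """Average sample spacing in metres, as if samples sat on a regular grid."""
    return math.sqrt(area_km2 * 1e6 / n_samples)

spacing_a = mean_spacing_m(100, 100)   # 100 samples over 100 km²
spacing_b = mean_spacing_m(1, 1000)    # 1 000 samples over 1 km²
```

`spacing_a` is 1 000 m and `spacing_b` is roughly 32 m. Pixels much smaller than these spacings add file size but no information.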

Common pitfalls

Extrapolation beyond the sampled area — all methods become unreliable.
Non-stationarity — if the phenomenon behaves differently in different regions, global interpolation misleads.
Outliers — a single wrong sample produces a local bull's-eye in an IDW surface; check suspicious samples before interpolating, or use a method that is robust to them.
Unit mismatch — don't mix temperature in °C and °F in the same interpolation.
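
A simple guard against extrapolation is to flag targets that fall outside the convex hull of the samples. The sketch below uses `scipy.spatial.Delaunay`; the helper name `inside_hull` and the coordinates are illustrative assumptions, and a convex hull is only a rough proxy for "the sampled area".

```python
import numpy as np
from scipy.spatial import Delaunay

def inside_hull(xy_obs, xy_new):
    """True where a target lies inside the convex hull of the samples."""
    tri = Delaunay(np.asarray(xy_obs, dtype=float))
    # find_simplex returns -1 for points outside the triangulation
    return tri.find_simplex(np.asarray(xy_new, dtype=float)) >= 0

# Hypothetical samples on a 4 x 3 rectangle plus one interior point
xy_obs = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0], [2.0, 1.5]])
targets = np.array([[1.0, 1.0], [5.0, 1.0]])
mask = inside_hull(xy_obs, targets)  # first target inside, second outside
```

Predictions where `mask` is `False` can be set to no-data or reported separately instead of being trusted.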

Tools

gdal_grid — IDW, nearest neighbour, moving average, linear (no kriging).
QGIS Interpolation plugin.
Python: scipy.interpolate, pykrige, gstools, scikit-gstat, verde.
R: gstat, fields.
ArcGIS Geostatistical Analyst — comprehensive commercial suite.

A worked example

Estimating rainfall across a county from 50 gauge stations:

Python

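from pykrige.ok import OrdinaryKriging
import numpy as np

# gauges: a pandas DataFrame with lon, lat and rainfall_mm columns, loaded beforehand
x = gauges['lon'].values
y = gauges['lat'].values
z = gauges['rainfall_mm'].values

# Fit ordinary kriging with a spherical variogram model.
# lon/lat are used as planar coordinates here; project to metres for larger areas.
OK = OrdinaryKriging(x, y, z, variogram_model='spherical')

# 200 x 200 prediction grid spanning the extent of the gauges
grid_x = np.linspace(x.min(), x.max(), 200)
grid_y = np.linspace(y.min(), y.max(), 200)
z_pred, ss = OK.execute('grid', grid_x, grid_y)
# z_pred is the predicted surface; ss is the kriging variance (uncertainty)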

Self-check exercises

1. Why does the choice between IDW and kriging matter for scientific reports?

Kriging provides uncertainty estimates (kriging variance); IDW does not. For scientific or regulatory reports you often need to communicate not just "the value here is X" but also "we're about 90 % confident it lies within [X − 1.64σ, X + 1.64σ]" (assuming roughly Gaussian errors; ±1σ covers only about 68 %). IDW gives you a surface but no built-in way to quantify its reliability, which limits it to descriptive rather than inferential use.
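
Turning a kriging variance into an interval requires a distributional assumption. The sketch below assumes Gaussian prediction errors, which is a modelling choice and not something kriging guarantees; `prediction_interval` and the numbers fed to it are hypothetical.

```python
import numpy as np
from scipy.stats import norm

def prediction_interval(pred, variance, level=0.90):
    """Symmetric interval around a prediction, assuming Gaussian errors."""
    sigma = np.sqrt(np.asarray(variance, dtype=float))
    k = norm.ppf(0.5 + level / 2.0)  # about 1.645 for a 90 % interval
    return pred - k * sigma, pred + k * sigma

# Hypothetical cell: predicted 50 mm of rain with a kriging variance of 4 mm²
lo, hi = prediction_interval(np.array([50.0]), np.array([4.0]))

# Coverage of a plain +/- 1 sigma band under the same assumption
one_sigma_coverage = norm.cdf(1.0) - norm.cdf(-1.0)
```

Under this assumption a ±1σ band covers only about 68 % of cases, so a 90 % statement needs the wider multiplier.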

2. You have 12 rainfall gauges over a 100 km² area. Is kriging appropriate?

Probably not reliably. Variogram fitting needs enough point pairs at each separation distance to estimate the spatial autocorrelation structure; 12 points give only 66 pairs in total, usually too few for a stable variogram. Use IDW or a physically-based interpolation (e.g., PRISM-style lapse rate), or add more gauges. Kriging shines with 30+ well-distributed samples.
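
The pair count follows from n(n − 1)/2, and those pairs must then be shared among the variogram's distance bins. A quick check, where the number of bins is a hypothetical choice:

```python
import math

n_points = 12
n_pairs = math.comb(n_points, 2)       # unordered pairs of samples
n_lag_bins = 10                        # hypothetical number of variogram bins
pairs_per_bin = n_pairs / n_lag_bins   # average only; real bins are uneven
```

Fewer than seven pairs per bin on average is well below the few dozen that rules of thumb usually ask for, which is why the fitted variogram would be unstable.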

3. Your interpolated surface has suspiciously smooth minima that don't match observed data. What's happening?

Some interpolators (especially thin-plate splines with smoothing and smoothing variants of kriging) regularise toward local means, so the output is smoother than reality and clips local extremes. For sharp features (peaks, valleys), exact interpolators (IDW with a small neighbourhood, kriging without nugget smoothing) may better honour the data. Use cross-validation to confirm the surface is faithful.

Summary

Interpolation estimates values at unobserved locations from nearby samples.
Deterministic (IDW, spline) vs geostatistical (kriging, co-kriging).
Kriging quantifies uncertainty; deterministic methods don't.
Cross-validate, don't extrapolate, and match resolution to sample density.