Module 10: Raster Analysis

10.1 Map Algebra

Cell-by-cell arithmetic on rasters — the foundation of continuous-surface analysis.

Lesson 50 of 100·20 min read

Key takeaways

  • Map algebra treats rasters as 2D arrays and applies arithmetic / logical operators cell by cell.
  • Four operation types: local, focal, zonal, global.
  • It's the mathematical backbone of remote sensing, terrain analysis, and environmental modelling.

Introduction

Dana Tomlin formalised map algebra in the 1980s — a systematic vocabulary for operations on raster datasets. Despite the name, most map algebra is straightforward arithmetic and logic applied at scale. This lesson covers the four classes and how you actually use them.

Rasters as arrays

At the computational level a raster is a 2D array of numbers plus metadata (CRS, transform, nodata). Every map algebra operation is really a numpy / GDAL / Earth Engine call on that array.

Python
1import rasterio
2import numpy as np
3[object Object]
4

Four lines produce NDVI — a classic map-algebra result.

Four operation types (Tomlin's taxonomy)

Local

One output cell depends only on the same-location input cells.

  • Arithmetic: A + B, A / B, (A - B) / (A + B).
  • Comparison: A > 100.
  • Reclassification: map old class to new class.
  • Conditional: np.where(mask, A, B).

All NDVI-style calculations are local.

Focal

One output cell depends on a neighbourhood around the input cell (typically 3×3, 5×5, or a custom kernel).

  • Smoothing (mean of 3×3).
  • Edge detection (gradient kernel).
  • Slope / aspect (derivative of a DEM).
  • Focal max / min / majority / standard deviation.

Zonal

Output cells are summarised by zones defined by another raster or vector layer.

  • Mean elevation per watershed.
  • Majority land cover per parcel.
  • Sum population per district.

Global

Output depends on the entire raster.

  • Total area of forest.
  • Distance to nearest forest cell.
  • Global statistics (mean, stdev).

Local operations in practice

Python
1# Burned area mask: NDVI decrease > 0.3 between two dates
2ndvi_before = ...
3ndvi_after  = ...
4burned = (ndvi_before - ndvi_after) > 0.3
5[object Object]
6

Focal operations in practice

Python
1from scipy.ndimage import generic_filter, convolve
2kernel = np.ones((3,3)) / 9.0
3smoothed = convolve(dem, kernel)
4[object Object]
5

Zonal statistics

Using rasterstats:

Python
from rasterstats import zonal_stats

Each feature in districts.gpkg gets back a dict of statistics.

Global operations

Python
1total = np.nansum(population)
2mean = np.nanmean(population)

Or with GDAL CLI:

Shell
gdalinfo -stats population.tif

In PostGIS raster

SQL
1-- Zonal mean
2SELECT zone_id,
3       (ST_SummaryStats(ST_Clip(r.rast, z.geom))).mean AS mean_value
4FROM zones z
5JOIN raster_tiles r ON ST_Intersects(r.rast, z.geom);

Performance is fine for small rasters; for big data, prefer file-based tooling.

In Google Earth Engine

JavaScript
1var ndvi = image.expression(
2  '(nir - red) / (nir + red)',
3  { 'nir': image.select('B8'), 'red': image.select('B4') }
4);

GEE optimises map algebra across global extents.

Alignment — again

Map algebra between two rasters requires them to share the same grid (CRS, origin, resolution). If they don't, resample first:

Shell
1gdalwarp -t_srs EPSG:3857 -tr 10 10 \
2         -te 400000 100000 500000 200000 \
3         input.tif aligned.tif

Memory considerations

For a 20 000 × 20 000 Float32 raster (1.6 GB), operations can blow past laptop RAM. Strategies:

  • Windowed reads — process 512 × 512 tiles at a time.
  • Dask arrays — lazy, chunked compute.
  • Xarray + Zarr — cloud-native N-D arrays.
  • GDAL --config GDAL_CACHEMAX — tune I/O cache.

Self-check exercises

1. Classify these operations as local, focal, zonal, or global: (a) NDVI = (NIR − Red) / (NIR + Red) (b) slope from a DEM (c) mean elevation per country polygon (d) total urban area

(a) Local — each cell depends only on the same cell in both inputs. (b) Focal — slope uses a neighbourhood (3×3 typical). (c) Zonal — summary by polygon zones. (d) Global — one scalar from the entire raster.

2. You compute NDVI and get values outside the expected −1 to +1 range. What's likely wrong?

(1) Nodata wasn't masked, so -9999 - red produces huge negative numbers. (2) You didn't cast to float; integer division in many libraries truncates to 0. (3) You forgot the small epsilon (+ 1e-6) to avoid division by zero. Check nodata masking, casting to float, and test with a known AOI.

3. Zonal statistics across 10 million polygons on a 30 m continent-scale raster is slow. What helps?

(1) Clip or mask the raster to the polygon extent before processing. (2) Use tile-based parallel processing (Dask, Spark, Earth Engine). (3) Consider H3 / S2 bucketing for approximate answers. (4) Batch polygons by region; spatial indexes on polygons help. (5) Use cloud-native formats (COG) and distributed compute.

Summary

  • Map algebra = systematic arithmetic/logic on rasters.
  • Four classes: local (per-cell), focal (neighbourhood), zonal (per-zone summary), global (whole raster).
  • All major tools (numpy, rasterio, Earth Engine, PostGIS, QGIS) implement the same operations.
  • Alignment and memory are the practical gotchas.

Further reading

  • Tomlin, C. D. — Geographic Information Systems and Cartographic Modeling.
  • Rasterio documentation — reading and writing with windows.
  • Xarray documentation — N-dimensional scientific arrays.
  • Google Earth Engine developers guide — map algebra and reducers.