GIS Basics — A Complete Introduction
Module 14: Remote Sensing

14.5 Image Classification Basics

Turning multispectral imagery into labelled maps — supervised, unsupervised, object-based.

Lesson 72 of 100 · 18 min read

Key takeaways

  • Classification assigns each pixel (or object) to a class like water, forest, urban.
  • Supervised uses labelled training data; unsupervised finds natural clusters.
  • Modern workflows combine deep learning with traditional methods.

Introduction

A multispectral image is a data cube. A classification map converts it into a labelled raster where every pixel is a category — forest, urban, water. This lesson covers the classical methods and modern extensions.

Three major approaches

Supervised

You provide training examples (pixels or polygons labelled by class). The algorithm learns to reproduce those labels across the image.

Classic algorithms:

  • Maximum Likelihood — assumes each class is Gaussian in spectral space.
  • Decision Trees / Random Forest — non-linear, handles mixed data types.
  • Support Vector Machine (SVM) — good for small training sets.
  • k-Nearest Neighbour — simple, interpretable.

Modern:

  • Convolutional Neural Networks (CNNs) — state-of-the-art on imagery.
  • Transformers (ViT) — recent high-accuracy approaches.

Unsupervised

No labels needed. The algorithm finds natural groupings (clusters) in spectral space; you interpret what each cluster represents.

Algorithms:

  • K-means clustering — fast and simple; you fix the cluster count in advance.
  • ISODATA — iterative, adaptive cluster count.
  • Gaussian Mixture Models.

Useful when labels aren't available or as a preprocessing exploration step.
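A minimal clustering sketch using scikit-learn's KMeans on synthetic pixel spectra (the band values and class means below are made up for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic "pixels": three spectral types with distinct band means
water = rng.normal([0.05, 0.04, 0.02], 0.01, size=(100, 3))
veg   = rng.normal([0.04, 0.08, 0.40], 0.02, size=(100, 3))
urban = rng.normal([0.20, 0.22, 0.25], 0.02, size=(100, 3))
pixels = np.vstack([water, veg, urban])      # (n_pixels, n_bands)

# Cluster in spectral space; you then interpret what each cluster is
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(pixels)
print(np.bincount(km.labels_))               # pixels per cluster
```

In practice you would reshape a real image to (pixels, bands) first, and often request more clusters than target classes, merging them afterwards by interpretation.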

Object-based

Instead of classifying individual pixels, first segment the image into objects (groups of similar neighbouring pixels), then classify each object. Produces cleaner results on high-resolution imagery and captures shape / context.

Tools: eCognition, Orfeo ToolBox, Segment-Anything (Meta SAM).
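Whatever tool produces the segments, a common final step is to give each object the majority class of its pixels. A small NumPy sketch with toy segment and class arrays (the data is illustrative, not from any real scene):

```python
import numpy as np

def majority_per_object(segments, pixel_classes):
    """Assign every pixel in a segment the segment's most common class."""
    out = np.empty_like(pixel_classes)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        counts = np.bincount(pixel_classes[mask])
        out[mask] = counts.argmax()
    return out

# Toy example: two segments; one stray pixel (class 1) inside segment 0
segments      = np.array([[0, 0, 1], [0, 0, 1]])
pixel_classes = np.array([[2, 1, 3], [2, 2, 3]])
print(majority_per_object(segments, pixel_classes))
# → [[2 2 3]
#    [2 2 3]]
```

This is also why object-based output looks cleaner: per-pixel noise inside an object is voted away.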

The supervised pipeline

  1. Pre-processing — atmospheric correction, cloud masking, mosaicking.
  2. Training-data collection — polygons or pixels for each class, ideally stratified and independent.
  3. Feature engineering — bands, indices (NDVI, NDWI), texture measures, contextual features.
  4. Model training — fit a classifier on training features.
  5. Inference — apply to the full image.
  6. Post-processing — majority filter, morphological cleaning.
  7. Accuracy assessment — compare predictions to independent validation data.
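Step 3 usually means stacking spectral indices alongside the raw bands. A sketch of the NDVI computation mentioned above (the band arrays here are hypothetical reflectance values):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype("float64")
    red = red.astype("float64")
    return (nir - red) / (nir + red + eps)   # eps avoids division by zero

red = np.array([[0.10, 0.30]])
nir = np.array([[0.50, 0.30]])
print(ndvi(nir, red))   # high value = dense vegetation, near zero = not
```

The resulting index band is stacked with the originals as an extra feature column before model training.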

Accuracy assessment

A confusion matrix summarises predictions vs truth:

                Pred. Forest   Pred. Urban   Pred. Water
Actual Forest        95             3             2
Actual Urban          5            90             5
Actual Water          0             2            98

Metrics:

  • Overall accuracy — diagonal / total.
  • Producer's accuracy — per class, diagonal / actual-class total (how much of the truth was correctly mapped).
  • User's accuracy — per class, diagonal / predicted-class total (how reliable the map's predictions are).
  • Kappa statistic — agreement beyond chance.

Report all four; overall accuracy alone can hide per-class failures.
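All four metrics can be computed directly from the confusion matrix above (rows are actual classes, columns predicted):

```python
import numpy as np

cm = np.array([[95,  3,  2],    # rows: actual Forest, Urban, Water
               [ 5, 90,  5],    # cols: predicted Forest, Urban, Water
               [ 0,  2, 98]])

total   = cm.sum()
overall = np.trace(cm) / total                # diagonal / total

producers = np.diag(cm) / cm.sum(axis=1)      # diagonal / actual (row) totals
users     = np.diag(cm) / cm.sum(axis=0)      # diagonal / predicted (column) totals

# Kappa: observed agreement corrected for chance agreement
pe    = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total**2
kappa = (overall - pe) / (1 - pe)
print(round(overall, 3), round(kappa, 3))     # → 0.943 0.915
```

Note how Urban's user's accuracy (90/95) differs from its producer's accuracy (90/100): the same class can look reliable from the map's side and weak from the ground-truth side.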

Training data quality

The single biggest determinant of accuracy:

  • Representative of all spatial / temporal variation.
  • Sufficient volume — hundreds to thousands per class, depending on class complexity.
  • Clean — mislabelled training data poisons the model.
  • Independent of validation — don't use the same pixels for both.

Sources: field visits, high-resolution imagery, existing maps (OSM), crowd-sourced campaigns.

In scikit-learn

Python

A minimal sketch — the file name and the training arrays (X_train, y_train) are placeholders you would build from labelled polygons:

import rasterio
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Read the image as (bands, rows, cols) and reshape to (pixels, bands)
with rasterio.open("image.tif") as src:
    img = src.read()
X = img.reshape(img.shape[0], -1).T

# X_train, y_train: spectra and labels extracted from training polygons
clf = RandomForestClassifier(n_estimators=200)
clf.fit(X_train, y_train)

# Predict every pixel, then reshape back to the image grid
classified = clf.predict(X).reshape(img.shape[1], img.shape[2])


Modern deep learning

Pre-trained CNN / transformer models for Earth observation:

  • Clay — foundation model for Earth observation.
  • Prithvi (IBM / NASA) — open EO foundation model.
  • Segment-Anything — interactive segmentation from prompts.
  • DeepLabV3, U-Net — semantic segmentation.

These models have raised accuracy and cut labelling effort substantially, but they need GPUs for training and often for inference.

Change detection

Two images, same area, different dates → a map of what changed. Methods:

  • Post-classification comparison — classify each separately, compare maps.
  • Image differencing — subtract indices; threshold differences.
  • Change vector analysis — magnitude and direction of multi-band change.

Deforestation monitoring, urban expansion, burn scar mapping all rely on change detection.
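The image-differencing method above can be sketched in a few lines: subtract an index computed at two dates and threshold the difference (the NDVI values and the −0.3 threshold below are illustrative; real thresholds are tuned per scene):

```python
import numpy as np

# NDVI at two dates (hypothetical values); vegetation loss shows up
# as a large negative change
ndvi_2020 = np.array([[0.8, 0.7], [0.6, 0.2]])
ndvi_2024 = np.array([[0.2, 0.7], [0.6, 0.2]])

diff = ndvi_2024 - ndvi_2020
loss = diff < -0.3        # True where vegetation likely disappeared
print(loss)
# → [[ True False]
#    [False False]]
```

Post-classification comparison avoids the thresholding step but compounds the errors of both classifications, which is why differencing is often preferred for a single, well-defined change type.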

Self-check exercises

1. Why is a confusion matrix more useful than overall accuracy alone?

Overall accuracy can hide class-specific failures. 95% accuracy sounds great, but if the 5% error is entirely concentrated in one class (say, "mangrove" confused with "other vegetation"), the map is useless for mangrove conservation. A confusion matrix shows producer's and user's accuracies per class, revealing exactly where the model struggles.

2. Unsupervised clustering gives you 8 clusters, but you wanted 5 land-cover classes. What's the recommended workflow?

Run clustering with more classes than you need (say 8–12), then manually merge clusters that represent the same real-world class (e.g., "bright concrete" + "dark concrete" → "urban"). Alternatively, use the clusters as exploratory analysis to find distinct spectral types, then re-classify with supervised labels once you know what classes matter.

3. Object-based classification vs pixel-based — when does object-based win?

High-resolution imagery (under ~5 m), where individual objects (buildings, cars, trees) are larger than pixels. Object-based uses shape, texture, and context — features pixels can't carry alone. For coarse imagery (Landsat 30 m), pixels already cover whole objects, so pixel-based works well. Object-based also produces cleaner boundary maps without "salt-and-pepper" speckle from per-pixel noise.

Summary

  • Classification converts imagery to labelled maps.
  • Supervised needs labels; unsupervised finds clusters; object-based classifies regions.
  • Accuracy assessment is non-negotiable — confusion matrix, producer's / user's accuracy, kappa.
  • Deep learning is the current frontier; classical methods remain valuable.

Further reading

  • Foody, G. M. — Status of land cover classification accuracy assessment.
  • Jensen, J. R. — Introductory Digital Image Processing.
  • Olofsson et al. — Good practices for estimating area and assessing accuracy of land change.
  • Clay, Prithvi, and Segment-Anything documentation.