16.2 Classification Methods
How to chop a continuous variable into classes — equal interval, quantile, Jenks, and more.
Key takeaways
- Classification breaks determine which spatial patterns emerge on a choropleth.
- Different methods answer different questions; no single default is correct.
- Always disclose the method and consider multiple classifications before publishing.
Introduction
Module 16.1 mentioned classification methods in passing; this lesson is the detailed tour. Choosing a classification is a cartographic decision with real consequences — the same data with different breaks tells different stories.
Equal interval
Divide the range into equal-width classes.
- Formula: breaks at min + i * (max - min) / k for i = 0..k.
- Preserves value differences between classes.
- Poor for skewed data — most polygons end up in one class.
Good for: data with roughly uniform distributions, or when you want a "data-agnostic" map where class breaks are predictable.
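The formula above can be sketched in a few lines of standard-library Python. The function names and the sample values are hypothetical, chosen only to show how one large value pushes almost everything into the first class.

```python
from bisect import bisect_right


def equal_interval_breaks(values, k):
    """Return the k + 1 break values min + i * (max - min) / k for i = 0..k."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(k + 1)]


def class_counts(values, breaks):
    """Count values per class; a value equal to an interior break joins the upper class."""
    interior = breaks[1:-1]
    counts = [0] * (len(breaks) - 1)
    for v in values:
        counts[bisect_right(interior, v)] += 1
    return counts


# Hypothetical right-skewed sample: a single value of 100 stretches the range.
values = [2, 3, 4, 5, 6, 8, 9, 12, 15, 100]
breaks = equal_interval_breaks(values, k=4)
counts = class_counts(values, breaks)
print(breaks)  # [2.0, 26.5, 51.0, 75.5, 100.0]
print(counts)  # [9, 0, 0, 1]
```

Nine of the ten values land in the first class and two classes stay empty, which is the skew problem noted in the list above.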
Quantile
Equal polygon count per class.
- With k classes, class i (i = 0..k-1) spans the i/k to (i+1)/k quantiles of the data.
- Every class has the same number of polygons, or nearly so when the polygon count is not divisible by k or when values are tied.
- Can group very different values into the same class (especially at the tails).
Good for: ordinal comparison; showing which polygons are in the top 20 %, etc. Bad for: quantitative comparison between classes.
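A minimal rank-based sketch of quantile classification, using only the standard library. The function name and sample values are hypothetical, and this version splits tied values across classes by rank, which a production classifier may handle differently.

```python
from collections import Counter


def quantile_classes(values, k):
    """Assign each value a class 0..k-1 so that classes hold (nearly) equal counts."""
    n = len(values)
    # Indices of the values in ascending order of value.
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * k // n
    return classes


# Hypothetical right-skewed sample.
values = [2, 3, 4, 5, 6, 8, 9, 12, 15, 100]
classes = quantile_classes(values, k=5)
print(classes)           # [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
print(Counter(classes))  # two values in every class
```

The top class holds both 15 and 100: equal counts are guaranteed, but very different values can share a class at the tails.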
Natural breaks (Jenks)
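An algorithm that chooses breaks to minimise the total within-class sum of squared deviations:
$$\min \sum_{i=1}^k \sum_{j \in C_i} (x_j - \bar{x}_i)^2$$
where $C_i$ is the set of observations in class $i$ and $\bar{x}_i$ is the mean of that class.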
- Finds "natural" gaps in the distribution.
- Adaptive — different data produces different breaks.
- Default in many GIS tools.
Good for: most thematic maps. Bad for: comparing maps across regions (breaks differ per dataset).
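To make the objective concrete, the sketch below searches every way of cutting the sorted values into k contiguous classes and keeps the one with the lowest within-class sum of squares. This exhaustive search is an illustration only, not the Jenks or Fisher algorithm (production implementations use dynamic programming), and it is practical only for very small inputs. All names and the sample data are hypothetical.

```python
from itertools import combinations


def within_class_ss(group):
    """Sum of squared deviations from the group mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def brute_force_natural_breaks(values, k):
    """Try every split of the sorted values into k contiguous classes."""
    data = sorted(values)
    n = len(data)
    best_cost, best_groups = float("inf"), None
    for cuts in combinations(range(1, n), k - 1):
        edges = (0, *cuts, n)
        groups = [data[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(within_class_ss(g) for g in groups)
        if cost < best_cost:
            best_cost, best_groups = cost, groups
    return best_groups, best_cost


# Hypothetical sample with two obvious gaps.
values = [1, 2, 3, 10, 11, 12, 50, 51]
groups, cost = brute_force_natural_breaks(values, k=3)
print(groups)  # [[1, 2, 3], [10, 11, 12], [50, 51]]
print(cost)    # 4.5
```

The optimal classes fall on the gaps in the data, which is what "natural" breaks means in practice.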
Standard deviation
Classes defined as distance from the mean:
- Breaks at mean ± 0.5σ, ±1σ, ±1.5σ, ±2σ.
- Pairs naturally with a diverging palette centred on the mean.
- Highlights outliers.
Good for: anomaly maps (temperature anomalies, departure from average). Assumes roughly normal distribution.
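The breaks listed above can be computed as follows. This sketch assumes the population standard deviation (`statistics.pstdev`); some tools use the sample standard deviation instead, which shifts the breaks slightly. The function name and sample values are hypothetical.

```python
import statistics


def std_dev_breaks(values, multiples=(-2, -1.5, -1, -0.5, 0.5, 1, 1.5, 2)):
    """Return breaks at mean + m * sigma for each multiple m."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population sigma
    return [mean + m * sigma for m in multiples]


# Hypothetical sample with mean 5 and population standard deviation 2.
values = [2, 4, 4, 4, 5, 5, 7, 9]
breaks = std_dev_breaks(values)
print(breaks)  # [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
```

The class between 4.0 and 6.0 straddles the mean, so it usually takes the neutral midpoint colour of a diverging palette.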
Manual / user-defined
Breaks chosen for domain meaning:
- WHO PM2.5 thresholds (10, 25, 50 μg/m³).
- Poverty line at $15 000.
- Round numbers (0, 100, 1000, 10000).
Good for: communication to policy or public audiences. Makes comparison across maps possible if the same breaks are reused.
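Reusing one fixed set of breaks across maps is simple to express in code. The sketch below applies the PM2.5 thresholds from the list above to two hypothetical regions; the labels, function name, and readings are illustrative assumptions.

```python
from bisect import bisect_right

# Thresholds taken from the list above, in micrograms per cubic metre.
PM25_BREAKS = [10, 25, 50]
LABELS = ["< 10", "10 to < 25", "25 to < 50", ">= 50"]


def classify(value, breaks):
    """Return the class index; a value equal to a break joins the upper class."""
    return bisect_right(breaks, value)


# Hypothetical readings for two regions, classified with the same breaks.
region_a = [4, 12, 30]
region_b = [9, 26, 80]
classes_a = [classify(v, PM25_BREAKS) for v in region_a]
classes_b = [classify(v, PM25_BREAKS) for v in region_b]
print([LABELS[c] for c in classes_a])  # ['< 10', '10 to < 25', '25 to < 50']
print([LABELS[c] for c in classes_b])  # ['< 10', '25 to < 50', '>= 50']
```

Because both regions share the same breaks, a given colour means the same thing on both maps.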
Geometric / logarithmic
For data spanning many orders of magnitude (earthquake energy release, population, economic activity):
- Breaks at 10, 100, 1000, 10000 or similar.
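- Each class spans a constant multiplicative factor (for example ×10).
- Log-transform the data first, then apply equal intervals; this requires strictly positive values.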
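A short sketch of geometric breaks, where each break is the previous one multiplied by a constant ratio. Integer arithmetic avoids floating-point rounding at exact powers of ten. The function name and the sample values are hypothetical.

```python
from bisect import bisect_right


def geometric_breaks(first, ratio, count):
    """Return count breaks: first, first * ratio, first * ratio**2, ..."""
    return [first * ratio ** i for i in range(count)]


breaks = geometric_breaks(first=10, ratio=10, count=4)
print(breaks)  # [10, 100, 1000, 10000]

# Hypothetical values spanning five orders of magnitude.
values = [3, 45, 850, 7_200, 430_000]
classes = [bisect_right(breaks, v) for v in values]
print(classes)  # [0, 1, 2, 3, 4]
```

With a ratio of 10 this gives the same classes as equal intervals of width 1 on the base-10 logarithm of the data.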
Unclassed (continuous)
No bins — every polygon gets a shade proportional to its exact value.
- Preserves all information.
- Harder to decode specific values.
- Works best with interactive maps (hover to see exact value).
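An unclassed map needs only a continuous mapping from value to shade. The sketch below uses linear min-max scaling to a grey ramp, which is one simple choice among many. The function names and values are hypothetical.

```python
def normalise(values):
    """Scale values linearly to the range 0..1 (all zeros if every value is equal)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0] * len(values)
    return [(v - lo) / span for v in values]


def grey_hex(t):
    """Map t in 0..1 to a grey shade: 0 is white, 1 is black."""
    level = round(255 * (1 - t))
    return f"#{level:02x}{level:02x}{level:02x}"


# Hypothetical polygon values.
values = [10, 20, 40, 50]
shades = [grey_hex(t) for t in normalise(values)]
print(shades)  # ['#ffffff', '#bfbfbf', '#404040', '#000000']
```

Every distinct value gets its own shade, so no class boundary can exaggerate or hide a difference.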
Visual comparison
A tool like mapclassify (part of the PySAL ecosystem, conventionally imported with import mapclassify as mc) lets you classify the same data under different methods. Visualise the results on adjacent maps to see the difference.
Which to choose?
Rough guide:
- Roughly uniform distribution → equal interval; roughly normal distribution → equal interval or standard deviation.
- Right-skewed (income, density) → quantile or Jenks.
- Meaningful thresholds exist → manual.
- Orders of magnitude → geometric / logarithmic.
- Anomaly / deviation → standard deviation.
- Need honest continuous representation → unclassed.
Disclosure
Always disclose:
- Method used.
- Class break values.
- Number of classes.
Put these in the map legend or a caption note. Without disclosure, different readers see different "objective truths".
Self-check exercises
1. Why might Jenks classification give different maps for two similar datasets?
Jenks is data-adaptive — it finds break points specific to each dataset's distribution. Two similar datasets can have slightly different distributions and produce non-identical breaks, making direct comparison harder. For comparable maps (side-by-side time periods, regions), use identical manual breaks or compute breaks once on pooled data and apply to both.
2. When does standard-deviation classification mislead?
When the data isn't approximately normal. For heavily skewed data (income, population density), the mean doesn't reflect typical values, so most polygons fall in the "near mean" classes while the outliers are exaggerated. Use Jenks or quantile classification instead, or log-transform the data before classifying.
3. What's the advantage of unclassed (continuous) mapping?
It preserves all the information in the data — no arbitrary binning. Readers can see fine gradations and won't be misled by boundary effects where a polygon just barely crosses a class break. The trade-off: specific values are harder to read without interactive tooltips. Works best on digital maps.
Summary
- Classification method shapes the story a choropleth tells.
- Equal interval, quantile, Jenks, standard deviation, manual, geometric, and unclassed cover most uses.
- Match method to data distribution and communication goal.
- Always disclose breaks and method.
Further reading
- Jenks, G. F. (1967) — The Data Model Concept in Statistical Mapping.
- Brewer & Pickle (2002) — Evaluation of Methods for Classifying Epidemiological Data.
- mapclassify Python package.
- ColorBrewer — integrates classification and palette choice.