20.1 Accuracy, Precision, and Error
Honest descriptions of how wrong your data could be — the terminology, how to measure error, and how it propagates.
Key takeaways
- Accuracy and precision are different; both matter.
- Errors are inevitable — track, report, and propagate them through analyses.
- Positional, attribute, and temporal accuracy all need attention.
Introduction
No dataset is perfect; no analysis is exact. Treating data quality as a first-class concern is one of the marks of a mature GIS professional. This lesson covers the vocabulary, measurement techniques, and practices that keep your work honest.
Accuracy vs precision
Classic definitions:
- Accuracy — closeness to the true value.
- Precision — closeness of repeated measurements to each other.
Analogy: a dart player whose darts cluster tightly but far from the bullseye is precise but inaccurate. One whose darts are scattered around the bullseye is accurate but imprecise.
In GIS:
- High accuracy, low precision — correct on average but measurements vary.
- High precision, low accuracy — consistent but systematically biased.
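The distinction can be computed directly from repeated measurements: bias (accuracy) is the distance of the mean fix from the true location, while spread (precision) is the scatter of fixes about their own mean. A minimal sketch, using hypothetical GPS fixes on a local grid whose true position is (0, 0):

```python
import math
import statistics

# Hypothetical repeated GPS fixes (metres, local grid) of a point
# whose true position is (0, 0).
fixes = [(4.8, 0.3), (5.1, -0.2), (4.9, 0.1), (5.2, 0.0), (5.0, -0.1)]

mean_x = statistics.mean(x for x, _ in fixes)
mean_y = statistics.mean(y for _, y in fixes)

# Accuracy: how far the average fix sits from the true location.
bias = math.hypot(mean_x, mean_y)

# Precision: spread of the fixes about their own mean.
spread = statistics.mean(
    math.hypot(x - mean_x, y - mean_y) for x, y in fixes
)

print(f"bias (accuracy): {bias:.2f} m")       # ~5 m systematic offset
print(f"spread (precision): {spread:.2f} m")  # tight cluster, ~0.2 m
```

This cluster is precise but inaccurate: the fixes agree with each other to about 0.2 m, yet all sit roughly 5 m from the truth.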
Types of GIS error
Positional
How close is a feature's recorded location to its true location?
- RMS error reported in metres.
- National datasets specify a standard: NSSDA (United States) or equivalents elsewhere.
- Methods: compare to independent high-accuracy measurements.
Attribute
Are the attribute values correct?
- Binary: right / wrong.
- Categorical: confusion matrix.
- Continuous: RMS error.
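For categorical attributes, the confusion matrix tallies how often each true class was mapped as each recorded class; overall accuracy is the fraction of check points classified correctly. A sketch with hypothetical land-cover check points:

```python
from collections import Counter

# Hypothetical ground-truth vs mapped land-cover classes at check points.
truth  = ["forest", "forest", "water", "urban", "forest", "water", "urban"]
mapped = ["forest", "urban",  "water", "urban", "forest", "water", "forest"]

# Confusion matrix: (truth, mapped) pairs and their counts.
matrix = Counter(zip(truth, mapped))

# Overall accuracy: fraction of check points classified correctly.
correct = sum(n for (t, m), n in matrix.items() if t == m)
overall = correct / len(truth)
print(f"overall accuracy: {overall:.2f}")  # 5 of 7 correct
```

Per-class producer's and user's accuracies fall out of the same matrix by normalising over rows or columns.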
Temporal
When was the data collected? Is it current enough?
- Road datasets from 2010 lack 2024's cycle lanes.
- Satellite imagery has an acquisition date.
Logical consistency
Does the data contradict itself?
- Slivers, overlaps, unclosed polygons.
- Mismatched categorical codes.
- Invalid geometries.
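Some consistency checks are cheap enough to write by hand. A minimal sketch of two such checks on a polygon ring (a list of vertices); production workflows would use a topology-aware library's validity checks instead:

```python
def ring_is_closed(ring):
    """A polygon ring must end where it starts (and have enough vertices)."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def has_repeated_vertices(ring):
    """Consecutive duplicate vertices are a common digitising artefact."""
    return any(a == b for a, b in zip(ring, ring[1:]))

open_ring   = [(0, 0), (1, 0), (1, 1)]            # never returns to start
closed_ring = [(0, 0), (1, 0), (1, 1), (0, 0)]

print(ring_is_closed(open_ring))    # False
print(ring_is_closed(closed_ring))  # True
```

Self-intersections, slivers, and overlaps need proper computational-geometry predicates; these two checks only catch the most basic digitising faults.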
Completeness
What's missing?
- Known features not in the dataset.
- Attribute fields unpopulated.
Each needs its own QA procedure.
Sources of error
- Source data — the original acquisition (GPS precision, digitising error).
- Processing — reprojection, classification, interpolation, simplification.
- Integration — mismatches when combining datasets.
- Interpretation — the human or model that categorised the data.
- Output — printing resolution, display rendering.
Errors compound through pipelines. Track each stage.
Error propagation
Start with ± tolerances on inputs; compute how they combine into outputs.
- Addition/subtraction: errors add in quadrature.
- Multiplication/division: relative errors add in quadrature (for independent errors).
- Buffer of ±d m on a feature with ±r m positional error → effective buffer ±√(d² + r²).
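The quadrature rule is a one-liner. A sketch of the buffer example above, with hypothetical values d = 50 m and r = 10 m:

```python
import math

def add_quadrature(*errors):
    """Independent ± errors on summed quantities combine as
    the square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))

# Buffer of ±d m around a feature with ±r m positional error:
d, r = 50.0, 10.0
effective = add_quadrature(d, r)
print(f"effective buffer uncertainty: ±{effective:.1f} m")  # ±51.0 m
```

Note that the combined error is dominated by the larger term: a 10 m positional error barely moves a 50 m buffer tolerance.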
For complex analyses, Monte Carlo simulation is the general approach: sample inputs from their error distributions many times; observe output distribution.
Reporting
Every serious dataset should report:
- RMSE or CI for positional accuracy.
- Confusion matrix for categorical data.
- Vintage and acquisition method.
- Processing lineage — transformations applied.
- Known limitations.
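Positional accuracy reporting usually starts from check-point offsets against an independent, higher-accuracy survey. A sketch with hypothetical offsets; the 1.7308 factor for converting radial RMSE to a 95%-confidence figure (valid when x and y errors are roughly equal) is taken from the NSSDA and should be verified against FGDC-STD-007.3 for your case:

```python
import math

# Hypothetical check-point offsets (metres) between dataset positions
# and independent higher-accuracy survey positions.
offsets = [(1.2, -0.8), (0.5, 1.1), (-0.9, 0.4), (1.5, -1.3), (-0.2, 0.7)]

# Radial RMSE over the check points.
rmse_r = math.sqrt(
    sum(dx * dx + dy * dy for dx, dy in offsets) / len(offsets)
)

# NSSDA reports horizontal accuracy at 95% confidence; 1.7308 assumes
# roughly equal x/y error (see FGDC-STD-007.3).
accuracy_95 = 1.7308 * rmse_r
print(f"RMSE_r = {rmse_r:.2f} m, NSSDA accuracy (95%) = {accuracy_95:.2f} m")
```

The NSSDA recommends at least 20 well-distributed check points; five are used here only to keep the example short.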
Standards:
- NSSDA (National Standard for Spatial Data Accuracy).
- ISO 19157 — Geographic information - Data quality.
- FGDC-STD-007 — older US federal standard.
Accuracy tiers for positional data
Rough tiers:
| Tier | Accuracy | Typical source |
|---|---|---|
| Survey-grade | 1–5 cm | RTK GNSS, total station |
| Mapping-grade | 10 cm – 1 m | DGPS, LiDAR |
| Consumer | 1–10 m | Smartphone GPS |
| Geocoded rooftop | 1–30 m | Address geocoding |
| Street-level | 10–100 m | Address interpolation |
| Postal code centroid | 0.1–5 km | ZIP code |
| Administrative | 1–100 km | City / country |
Align your work's accuracy to the decision being made.
Classic accuracy pitfalls
- False precision — 8 decimal places on a 100 m-accuracy coordinate.
- Stale data — road network from 2015 in a 2025 analysis.
- Mixed scales — combining 1:10 000 and 1:100 000 data without re-evaluating accuracy.
- Hidden uncertainty — presenting model outputs as certainties.
Cross-validation for accuracy
For spatial models:
- Leave-one-out.
- k-fold.
- Spatial k-fold (keep neighbouring samples in the same fold, so spatial autocorrelation doesn't make the test set trivially predictable).
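One simple way to build spatial folds is blocking: assign each point to a grid cell, then assign whole cells to folds, so neighbouring points never straddle the train/test split. A minimal sketch (the block size and fold count are illustrative choices; libraries offer grouped cross-validation splitters for the same idea):

```python
def spatial_folds(points, block_size, n_folds):
    """Assign each (x, y) point a fold index by spatial block, so that
    points in the same block always share a fold."""
    folds = []
    for x, y in points:
        bx, by = int(x // block_size), int(y // block_size)
        folds.append(hash((bx, by)) % n_folds)
    return folds

points = [(12, 3), (13, 4), (95, 40), (96, 41), (55, 70)]
folds = spatial_folds(points, block_size=10, n_folds=3)
# Points in the same 10 x 10 block always land in the same fold.
```

The block size should be at least the range of spatial autocorrelation in the data, otherwise test points are still "leaked" by near-identical training neighbours.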
For categorical maps:
- Stratified random sampling of ground-truth points.
- Separate training and validation data.
Self-check exercises
1. Your repeated GPS fixes cluster tightly 5 m north of the true point. Accurate or precise?
Precise (tight cluster) but inaccurate (systematic 5 m offset). This pattern often comes from a systematic error — wrong datum, antenna offset, multipath bias in a specific direction. Calibrate with a known reference point; the offset often corrects with a datum shift or antenna offset parameter.
2. Your dataset has 10 m positional accuracy. What analyses does that support?
City-scale accessibility (walking times, park buffers), neighbourhood-scale demographics, regional infrastructure planning. It does NOT support: on-parcel analyses (property boundaries), utility-network routing (metre-scale precision needed), or centimetre engineering. Match the question's required accuracy to the data's actual accuracy.
3. Why is Monte Carlo simulation useful for error propagation in complex GIS analyses?
Many spatial operations (buffer + intersect + aggregate) don't have closed-form error propagation formulas. Monte Carlo: sample each input from its error distribution, run the whole pipeline, repeat hundreds or thousands of times, and look at the distribution of outputs. It's computationally expensive but produces honest uncertainty estimates for any workflow.
Summary
- Accuracy ≠ precision; report both.
- Five error dimensions: positional, attribute, temporal, logical, completeness.
- Errors compound through pipelines; track and report.
- Monte Carlo handles complex error propagation.
Further reading
- Heuvelink, G. B. M. — Error Propagation in Environmental Modelling with GIS.
- ISO 19157 — Geographic information – Data quality.
- NSSDA — FGDC accuracy standard.
- Hunter, G. J. — quantitative accuracy reporting papers.