20.1 Accuracy, Precision, and Error
Honest descriptions of how wrong your data could be — the terminology, how to measure error, and how it propagates.
Key takeaways
- Accuracy and precision are different; both matter.
- Errors are inevitable — track, report, and propagate them through analyses.
- Positional, attribute, and temporal accuracy all need attention.
Introduction
No dataset is perfect; no analysis is exact. Treating data quality as a first-class concern is one of the marks of a mature GIS professional. This lesson covers the vocabulary, measurement techniques, and practices that keep your work honest.
Accuracy vs precision
Classic definitions:
- Accuracy — closeness to the true value.
- Precision — closeness of repeated measurements to each other.
Analogy: a dart player whose darts cluster tightly but far from the bullseye is precise but inaccurate. One whose darts are scattered around the bullseye is accurate but imprecise.
In GIS:
- High accuracy, low precision — correct on average but measurements vary.
- High precision, low accuracy — consistent but systematically biased.
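The distinction can be computed directly from repeated measurements: bias (accuracy) is the distance of the mean fix from the true location, while spread (precision) is the scatter of fixes about their own mean. A minimal sketch, using hypothetical GPS fixes on a local grid whose true position is (0, 0):

```python
import math
import statistics

# Hypothetical repeated GPS fixes (metres, local grid) of a point
# whose true position is (0, 0).
fixes = [(4.8, 0.3), (5.1, -0.2), (4.9, 0.1), (5.2, 0.0), (5.0, -0.1)]

mean_x = statistics.mean(x for x, _ in fixes)
mean_y = statistics.mean(y for _, y in fixes)

# Accuracy: how far the average fix sits from the true location.
bias = math.hypot(mean_x, mean_y)

# Precision: spread of the fixes about their own mean.
spread = statistics.mean(
    math.hypot(x - mean_x, y - mean_y) for x, y in fixes
)

print(f"bias (accuracy): {bias:.2f} m")       # ~5 m systematic offset
print(f"spread (precision): {spread:.2f} m")  # tight cluster, ~0.2 m
```

This cluster is precise but inaccurate: the fixes agree with each other to about 0.2 m, yet all sit roughly 5 m from the truth.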
Types of GIS error
Positional
How close is a feature's recorded location to its true location?
- RMS error reported in metres.
- National datasets specify a standard: NSSDA (United States) or equivalents elsewhere.
- Methods: compare to independent high-accuracy measurements.
Attribute
Are the attribute values correct?
- Binary: right / wrong.
- Categorical: confusion matrix.
- Continuous: RMS error.
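For categorical attributes, the confusion matrix tallies how often each true class was mapped as each recorded class; overall accuracy is the fraction of check points classified correctly. A sketch with hypothetical land-cover check points:

```python
from collections import Counter

# Hypothetical ground-truth vs mapped land-cover classes at check points.
truth  = ["forest", "forest", "water", "urban", "forest", "water", "urban"]
mapped = ["forest", "urban",  "water", "urban", "forest", "water", "forest"]

# Confusion matrix: (truth, mapped) pairs and their counts.
matrix = Counter(zip(truth, mapped))

# Overall accuracy: fraction of check points classified correctly.
correct = sum(n for (t, m), n in matrix.items() if t == m)
overall = correct / len(truth)
print(f"overall accuracy: {overall:.2f}")  # 5 of 7 correct
```

Per-class producer's and user's accuracies fall out of the same matrix by normalising over rows or columns.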
Temporal
When was the data collected? Is it current enough?
- Road datasets from 2010 lack 2024's cycle lanes.
- Satellite imagery has an acquisition date.
Logical consistency
Does the data contradict itself?
- Slivers, overlaps, unclosed polygons.
- Mismatched categorical codes.
- Invalid geometries.
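Some consistency checks are cheap enough to write by hand. A minimal sketch of two such checks on a polygon ring (a list of vertices); production workflows would use a topology-aware library's validity checks instead:

```python
def ring_is_closed(ring):
    """A polygon ring must end where it starts (and have enough vertices)."""
    return len(ring) >= 4 and ring[0] == ring[-1]

def has_repeated_vertices(ring):
    """Consecutive duplicate vertices are a common digitising artefact."""
    return any(a == b for a, b in zip(ring, ring[1:]))

open_ring   = [(0, 0), (1, 0), (1, 1)]            # never returns to start
closed_ring = [(0, 0), (1, 0), (1, 1), (0, 0)]

print(ring_is_closed(open_ring))    # False
print(ring_is_closed(closed_ring))  # True
```

Self-intersections, slivers, and overlaps need proper computational-geometry predicates; these two checks only catch the most basic digitising faults.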
Completeness
What's missing?
- Known features not in the dataset.
- Attribute fields unpopulated.
Each needs its own QA procedure.
Sources of error
- Source data — the original acquisition (GPS precision, digitising error).
- Processing — reprojection, classification, interpolation, simplification.
- Integration — mismatches when combining datasets.
- Interpretation — the human or model that categorised the data.
- Output — printing resolution, display rendering.
Errors compound through pipelines. Track each stage.
Error propagation
Start with ± tolerances on inputs; compute how they combine into outputs.
- Addition/subtraction: errors add in quadrature.
- Multiplication/division: relative errors add in quadrature (for independent errors).
- Buffer of ±d m on a feature with ±r m positional error → effective buffer ±√(d² + r²).
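The quadrature rule is a one-liner. A sketch of the buffer example above, with hypothetical values d = 50 m and r = 10 m:

```python
import math

def add_quadrature(*errors):
    """Independent ± errors on summed quantities combine as
    the square root of the sum of squares."""
    return math.sqrt(sum(e * e for e in errors))

# Buffer of ±d m around a feature with ±r m positional error:
d, r = 50.0, 10.0
effective = add_quadrature(d, r)
print(f"effective buffer uncertainty: ±{effective:.1f} m")  # ±51.0 m
```

Note that the combined error is dominated by the larger term: a 10 m positional error barely moves a 50 m buffer tolerance.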
For complex analyses, Monte Carlo simulation is the general approach: sample inputs from their error distributions many times; observe output distribution.
Reporting
Every serious dataset should report:
- RMSE or CI for positional accuracy.
- Confusion matrix for categorical data.
- Vintage and acquisition method.
- Processing lineage — transformations applied.
- Known limitations.
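Positional accuracy reporting usually starts from check-point offsets against an independent, higher-accuracy survey. A sketch with hypothetical offsets; the 1.7308 factor for converting radial RMSE to a 95%-confidence figure (valid when x and y errors are roughly equal) is taken from the NSSDA and should be verified against FGDC-STD-007.3 for your case:

```python
import math

# Hypothetical check-point offsets (metres) between dataset positions
# and independent higher-accuracy survey positions.
offsets = [(1.2, -0.8), (0.5, 1.1), (-0.9, 0.4), (1.5, -1.3), (-0.2, 0.7)]

# Radial RMSE over the check points.
rmse_r = math.sqrt(
    sum(dx * dx + dy * dy for dx, dy in offsets) / len(offsets)
)

# NSSDA reports horizontal accuracy at 95% confidence; 1.7308 assumes
# roughly equal x/y error (see FGDC-STD-007.3).
accuracy_95 = 1.7308 * rmse_r
print(f"RMSE_r = {rmse_r:.2f} m, NSSDA accuracy (95%) = {accuracy_95:.2f} m")
```

The NSSDA recommends at least 20 well-distributed check points; five are used here only to keep the example short.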
Standards:
- NSSDA (National Standard for Spatial Data Accuracy).
- ISO 19157 — Geographic information - Data quality.
- FGDC-STD-007 — older US federal standard.
Accuracy tiers for positional data
Rough tiers:
| Tier | Accuracy | Typical source |
|---|---|---|
| Survey-grade | 1–5 cm | RTK GNSS, total station |
| Mapping-grade | 10 cm – 1 m | DGPS, LiDAR |
| Consumer | 1–10 m | Smartphone GPS |
| Geocoded rooftop | 1–30 m | Address geocoding |
| Street-level | 10–100 m | Address interpolation |
| Postal code centroid | 0.1–5 km | ZIP code |
| Administrative | 1–100 km | City / country |
Align your work's accuracy to the decision being made.
Classic accuracy pitfalls
- False precision — 8 decimal places on a 100 m-accuracy coordinate.
- Stale data — road network from 2015 in a 2025 analysis.
- Mixed scales — combining 1:10 000 and 1:100 000 data without re-evaluating accuracy.
- Hidden uncertainty — presenting model outputs as certainties.
Cross-validation for accuracy
For spatial models:
- Leave-one-out.
- k-fold.
- Spatial k-fold (keep neighbouring samples in the same fold, so spatial autocorrelation doesn't make the test set trivially predictable).
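One simple way to build spatial folds is blocking: assign each point to a grid cell, then assign whole cells to folds, so neighbouring points never straddle the train/test split. A minimal sketch (the block size and fold count are illustrative choices; libraries offer grouped cross-validation splitters for the same idea):

```python
def spatial_folds(points, block_size, n_folds):
    """Assign each (x, y) point a fold index by spatial block, so that
    points in the same block always share a fold."""
    folds = []
    for x, y in points:
        bx, by = int(x // block_size), int(y // block_size)
        folds.append(hash((bx, by)) % n_folds)
    return folds

points = [(12, 3), (13, 4), (95, 40), (96, 41), (55, 70)]
folds = spatial_folds(points, block_size=10, n_folds=3)
# Points in the same 10 x 10 block always land in the same fold.
```

The block size should be at least the range of spatial autocorrelation in the data, otherwise test points are still "leaked" by near-identical training neighbours.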
For categorical maps:
- Stratified random sampling of ground-truth points.
- Separate training and validation data.
Self-check exercises
1. Your repeated GPS fixes cluster tightly 5 m north of the true point. Accurate or precise?
Precise (tight cluster) but inaccurate (systematic 5 m offset). This pattern often comes from a systematic error — wrong datum, antenna offset, multipath bias in a specific direction. Calibrate with a known reference point; the offset often corrects with a datum shift or antenna offset parameter.
2. Your dataset has 10 m positional accuracy. What analyses does that support?
City-scale accessibility (walking times, park buffers), neighbourhood-scale demographics, regional infrastructure planning. It does NOT support: on-parcel analyses (property boundaries), utility-network routing (metre-scale precision needed), or centimetre engineering. Match the question's required accuracy to the data's actual accuracy.
3. Why is Monte Carlo simulation useful for error propagation in complex GIS analyses?
Many spatial operations (buffer + intersect + aggregate) don't have closed-form error propagation formulas. Monte Carlo: sample each input from its error distribution, run the whole pipeline, repeat hundreds or thousands of times, and look at the distribution of outputs. It's computationally expensive but produces honest uncertainty estimates for any workflow.
Summary
- Accuracy ≠ precision; report both.
- Five error dimensions: positional, attribute, temporal, logical, completeness.
- Errors compound through pipelines; track and report.
- Monte Carlo handles complex error propagation.
Further reading
- Heuvelink, G. B. M. — Error Propagation in Environmental Modelling with GIS.
- ISO 19157 — Geographic information – Data quality.
- NSSDA — FGDC accuracy standard.
- Hunter, G. J. — quantitative accuracy reporting papers.