Module 12: Geocoding & Addressing

12.1 Forward Geocoding

Turning addresses into coordinates — how it works, which providers to use, and where it fails.

Lesson 61 of 100·15 min read

Key takeaways

  • Forward geocoding converts an address string to coordinates.
  • Accuracy depends on address normalisation, reference data coverage, and interpolation.
  • Commercial, open-source, and hybrid providers all have trade-offs.

Introduction

"1600 Pennsylvania Ave, Washington DC" is a string most humans understand but most software cannot directly plot. Forward geocoding is the service that turns such addresses into coordinates. This lesson covers how it works, the main providers, and the failure modes you'll definitely meet.

The core pipeline

  1. Parse / normalise the address — decompose into number, street, city, region, country.
  2. Match against a reference database (the address index).
  3. Interpolate within a matched street segment if no exact point is known.
  4. Return coordinates with a confidence score, type, and matched components.

Address normalisation

The same address has many spellings:

  • "1600 Pennsylvania Ave"
  • "1600 Pennsylvania Avenue"
  • "1600 Pennsylvania Av NW Washington"
  • "1600 Penn Ave DC"

Parsers (libpostal, Pelias, Nominatim) normalise spelling, expand abbreviations, and tokenise components. Good parsing is usually the difference between 70% and 95% match rates.

Reference data

Geocoders match against:

  • Address points — every building has a known point. Best accuracy.
  • Street ranges — "100–200 Main St has odd numbers 101, 103, ...; even numbers 100, 102, ...". Interpolates within the range.
  • Postal-code centroids — worst accuracy but widely available.

USA: USPS + TIGER. UK: Ordnance Survey AddressBase. Denmark: DAWA. Many countries have authoritative registries.

Providers

ProviderCoverageModel
Google Maps Geocoding APIGlobalCommercial
Mapbox GeocodingGlobalCommercial
HEREGlobalCommercial
Esri World GeocoderGlobalCommercial
Nominatim (OSM)GlobalOpen
PeliasGlobalOpen, self-hosted
PhotonGlobalOpen
GeoapifyGlobalHybrid
National registriesPer countryOften free

Commercial providers generally have better coverage and freshness; open ones are free but you host them. National registries (DAWA, AddressBase) are often the most accurate for their country.

Confidence and match levels

Every geocoder returns metadata:

  • Match level: rooftop, interpolated, street, postcode, city, country.
  • Confidence: 0–1 or categorical.
  • Matched components: which parts of the input were used.

Use these to filter low-quality results or to flag manual review.

Python quickstart

Python
1import requests
2[object Object]
3

Batch geocoding

For many addresses:

  • Respect rate limits.
  • Cache results.
  • Use batch endpoints if available (faster and cheaper per address).
  • Parallelise across threads or workers.
Python
1import pandas as pd
2from concurrent.futures import ThreadPoolExecutor

Accuracy

Typical accuracies for rooftop matches:

  • Address point match: 1–10 m.
  • Street range interpolation: 10–50 m.
  • Postal code centroid: 100 m–1 km.
  • Locality match: kilometres.

Always filter to your required accuracy. A 1 km error is fine for demographic analysis and catastrophic for package delivery.

Failure modes

  • Spelling: "Colngane" for "Cologne". Use fuzzy matching.
  • Incorrect county: "Springfield" in 30+ US states. Include state/country.
  • Rural addresses without numbers.
  • PO boxes — postal addresses with no physical location.
  • New developments — unlisted addresses until providers update.
  • Non-Latin scripts — Cyrillic, Chinese, Arabic addresses may need transliteration.

Privacy

Geocoding customer addresses is geocoding their home locations. Treat the resulting coordinates with the same care as the original PII — encryption at rest, access logs, deletion schedules. For aggregated analysis, round to 100 m or larger.

Reverse geocoding

The opposite operation — from coordinates to a human-readable address — is covered in 12.2.

Self-check exercises

1. Why does a country parameter significantly improve geocoding accuracy?

"Paris" exists in France, Texas, Ontario, and dozens of smaller places globally. Restricting to a country narrows the candidate set dramatically and reduces false matches. Good geocoders also use the country's address format conventions (street/number order, postal-code placement) to improve parsing.

2. A geocoder returns match level "interpolated". What does this mean for accuracy?

The exact house number isn't in the reference data; the geocoder estimated the location by linear interpolation along a street segment (e.g., if 100–200 Main St is a known segment, #150 is placed at 50% along). Accuracy is typically 10–50 m — good enough for delivery logistics in a metro area but poor for on-parcel analysis.

3. You geocode 100,000 addresses and 15% fail. What's a reasonable recovery pipeline?

Try a secondary provider on the failures (different reference data often succeeds where the first failed). Normalise the input with libpostal and retry. For persistent failures, reduce to a coarser target (postal code, city) and flag for manual review. Track the success rate per input format to improve upstream collection.

Summary

  • Forward geocoding: address → coordinates.
  • Pipeline: normalise → match → interpolate.
  • Accuracy varies from rooftop (metres) to locality (km).
  • Always record match level and confidence; filter or fall back appropriately.

Further reading

  • libpostal — international address parser / normaliser.
  • Nominatim and Pelias documentation.
  • OpenAddresses — aggregator of open authoritative address data.
  • Who's On First — gazetteer of administrative places (Mapzen project).