12.1 Forward Geocoding
Turning addresses into coordinates — how it works, which providers to use, and where it fails.
Key takeaways
- Forward geocoding converts an address string to coordinates.
- Accuracy depends on address normalisation, reference data coverage, and interpolation.
- Commercial, open-source, and hybrid providers all have trade-offs.
Introduction
"1600 Pennsylvania Ave, Washington DC" is a string most humans understand but most software cannot directly plot. Forward geocoding is the service that turns such addresses into coordinates. This lesson covers how it works, the main providers, and the failure modes you'll definitely meet.
The core pipeline
- Parse / normalise the address — decompose into number, street, city, region, country.
- Match against a reference database (the address index).
- Interpolate within a matched street segment if no exact point is known.
- Return coordinates with a confidence score, type, and matched components.
Address normalisation
The same address has many spellings:
- "1600 Pennsylvania Ave"
- "1600 Pennsylvania Avenue"
- "1600 Pennsylvania Av NW Washington"
- "1600 Penn Ave DC"
Parsers (libpostal, Pelias, Nominatim) normalise spelling, expand abbreviations, and tokenise components. Good parsing is usually the difference between 70% and 95% match rates.
Reference data
Geocoders match against:
- Address points — every building has a known point. Best accuracy.
- Street ranges — "100–200 Main St has odd numbers 101, 103, ...; even numbers 100, 102, ...". Interpolates within the range.
- Postal-code centroids — worst accuracy but widely available.
USA: USPS + TIGER. UK: Ordnance Survey AddressBase. Denmark: DAWA. Many countries have authoritative registries.
Providers
| Provider | Coverage | Model |
|---|---|---|
| Google Maps Geocoding API | Global | Commercial |
| Mapbox Geocoding | Global | Commercial |
| HERE | Global | Commercial |
| Esri World Geocoder | Global | Commercial |
| Nominatim (OSM) | Global | Open |
| Pelias | Global | Open, self-hosted |
| Photon | Global | Open |
| Geoapify | Global | Hybrid |
| National registries | Per country | Often free |
Commercial providers generally have better coverage and freshness; open ones are free but you host them. National registries (DAWA, AddressBase) are often the most accurate for their country.
Confidence and match levels
Every geocoder returns metadata:
- Match level: rooftop, interpolated, street, postcode, city, country.
- Confidence: 0–1 or categorical.
- Matched components: which parts of the input were used.
Use these to filter low-quality results or to flag manual review.
Python quickstart
1import requests
2[object Object]
3Batch geocoding
For many addresses:
- Respect rate limits.
- Cache results.
- Use batch endpoints if available (faster and cheaper per address).
- Parallelise across threads or workers.
1import pandas as pd
2from concurrent.futures import ThreadPoolExecutorAccuracy
Typical accuracies for rooftop matches:
- Address point match: 1–10 m.
- Street range interpolation: 10–50 m.
- Postal code centroid: 100 m–1 km.
- Locality match: kilometres.
Always filter to your required accuracy. A 1 km error is fine for demographic analysis and catastrophic for package delivery.
Failure modes
- Spelling: "Colngane" for "Cologne". Use fuzzy matching.
- Incorrect county: "Springfield" in 30+ US states. Include state/country.
- Rural addresses without numbers.
- PO boxes — postal addresses with no physical location.
- New developments — unlisted addresses until providers update.
- Non-Latin scripts — Cyrillic, Chinese, Arabic addresses may need transliteration.
Privacy
Geocoding customer addresses is geocoding their home locations. Treat the resulting coordinates with the same care as the original PII — encryption at rest, access logs, deletion schedules. For aggregated analysis, round to 100 m or larger.
Reverse geocoding
The opposite operation — from coordinates to a human-readable address — is covered in 12.2.
Self-check exercises
1. Why does a country parameter significantly improve geocoding accuracy?
"Paris" exists in France, Texas, Ontario, and dozens of smaller places globally. Restricting to a country narrows the candidate set dramatically and reduces false matches. Good geocoders also use the country's address format conventions (street/number order, postal-code placement) to improve parsing.
2. A geocoder returns match level "interpolated". What does this mean for accuracy?
The exact house number isn't in the reference data; the geocoder estimated the location by linear interpolation along a street segment (e.g., if 100–200 Main St is a known segment, #150 is placed at 50% along). Accuracy is typically 10–50 m — good enough for delivery logistics in a metro area but poor for on-parcel analysis.
3. You geocode 100,000 addresses and 15% fail. What's a reasonable recovery pipeline?
Try a secondary provider on the failures (different reference data often succeeds where the first failed). Normalise the input with libpostal and retry. For persistent failures, reduce to a coarser target (postal code, city) and flag for manual review. Track the success rate per input format to improve upstream collection.
Summary
- Forward geocoding: address → coordinates.
- Pipeline: normalise → match → interpolate.
- Accuracy varies from rooftop (metres) to locality (km).
- Always record match level and confidence; filter or fall back appropriately.
Further reading
- libpostal — international address parser / normaliser.
- Nominatim and Pelias documentation.
- OpenAddresses — aggregator of open authoritative address data.
- Who's On First — gazetteer of administrative places (Mapzen project).