7.2 OpenStreetMap — A Deep Dive
Structure, tags, tools, and quirks of the world's most important open geospatial dataset.
Key takeaways
- OpenStreetMap is an editable global database with ~10 billion nodes and ~1 billion ways.
- Its data model is nodes, ways, and relations with free-form tags — not a fixed schema.
- Extract what you need with Overpass, Geofabrik, or overturemaps.org rather than processing the full planet dump.
Introduction
OpenStreetMap (OSM) is the most important open geospatial dataset in the world. Founded in 2004 as a "Wikipedia for maps," it now underpins large portions of commercial mapping (Apple, Amazon, Facebook, Microsoft) and nearly all humanitarian and research mapping. This lesson covers OSM's structure, how to pull data from it, and its common pitfalls.
The data model
Three primitives:
- Nodes — points with lat/lon and free-form tags. Used for individual POIs (a café, a mailbox) or as geometry vertices.
- Ways — ordered lists of nodes forming a line (road) or closed polygon (building). Ways don't store coordinates; they reference node IDs.
- Relations — collections of nodes, ways, or other relations. Used for multipolygons (islands, lakes with holes), routes (bus lines), and logical groupings (a country with its provinces).
Every element has an id, version, timestamp, user, and a dictionary of tags.
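The three primitives can be modelled as Python dataclasses to make the ID-reference structure concrete. This is a minimal sketch: the class and function names (`Node`, `Way`, `Member`, `Relation`, `way_coordinates`) are hypothetical and do not come from any OSM library. It shows that a way holds only node IDs, so its coordinates exist only after a lookup in the node table.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list[int]  # ordered node references; no coordinates stored here
    tags: dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A closed way (e.g. a building outline) repeats its first node at the end
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int   # ID of the referenced element
    role: str  # e.g. "outer" or "inner" in a multipolygon


@dataclass
class Relation:
    id: int
    members: list[Member]
    tags: dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: dict[int, Node]) -> list[tuple[float, float]]:
    """Resolve a way's node IDs into (lon, lat) pairs via the node table.

    A KeyError here means a referenced node was never loaded: the
    "missing geometry" pitfall.
    """
    return [(nodes[nid].lon, nodes[nid].lat) for nid in way.node_ids]


# Usage: a small closed way over hypothetical nodes
nodes = {
    1: Node(1, 55.680, 12.570),
    2: Node(2, 55.681, 12.570),
    3: Node(3, 55.681, 12.572),
}
outline = Way(10, [1, 2, 3, 1], {"building": "yes"})
print(outline.is_closed(), way_coordinates(outline, nodes))
```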
Example (illustrative):
```xml
<node id="42" lat="55.68" lon="12.57" version="3">
  <tag k="amenity" v="cafe"/>
  <tag k="name" v="Kaffebaren"/>
  <tag k="opening_hours" v="Mo-Fr 07:00-19:00"/>
</node>
```
Tags: OSM's schema
OSM uses key-value tags with no fixed schema. Popular keys (from the OSM wiki):
- highway=*: roads (motorway, trunk, primary, residential, footway, cycleway, ...).
- building=*: yes, house, apartments, garage.
- amenity=*: cafe, school, hospital, bus_station.
- natural=*: water, wood, coastline, peak.
- landuse=*: residential, farmland, forest.
- place=*: city, town, village, hamlet.
- boundary=*: administrative, national_park.
Tags are flexible but inconsistent — a feature may be amenity=cafe or cuisine=coffee_shop depending on who mapped it. Always inspect actual tag distributions before assuming uniformity.
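One way to inspect a tag distribution is to count key=value pairs before deciding which variants a filter should accept. Below is a minimal standard-library sketch. The sample tag dictionaries and the `looks_like_cafe` rule are illustrative assumptions, not real extract data.

```python
from collections import Counter

# Hypothetical tag dictionaries, as they might come out of an extract
features = [
    {"amenity": "cafe", "name": "Kaffebaren"},
    {"amenity": "cafe", "cuisine": "coffee_shop"},
    {"cuisine": "coffee_shop"},
    {"amenity": "restaurant"},
]


def tag_distribution(features, keys):
    """Count key=value pairs for the chosen keys across all features."""
    counts = Counter()
    for tags in features:
        for key in keys:
            if key in tags:
                counts[f"{key}={tags[key]}"] += 1
    return counts


def looks_like_cafe(tags):
    # Assumed filter for illustration: accept either tagging variant
    return tags.get("amenity") == "cafe" or tags.get("cuisine") == "coffee_shop"


print(tag_distribution(features, ["amenity", "cuisine"]).most_common())
print(sum(looks_like_cafe(t) for t in features), "cafe-like features")
```

In this sample, a filter on amenity=cafe alone would find only two of the three café-like features.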
Extracting OSM data
Whole-planet dump
planet.osm.pbf — ~90 GB binary file, the complete current snapshot. Processing requires specialised tools (osmium, Overpass API server setup). Useful for serious processing but overkill for most projects.
Geofabrik regional extracts
download.geofabrik.de provides pre-cut country, state, and city extracts in .osm.pbf. For a regional analysis, this is usually the best start.
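Geofabrik files follow a predictable URL layout. The region hierarchy forms the path segments, and each file name ends in `-latest.osm.pbf`. The sketch below assumes that layout. The helper names are hypothetical, and region slugs should be checked against the site before use.

```python
import urllib.request

GEOFABRIK_BASE = "https://download.geofabrik.de"


def geofabrik_pbf_url(*path: str) -> str:
    """Build the URL of a Geofabrik "-latest" PBF extract.

    `path` is the region hierarchy as it appears on the site,
    e.g. ("europe", "germany") or ("europe", "germany", "berlin").
    """
    if not path:
        raise ValueError("need at least one region name")
    *parents, region = path
    prefix = "/".join([GEOFABRIK_BASE, *parents])
    return f"{prefix}/{region}-latest.osm.pbf"


def download(url: str, target: str) -> None:
    # Plain HTTP download; country files can be several GB, so fetch once
    # and reuse the local copy rather than re-downloading per analysis.
    urllib.request.urlretrieve(url, target)


url = geofabrik_pbf_url("europe", "germany", "berlin")
print(url)
# download(url, "berlin-latest.osm.pbf")  # uncomment to fetch the file
```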
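```shell
# Convert OSM PBF to GeoJSON Text Sequence (osmium export has no GeoPackage output)
osmium export germany.osm.pbf -f geojsonseq -o germany.geojsonseq
# For GeoPackage, GDAL's OSM driver can be used instead
ogr2ogr -f GPKG germany.gpkg germany.osm.pbf
```
Overpass API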
Overpass is OSM's purpose-built query API. Use overpass-turbo.eu interactively:
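```
/* {{geocodeArea:...}} is an overpass-turbo.eu shortcut that looks the name up
   via Nominatim. Copenhagen's own OSM name tag is "København", so a plain
   area["name"="Copenhagen"] filter would not find the Danish capital. */
[out:json][timeout:25];
{{geocodeArea:Copenhagen}}->.searchArea;
(
  node["amenity"="cafe"](area.searchArea);
);
out geom;
```
For programmatic use, the osmnx Python package wraps Overpass for road network retrieval:
```python
import osmnx as ox
G = ox.graph_from_place("Copenhagen, Denmark", network_type="drive")  # example: drivable street network as a networkx MultiDiGraph
```
Overture Maps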
The Overture Maps Foundation (founded 2022) publishes OSM-derived and vendor-contributed data as GeoParquet, covering places, buildings, transportation, and administrative boundaries. It's free, released monthly, and for many use cases sidesteps the raw-OSM tag mess.
Quality and consistency
OSM quality varies enormously by region:
- Western Europe, US urban areas — often more detailed than commercial data.
- Global rural areas — patchy.
- Rapidly changing countries — updates lag.
- Specialised attributes (lane count, speed limits) — incomplete even in well-mapped areas.
The Humanitarian OpenStreetMap Team (HOT) and companies such as Mapbox invest in targeted mapping campaigns to fill gaps.
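Before relying on an attribute such as speed limits, it is worth measuring how often that attribute is actually tagged. The sketch below uses a hypothetical `attribute_completeness` helper on made-up road tags. It reports, per road class, the share of features carrying the attribute.

```python
from collections import defaultdict


def attribute_completeness(features, filter_key, attr_key):
    """Share of features carrying `attr_key`, grouped by the value of `filter_key`.

    With filter_key="highway" and attr_key="maxspeed" this gives, per road
    class, the fraction of roads that have a speed limit tagged.
    """
    totals = defaultdict(int)
    tagged = defaultdict(int)
    for tags in features:
        cls = tags.get(filter_key)
        if cls is None:
            continue  # not a feature of the class we are auditing
        totals[cls] += 1
        if attr_key in tags:
            tagged[cls] += 1
    return {cls: tagged[cls] / totals[cls] for cls in totals}


# Hypothetical road tags, for illustration only
roads = [
    {"highway": "primary", "maxspeed": "50", "lanes": "2"},
    {"highway": "primary", "lanes": "2"},
    {"highway": "residential", "maxspeed": "30"},
    {"highway": "residential"},
    {"highway": "residential"},
    {"highway": "residential"},
]
print(attribute_completeness(roads, "highway", "maxspeed"))
```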
Licensing — ODbL
OSM is licensed under the Open Database License (ODbL) 1.0. Key obligations:
- Attribution: "© OpenStreetMap contributors".
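- Share-alike: if you publicly use a derivative database built from a substantial part of OSM, that database must be offered under ODbL. Produced works such as rendered maps may carry their own licence, but they still need attribution, and the derivative database behind them is subject to the same share-alike rule.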
- "Substantial" is deliberately vague; legal advice recommended for commercial products.
Common pitfalls
- Multipolygons: lakes with islands or countries with enclaves require relation processing; simple way extraction misses them.
- Tagging inconsistency: the same concept is sometimes tagged differently.
- Missing geometry: ways reference nodes; if nodes aren't loaded, geometry is missing.
- Coastline: stored as many ways tagged natural=coastline that together trace the world's shorelines; special tooling (osmcoastline) is required to assemble them into land or water polygons.
- Changing IDs: when a feature is split, merged, or deleted and redrawn, at least one resulting element gets a new ID. Don't rely on IDs for permanence.
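Multipolygon and coastline processing both start by chaining open ways into closed rings. The sketch below shows that step on node-ID lists, using a hypothetical helper. It is a simplification: production tools such as osmcoastline also check geometry validity, classify rings as inner or outer, and handle planet-scale input.

```python
def assemble_rings(ways):
    """Join open way segments (lists of node IDs) into closed rings.

    Each segment is attached to the current ring's end, reversed if needed.
    Raises ValueError when a ring cannot be closed (e.g. a member way is missing).
    """
    remaining = [list(w) for w in ways]
    rings = []
    while remaining:
        ring = remaining.pop(0)
        while ring[0] != ring[-1]:
            for i, seg in enumerate(remaining):
                if seg[0] == ring[-1]:
                    ring.extend(seg[1:])  # segment continues in the same direction
                elif seg[-1] == ring[-1]:
                    ring.extend(reversed(seg[:-1]))  # segment drawn the other way
                else:
                    continue
                remaining.pop(i)
                break
            else:
                raise ValueError(f"unclosed ring ending at node {ring[-1]}")
        rings.append(ring)
    return rings


# Usage: a hypothetical lake outline split across three ways
print(assemble_rings([[1, 2, 3], [5, 4, 3], [5, 6, 1]]))
```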
Analytical patterns
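```python
# All restaurants within 500 m of a metro station in Berlin
import osmnx as ox
stations = ox.features_from_place("Berlin, Germany", {"station": "subway"}).to_crs(epsg=25833)  # UTM 33N, metres
restaurants = ox.features_from_place("Berlin, Germany", {"amenity": "restaurant"}).to_crs(epsg=25833)
near = restaurants[restaurants.intersects(stations.buffer(500).union_all())]
print(len(near), "restaurants within 500 m of a U-Bahn station")
```
A handful of lines replicate an analysis that previously required commercial datasets.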
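The distance test in this pattern ("within 500 m of a metro station") can also be checked offline on a few points using only the standard library. The sketch below uses the haversine formula on a spherical Earth, which is an approximation but adequate for a 500 m threshold. The coordinates and helper names are hypothetical. The brute-force loop suits small samples; a spatial index would be the next step for larger inputs.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_distance(pois, stations, max_m=500):
    """Return the POIs lying within `max_m` metres of at least one station.

    Brute force: compares every POI with every station.
    """
    return [
        p for p in pois
        if any(haversine_m(p[0], p[1], s[0], s[1]) <= max_m for s in stations)
    ]


stations = [(13.4050, 52.5200)]  # hypothetical (lon, lat)
pois = [(13.4060, 52.5205, "A"), (13.4500, 52.5200, "B")]  # hypothetical restaurants
print([name for _, _, name in within_distance(pois, stations)])
```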
Editing OSM
To contribute:
- Create an account at openstreetmap.org.
- Use the in-browser iD editor, or JOSM for advanced edits.
- Follow tagging conventions (see OSM wiki).
- Add a changeset comment explaining what you changed and why.
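Edits are live in the database immediately and usually show up on the openstreetmap.org map within minutes. Downstream consumers pick them up on their own schedules; Geofabrik extracts, for example, are rebuilt daily.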
Self-check exercises
1. A way in OSM doesn't store coordinates directly. Where do they come from?
From the nodes the way references. A way is an ordered list of node IDs; the coordinates live on each node. When extracting geometry, tools must resolve these node references. You therefore need either the full node table or pre-resolved geometry, as in Geofabrik's shapefile extracts.
2. Why use Overpass for small extracts and `.osm.pbf` for large ones?
Overpass queries hit a shared public server and are rate-limited; large queries time out. Geofabrik / Planet dumps are static files you download once and process locally — scalable but require storage and local tools. Rule of thumb: one-off or small-area queries → Overpass; country-scale or larger → PBF extracts.
3. Your product uses OSM data. What licensing obligations apply?
Attribution ("© OpenStreetMap contributors") and share-alike under ODbL. If your derived work includes substantial OSM content, you must license the derived data under ODbL. For commercial products the definition of "substantial" is interpreted conservatively — consult a lawyer for edge cases, but small geocoded features and short snippets are typically fine with attribution.
Summary
- OSM model: nodes, ways, relations; tags carry semantics.
- Pull data via Overpass (small), Geofabrik (regional), or Overture (modern).
- Quality varies by region — validate before relying on specific attributes.
- ODbL requires attribution and share-alike for substantial derived works.
Further reading
- wiki.openstreetmap.org — canonical reference for tags and tools.
- Haklay, M. (2010) — How good is volunteered geographical information?
- osmcode.org — documentation for osmium and related tools.
- overturemaps.org — modern OSM-derived data distribution.