7.1 Open Data Portals
Where to find free, authoritative spatial data — and how to vet it before you use it.
Key takeaways
- Government open-data portals are the most reliable free sources for authoritative spatial data.
- Each portal has its own vocabulary, licence, and API style; learn the few you use most often in depth.
- Always record vintage, licence, and provenance when you reuse data.
Introduction
Finding the right data is half the job in any GIS project. Open data portals have transformed what's possible — whole countries publish their cadastres, hydrography, and transport networks for free. This lesson surveys the portals you'll use most often.
National and supranational portals
United States
- data.gov — federal aggregator with tens of thousands of spatial datasets.
- USGS National Map — topography, hydrography, land cover, elevation (3DEP).
- US Census TIGER/Line — boundaries, roads, and other geographic features, keyed by GEOID for joining Census demographic tables.
- NOAA Data Access Viewer — weather, oceans, LiDAR, shoreline.
- FEMA Flood Map Service Center — Flood Insurance Rate Maps (FIRMs) and the National Flood Hazard Layer.
European Union
- INSPIRE Geoportal — harmonised metadata across EU member states.
- Copernicus Data Space Ecosystem — Sentinel satellite imagery and Copernicus services data (it replaced the Copernicus Open Access Hub, which closed in 2023).
- EEA (European Environment Agency) — environmental datasets.
- Eurostat GISCO — administrative boundaries, statistical layers.
United Kingdom
- data.gov.uk — aggregator.
- Ordnance Survey OpenData — OS Open Zoomstack, Boundary-Line, OS Terrain 50.
- London Datastore — city-level open data for Greater London.
Other notable portals
- Canada Open Maps (open.canada.ca).
- data.gov.au (Australia).
- data.gov.sg (Singapore).
- data.gov.in (India).
- Geoportail.gouv.fr (France).
- Geoportal.de (Germany).
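Several of these portals, for example catalog.data.gov in the US, Canada's open.canada.ca and data.gov.au, have been built on the open-source CKAN platform, whose Action API offers a package_search call. The sketch below assumes a CKAN portal; the base URL is an example to check against the portal's own API documentation, and only the Python standard library is used. It also shows which response fields carry the licence and modification date worth recording.

```python
import json
import urllib.parse
import urllib.request

# ASSUMPTION: example CKAN Action API base URL; other portals use a different one.
BASE_URL = "https://catalog.data.gov/api/3/action"


def build_search_url(query: str, rows: int = 10, base_url: str = BASE_URL) -> str:
    """Return a CKAN package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base_url}/package_search?{params}"


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body (performs a network request)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def summarise(response: dict) -> list[dict]:
    """Pull title, licence, record-modified date and resource formats
    out of a CKAN package_search response."""
    if not response.get("success"):
        raise RuntimeError("CKAN request failed")
    summaries = []
    for pkg in response["result"]["results"]:
        formats = {r["format"] for r in pkg.get("resources", []) if r.get("format")}
        summaries.append({
            "title": pkg.get("title"),
            "licence": pkg.get("license_title"),
            "modified": pkg.get("metadata_modified"),
            "formats": sorted(formats),
        })
    return summaries
```

Usage (needs network access): summarise(fetch_json(build_search_url("county boundaries", rows=5))). CKAN's metadata_modified records when the catalogue entry last changed, which is not necessarily the data's vintage; check the dataset's own metadata for that.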
Global / thematic sources
Satellite & Earth observation
- USGS EarthExplorer — Landsat, ASTER, SRTM, declassified imagery.
- ESA Copernicus Data Space — Sentinel-1, -2, -3, -5P.
- NASA Earthdata Search — MODIS, VIIRS, ICESat-2, ASTER.
- Google Earth Engine Data Catalog — petabyte-scale analysis-ready archive.
- Microsoft Planetary Computer — STAC-indexed archives behind a public STAC API (its hosted compute Hub has been retired).
Vector / global thematic
- OpenStreetMap — the world's biggest crowdsourced map.
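- Natural Earth — public-domain physical / cultural vectors at three scales (1:10m, 1:50m, 1:110m).
- GADM — administrative boundaries, free for academic and other non-commercial use; redistribution and commercial use need permission.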
- WorldPop — global population rasters.
- Global Forest Watch — deforestation and land-cover change.
- GEBCO — global bathymetry.
Climate & environment
- ERA5 (ECMWF / Copernicus) — hourly weather reanalysis.
- WorldClim — long-term climate summaries.
- SoilGrids (ISRIC) — global soil properties.
STAC — a unifying catalog standard
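Many modern archives expose a SpatioTemporal Asset Catalog (STAC) API. Instead of learning each provider's search interface, you send the same standard query (bounding box, time range, collection names) to any STAC endpoint and get back JSON item records that link to the data files. In Python, a common client is the pystac_client package (from pystac_client import Client).
STAC is to Earth observation what DNS was to the early internet: an interoperable standard that lets one client search many different archives through the same API.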
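A minimal sketch of the underlying request, using only the standard library so the moving parts are visible. It follows the STAC API Item Search convention of POSTing a JSON body with bbox, datetime, collections and limit to <endpoint>/search. The Earth Search endpoint and the sentinel-2-l2a collection ID are example values to check before use; pystac_client's Client.open(...).search(...) wraps the same kind of request and also follows pagination links.

```python
import json
import urllib.request

# ASSUMPTION: example public STAC API endpoint; verify it is still live.
STAC_ENDPOINT = "https://earth-search.aws.element84.com/v1"


def build_search_body(bbox, start, end, collections, limit=10):
    """Build a STAC Item Search body.

    bbox is (min_lon, min_lat, max_lon, max_lat) in WGS84;
    start/end are RFC 3339 timestamps such as "2024-06-01T00:00:00Z".
    """
    if len(bbox) != 4:
        raise ValueError("bbox must have four numbers")
    return {
        "bbox": list(bbox),
        "datetime": f"{start}/{end}",  # closed interval "start/end"
        "collections": list(collections),
        "limit": limit,                # page size, not a total cap
    }


def search(endpoint, body):
    """POST the body to <endpoint>/search and return the first page of
    items (network request; 'next' links are not followed)."""
    req = urllib.request.Request(
        f"{endpoint}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["features"]


def asset_hrefs(item, keys):
    """Map each requested asset key present in an item to its download href."""
    assets = item.get("assets", {})
    return {k: assets[k]["href"] for k in keys if k in assets}
```

Usage (needs network access): items = search(STAC_ENDPOINT, build_search_body((-0.5, 51.3, 0.3, 51.7), "2024-06-01T00:00:00Z", "2024-06-30T23:59:59Z", ["sentinel-2-l2a"])). Asset keys such as "visual" vary between providers, so inspect one item's assets before calling asset_hrefs.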
Licence literacy
Not all "open data" is equally open. Common licences:
- Public domain / CC0 — use for anything.
- CC-BY — attribution required.
- CC-BY-SA — attribution + share-alike. OpenStreetMap uses the Open Database Licence (ODbL), which likewise requires attribution and share-alike for derived databases.
- Open Government Licence (OGL) — UK / Canada variants; attribution.
- USGS/NOAA — public domain by default.
- Non-commercial (e.g. CC-BY-NC) — no use in commercial or paid products.
Always read the licence text itself. Data a colleague "grabbed from a portal" without recording its licence is a compliance risk.
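Licence checks are easier to enforce when the licence is recorded in a machine-readable form. The lookup below is a deliberately simplified sketch of the licences listed above: keys are SPDX identifiers where one exists, "public-domain" is a hypothetical label, and the real terms are always in each licence text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    attribution: bool
    share_alike: bool
    commercial_ok: bool


# Simplified summary; ODbL share-alike strictly applies to publicly used
# derived databases. "public-domain" is a placeholder, not an SPDX ID.
LICENCES = {
    "public-domain": LicenceTerms(False, False, True),
    "CC0-1.0": LicenceTerms(False, False, True),
    "CC-BY-4.0": LicenceTerms(True, False, True),
    "CC-BY-SA-4.0": LicenceTerms(True, True, True),
    "ODbL-1.0": LicenceTerms(True, True, True),
    "OGL-UK-3.0": LicenceTerms(True, False, True),
    "OGL-Canada-2.0": LicenceTerms(True, False, True),
    "CC-BY-NC-4.0": LicenceTerms(True, False, False),
}


def obligations(licence_id: str, commercial_use: bool) -> list[str]:
    """List the duties for reusing data under licence_id.

    Unknown licences raise KeyError, so they get read rather than guessed.
    """
    terms = LICENCES[licence_id]
    if commercial_use and not terms.commercial_ok:
        raise PermissionError(f"{licence_id} does not permit commercial use")
    duties = []
    if terms.attribution:
        duties.append("credit the source")
    if terms.share_alike:
        duties.append("release derived data under the same licence")
    return duties
```

Raising on an unknown identifier is the design choice that matters here: a dataset with no recorded licence fails loudly instead of slipping into a deliverable.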
Vintage and provenance
Every dataset has a vintage — when the data was collected or last updated. A road network from 2015 won't show 2024's cycle lanes. A population dataset from 2018 predates COVID-19 migration shifts.
Record, in your project notes:
- Source URL.
- Dataset title and version.
- Vintage / acquisition date.
- Licence.
- Pre-processing applied (reprojection, filtering, joins).
Without this, your analysis isn't reproducible.
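One way to keep these notes next to the data is a small record per dataset written as a JSON sidecar file. This is a sketch under assumptions: the class name, the "<file>.provenance.json" naming and the YYYY-MM-DD vintage format are illustrative choices, not a standard.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date
from pathlib import Path


@dataclass
class Provenance:
    """Hypothetical per-dataset record mirroring the checklist above."""
    source_url: str
    title: str
    version: str
    vintage: str  # acquisition or last-update date as YYYY-MM-DD
    licence: str
    preprocessing: list[str] = field(default_factory=list)

    def missing(self) -> list[str]:
        """Names of required text fields that are still empty."""
        required = ("source_url", "title", "version", "vintage", "licence")
        return [name for name in required if not getattr(self, name).strip()]

    def age_years(self, as_of: date) -> float:
        """Approximate age of the data in years on a given date."""
        return (as_of - date.fromisoformat(self.vintage)).days / 365.25

    def write_sidecar(self, data_path: Path) -> Path:
        """Write <data file name>.provenance.json beside the data file."""
        out = data_path.with_name(data_path.name + ".provenance.json")
        out.write_text(json.dumps(asdict(self), indent=2))
        return out
```

For example, a road network with vintage "2015-06-01" reports an age of about 9 years in mid-2024, which is exactly the kind of gap the cycle-lane example warns about.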
Metadata standards
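Formal portals publish metadata to recognised standards: ISO 19115 (and the INSPIRE profile built on it) and, in the US, the FGDC Content Standard for Digital Geospatial Metadata (CSDGM). These define structured fields such as lineage, positional accuracy, and point of contact. Download the XML metadata alongside the data when it is available.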
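Once downloaded, the XML can be read programmatically to fill in provenance notes. A minimal sketch, assuming the record uses the ISO 19139 XML encoding of ISO 19115 (gmd/gco namespaces), which many INSPIRE and national catalogues serve; newer ISO 19115-3 records and FGDC files use different element names and would need other paths.

```python
import xml.etree.ElementTree as ET

# Namespaces of the ISO 19139 encoding of ISO 19115.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Paths relative to the gmd:MD_Metadata root element.
PATHS = {
    "title": "gmd:identificationInfo/gmd:MD_DataIdentification/gmd:citation/"
             "gmd:CI_Citation/gmd:title/gco:CharacterString",
    "lineage": "gmd:dataQualityInfo/gmd:DQ_DataQuality/gmd:lineage/"
               "gmd:LI_Lineage/gmd:statement/gco:CharacterString",
    "contact": "gmd:contact/gmd:CI_ResponsibleParty/gmd:organisationName/"
               "gco:CharacterString",
    "date_stamp": "gmd:dateStamp/gco:Date",
}


def read_iso_metadata(xml_text):
    """Return a few provenance fields from an ISO 19139 record.

    Fields that are absent come back as None.
    """
    root = ET.fromstring(xml_text)
    fields = {key: root.findtext(path, namespaces=NS) for key, path in PATHS.items()}
    if fields["date_stamp"] is None:  # some records use gco:DateTime instead
        fields["date_stamp"] = root.findtext("gmd:dateStamp/gco:DateTime", namespaces=NS)
    return fields
```

A None for lineage is itself useful information: it tells you the publisher did not document how the data was produced.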
Preparing open data for use
Standard preflight:
- Read the metadata.
- Inspect with gdalinfo (rasters) or ogrinfo (vectors).
- Reproject to your working CRS.
- Clip to your area of interest.
- Drop unneeded attributes.
- Document the transformation.
- Save locally in a modern format (GeoPackage, GeoParquet).
This short workflow saves hours downstream.
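For vector data, GDAL's ogr2ogr can do the reproject, clip, drop-attributes and save steps in one pass (inspect first with ogrinfo -so -al <file>). The sketch below only builds the command and a loggable string for your provenance notes; the file names, EPSG code, bounding box and field names in the usage example are hypothetical, and running it needs GDAL installed.

```python
import shlex


def preflight_command(src, dst, target_crs, clip_bbox, keep_fields, layer_name):
    """Build an ogr2ogr call that reprojects, clips, keeps selected
    attributes and writes a GeoPackage.

    clip_bbox is (xmin, ymin, xmax, ymax) in the *target* CRS, because
    -clipdst clips after reprojection.
    """
    xmin, ymin, xmax, ymax = clip_bbox
    return [
        "ogr2ogr",
        "-f", "GPKG",                         # output driver: GeoPackage
        "-t_srs", target_crs,                 # reproject to the working CRS
        "-clipdst", str(xmin), str(ymin), str(xmax), str(ymax),
        "-select", ",".join(keep_fields),     # keep only these attributes
        "-nln", layer_name,                   # name of the output layer
        dst, src,                             # destination comes before source
    ]


def as_log_line(cmd):
    """Shell-quoted form of the command, for documenting the transformation."""
    return shlex.join(cmd)
```

Usage: cmd = preflight_command("rivers_raw.shp", "rivers.gpkg", "EPSG:27700", (400000, 100000, 450000, 150000), ["name", "class"], "rivers"), then subprocess.run(cmd, check=True), and paste as_log_line(cmd) into the project notes so the step is reproducible.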
Self-check exercises
1. You need US county boundaries. What's the authoritative source?
US Census TIGER/Line. It's the official boundary dataset used by the Census Bureau and is updated annually. Avoid unofficial reposts unless a licensed provider publishes them with clear lineage back to TIGER.
2. What's a STAC API and why would you use one?
A STAC (SpatioTemporal Asset Catalog) API is a standardised JSON-over-HTTP interface for searching Earth-observation archives. Instead of learning a custom API per provider, you issue a bbox + datetime + collections query to any STAC endpoint (USGS, Microsoft, AWS, ESA). It drastically simplifies multi-provider pipelines.
3. A colleague shares a dataset from "some government portal" with no licence or vintage noted. What should you do?
Don't use it in client-facing work until you trace the source, licence, and date. Download from the original portal with metadata. Document provenance. A dataset without licence + vintage is a reproducibility and compliance risk — small investment now saves big cleanup later.
Summary
- Top national portals (US, EU, UK) plus global archives (STAC-indexed) cover most analytic needs.
- Record vintage, licence, and provenance for every dataset.
- STAC unifies Earth-observation search across providers.
- Spend time on preflight — reproject, clip, reduce, document — before analysis begins.
Further reading
- STAC Specification (stacspec.org).
- OpenDataWatch — global index of open data programmes.
- Creative Commons licence guides.
- ISO 19115 metadata standard overview.