Module 7: Data Sources & Acquisition

7.1 Open Data Portals

Where to find free, authoritative spatial data — and how to vet it before you use it.

Lesson 33 of 100·18 min read

Key takeaways

  • Government open-data portals are the most reliable free sources for authoritative spatial data.
  • Each portal has a distinct vocabulary, licence, and API style — learn the few you rely on most in depth.
  • Always record vintage, licence, and provenance when you reuse data.

Introduction

Finding the right data is half the job in any GIS project. Open data portals have transformed what's possible — whole countries publish their cadastres, hydrography, and transport networks for free. This lesson surveys the portals you'll use most often.

National and supranational portals

United States

  • data.gov — federal aggregator with tens of thousands of spatial datasets.
  • USGS National Map — topography, hydrography, land cover, elevation (3DEP).
  • US Census TIGER/Line — legal and statistical boundaries, roads, and hydrography, with GEOIDs for joining Census demographic tables.
  • NOAA Digital Coast Data Access Viewer — coastal elevation (LiDAR), imagery, land cover.
  • FEMA Flood Map Service Center — Flood Insurance Rate Maps (FIRMs).

European Union

  • INSPIRE Geoportal — harmonised metadata across EU member states.
  • Copernicus Data Space Ecosystem — Sentinel satellite imagery, Copernicus services data (it replaced the Copernicus Open Access Hub, which was retired in 2023).
  • EEA (European Environment Agency) — environmental datasets.
  • Eurostat GISCO — administrative boundaries, statistical layers.

United Kingdom

  • data.gov.uk — aggregator.
  • Ordnance Survey OpenData — OS Open Zoomstack, Boundary-Line, OS Terrain 50.
  • London Datastore — city-level open data for Greater London.

Other notable portals

  • Canada Open Maps (open.canada.ca).
  • data.gov.au (Australia).
  • data.gov.sg (Singapore).
  • data.gov.in (India).
  • Geoportail.gouv.fr (France).
  • geodaten-online.bund.de (Germany).

Global / thematic sources

Satellite & Earth observation

  • USGS EarthExplorer — Landsat, ASTER, SRTM, declassified imagery.
  • ESA Copernicus Data Space — Sentinel-1, -2, -3, -5P.
  • NASA Earthdata Search — MODIS, VIIRS, ICESat-2, ASTER.
  • Google Earth Engine Data Catalog — petabyte-scale analysis-ready archive.
  • Microsoft Planetary Computer — STAC-indexed archives behind a public STAC API.

Vector / global thematic

  • OpenStreetMap — the world's biggest crowdsourced map.
  • Natural Earth — free physical / cultural vectors at three scales.
  • GADM — administrative boundaries, free for academic and other non-commercial use; commercial use and redistribution need permission.
  • WorldPop — global population rasters.
  • Global Forest Watch — deforestation and land-cover change.
  • GEBCO — global bathymetry.

Climate & environment

  • ERA5 (ECMWF / Copernicus) — hourly weather reanalysis.
  • WorldClim — long-term climate summaries.
  • SoilGrids (ISRIC) — global soil properties.

STAC — a unifying catalog standard

Many modern archives expose a SpatioTemporal Asset Catalog (STAC) API. Instead of provider-specific search, you use a standard JSON-based query:

Python
from pystac_client import Client
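The import above is only the first line of a search. As a minimal sketch of the rest, assuming the pystac-client package is installed, a bbox + datetime + collections query might be assembled as follows. The helper names build_search_params and search_items are hypothetical, and the Microsoft Planetary Computer endpoint and its sentinel-2-l2a collection are illustrative choices rather than anything this lesson prescribes.

```python
from datetime import date


def build_search_params(bbox, start, end, collections, max_items=10):
    """Assemble the core STAC search parameters (hypothetical helper).

    bbox is (west, south, east, north) in WGS 84 degrees; start and end
    are datetime.date objects bounding the acquisition window.
    """
    if len(bbox) != 4:
        raise ValueError("bbox must be (west, south, east, north)")
    if start > end:
        raise ValueError("start date must not be after end date")
    return {
        "collections": list(collections),
        "bbox": list(bbox),
        # STAC expresses a closed time range as "start/end".
        "datetime": f"{start.isoformat()}/{end.isoformat()}",
        "max_items": max_items,
    }


def search_items(endpoint, params):
    """Run the search against a STAC API endpoint (needs network access)."""
    # Imported here so build_search_params works without pystac-client installed.
    from pystac_client import Client

    client = Client.open(endpoint)
    return list(client.search(**params).items())


# Assumed example values: a box over Greater London, June 2024.
params = build_search_params(
    bbox=(-0.51, 51.28, 0.33, 51.69),
    start=date(2024, 6, 1),
    end=date(2024, 6, 30),
    collections=["sentinel-2-l2a"],
)

# Uncomment to query a live endpoint (assumed URL, requires network access):
# items = search_items("https://planetarycomputer.microsoft.com/api/stac/v1", params)
# for item in items:
#     print(item.id, item.datetime)
```

Because the parameters are kept separate from the endpoint, the same dictionary can be sent to a different STAC API by changing only the URL, provided that API offers a collection with the same identifier.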

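STAC plays a role in Earth observation loosely comparable to the one DNS played in the early internet: a shared, interoperable standard. Each provider still runs its own endpoint, but the same client code and query shape work across all of them.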

Licence literacy

Not all "open data" is equally open. Common licences:

  • Public domain / CC0 — use for anything.
  • CC-BY — attribution required.
  • CC-BY-SA — attribution + share-alike (OSM uses ODbL which is similar).
  • Open Government Licence (OGL) — UK / Canada variants; attribution.
  • USGS / NOAA — works of the US federal government, public domain by default.
  • Non-commercial (for example CC-BY-NC) — no use in paid products without separate permission.

Always read the licence. A colleague's "I grabbed this from a portal" without licence tracking is a risk.
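The licence families above can be encoded as a small lookup table so that a project script fails early when a dataset's terms conflict with its intended use. The sketch below is a simplified summary, not legal advice: the LicenceTerms class, the LICENCE_TERMS table, and the obligations function are hypothetical names, and real licences carry details (ODbL's treatment of derived databases, for instance) that three booleans cannot capture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenceTerms:
    attribution: bool     # must credit the source
    share_alike: bool     # derivatives must carry the same licence
    commercial_use: bool  # allowed in paid products


# Simplified summary of the licence families listed above.
LICENCE_TERMS = {
    "CC0": LicenceTerms(attribution=False, share_alike=False, commercial_use=True),
    "CC-BY": LicenceTerms(attribution=True, share_alike=False, commercial_use=True),
    "CC-BY-SA": LicenceTerms(attribution=True, share_alike=True, commercial_use=True),
    "ODbL": LicenceTerms(attribution=True, share_alike=True, commercial_use=True),
    "OGL": LicenceTerms(attribution=True, share_alike=False, commercial_use=True),
    "CC-BY-NC": LicenceTerms(attribution=True, share_alike=False, commercial_use=False),
}


def obligations(licence, commercial_project):
    """Return the duties a licence imposes, or raise if it blocks the use."""
    try:
        terms = LICENCE_TERMS[licence]
    except KeyError:
        raise ValueError(
            f"Unknown licence {licence!r}: read the licence text first"
        ) from None
    if commercial_project and not terms.commercial_use:
        raise PermissionError(f"{licence} does not permit commercial use")
    duties = []
    if terms.attribution:
        duties.append("credit the source")
    if terms.share_alike:
        duties.append("release derivatives under the same licence")
    return duties
```

Calling obligations("CC-BY-SA", commercial_project=False) returns both duties, while an unrecognised licence string raises an error instead of silently passing, which mirrors the advice to read the licence before using the data.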

Vintage and provenance

Every dataset has a vintage — when the data was collected or last updated. A road network from 2015 won't show 2024's cycle lanes. A population dataset from 2018 predates COVID-19 migration shifts.

Record, in your project notes:

  • Source URL.
  • Dataset title and version.
  • Vintage / acquisition date.
  • Licence.
  • Pre-processing applied (reprojection, filtering, joins).

Without this, your analysis isn't reproducible.
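One way to keep these notes consistent is to store them as a small structured record next to the data. The following is a minimal sketch using only the standard library; the ProvenanceRecord class, its field names, and the example dataset (including the example.org URL) are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class ProvenanceRecord:
    source_url: str
    title: str
    version: str
    vintage: date          # when the data was collected or last updated
    licence: str
    processing: list = field(default_factory=list)  # steps applied, in order

    def to_json(self):
        """Serialise the record, writing the vintage as an ISO 8601 date."""
        record = asdict(self)
        record["vintage"] = self.vintage.isoformat()
        return json.dumps(record, indent=2)

    def age_in_years(self, today):
        """Approximate age of the data relative to the given date."""
        return (today - self.vintage).days / 365.25


# Hypothetical example values for illustration only.
record = ProvenanceRecord(
    source_url="https://example.org/datasets/road-network",
    title="Example road network",
    version="2015.1",
    vintage=date(2015, 3, 1),
    licence="OGL",
    processing=["reprojected to EPSG:27700", "clipped to study area"],
)
sidecar_text = record.to_json()
```

Writing sidecar_text to a file such as roads.provenance.json beside the dataset keeps the record attached to the data, and age_in_years gives a quick check on whether the vintage is still acceptable for the analysis.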

Metadata standards

Formal portals use metadata standards such as ISO 19115, the INSPIRE metadata profile, and the FGDC's CSDGM. These define standard fields such as lineage, positional accuracy, and point of contact, some of them mandatory. Download the XML metadata alongside the data when available.
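Downloaded XML metadata can be read programmatically rather than by eye. The sketch below assumes the record uses the ISO 19139 XML encoding of ISO 19115 (the gmd and gco namespaces) and pulls out the title and lineage statement. The sample record is invented for illustration, the summarise and first_text helpers are hypothetical, and real records are far longer and may use a different encoding.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19139 encoding of ISO 19115 metadata.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Invented, heavily trimmed sample record.
SAMPLE = """<gmd:MD_Metadata
    xmlns:gmd="http://www.isotc211.org/2005/gmd"
    xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:title>
            <gco:CharacterString>Example rivers</gco:CharacterString>
          </gmd:title>
        </gmd:CI_Citation>
      </gmd:citation>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:lineage>
        <gmd:LI_Lineage>
          <gmd:statement>
            <gco:CharacterString>Digitised from survey sheets.</gco:CharacterString>
          </gmd:statement>
        </gmd:LI_Lineage>
      </gmd:lineage>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>
"""


def first_text(root, path):
    """Return the stripped text of the first match, or None if absent."""
    node = root.find(path, NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def summarise(xml_text):
    """Extract title and lineage from an ISO 19139 metadata document."""
    root = ET.fromstring(xml_text)
    return {
        "title": first_text(root, ".//gmd:CI_Citation/gmd:title/gco:CharacterString"),
        "lineage": first_text(root, ".//gmd:LI_Lineage/gmd:statement/gco:CharacterString"),
    }


summary = summarise(SAMPLE)
```

A None value for lineage is itself useful information: it flags a record whose provenance needs to be traced by other means.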

Preparing open data for use

Standard preflight:

  1. Read the metadata.
  2. Inspect with gdalinfo / ogrinfo.
  3. Reproject to your working CRS.
  4. Clip to your area of interest.
  5. Drop unneeded attributes.
  6. Document the transformation.
  7. Save locally in a modern format (GeoPackage, GeoParquet).

This short workflow saves hours downstream.
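Steps 3 to 7 of the preflight can be scripted with GDAL's ogr2ogr, which reprojects, clips, selects attributes, and writes a GeoPackage in one pass. The sketch below only builds the command as a list so it can be logged before it is run; the function names, file names, CRS, bounding box, and field names are all hypothetical, and running it assumes GDAL is installed and on the PATH.

```python
import shlex
import subprocess


def build_preflight_command(src, dst, target_crs, clip_bbox, keep_fields):
    """Build an ogr2ogr call that reprojects, clips, and trims attributes.

    clip_bbox is (xmin, ymin, xmax, ymax) in the target CRS, because
    -clipdst is applied after reprojection.
    """
    xmin, ymin, xmax, ymax = clip_bbox
    return [
        "ogr2ogr",
        "-f", "GPKG",                     # step 7: save as GeoPackage
        "-t_srs", target_crs,             # step 3: reproject
        "-clipdst", str(xmin), str(ymin), str(xmax), str(ymax),  # step 4: clip
        "-select", ",".join(keep_fields),  # step 5: keep only needed attributes
        dst,                              # ogr2ogr takes the destination first
        src,
    ]


def run_preflight(cmd):
    """Execute the command; raises CalledProcessError if ogr2ogr fails."""
    subprocess.run(cmd, check=True)


# Hypothetical inputs for illustration.
cmd = build_preflight_command(
    src="roads.shp",
    dst="roads_clean.gpkg",
    target_crs="EPSG:27700",
    clip_bbox=(500000, 150000, 560000, 205000),
    keep_fields=["name", "class"],
)

# Step 6: a copy-pasteable line for the project notes.
log_line = shlex.join(cmd)
```

Steps 1 and 2 stay manual: read the metadata, then inspect the source with a command such as ogrinfo -so -al roads.shp before deciding on the CRS, extent, and fields to pass in.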

Self-check exercises

1. You need US county boundaries. What's the authoritative source?

US Census TIGER/Line. It is the official boundary dataset maintained by the Census Bureau, with a new release each year. Avoid unofficial reposts unless they document clear lineage back to a specific TIGER/Line release.

2. What's a STAC API and why would you use one?

A STAC (SpatioTemporal Asset Catalog) API is a standardised JSON-over-HTTP interface for searching Earth-observation archives. Instead of learning a custom API per provider, you issue a bbox + datetime + collections query to any STAC endpoint (USGS, Microsoft, AWS, ESA). It drastically simplifies multi-provider pipelines.

3. A colleague shares a dataset from "some government portal" with no licence or vintage noted. What should you do?

Don't use it in client-facing work until you trace the source, licence, and date. Download from the original portal with metadata. Document provenance. A dataset without licence + vintage is a reproducibility and compliance risk — small investment now saves big cleanup later.

Summary

  • National and supranational portals (US, EU, UK) plus global archives, many of them STAC-indexed, cover most analytic needs.
  • Record vintage, licence, and provenance for every dataset.
  • STAC unifies Earth-observation search across providers.
  • Spend time on preflight — reproject, clip, reduce, document — before analysis begins.

Further reading

  • STAC Specification (stacspec.org).
  • Open Data Watch — global index of open data programmes.
  • Creative Commons licence guides.
  • ISO 19115 metadata standard overview.