8.2 Relational Databases for GIS
Why GIS eventually moves from files to databases — and the principles that make the move worthwhile.
Key takeaways
- Databases scale beyond single-user shapefile / GeoPackage workflows.
- Tables, keys, normalisation, and transactions are the fundamentals every spatial database uses.
- PostgreSQL + PostGIS is the de facto open-source spatial database; SQL Server Spatial and Oracle Spatial are commercial alternatives.
Introduction
When you outgrow single-user files, the next step is a relational database. This lesson covers the fundamentals — what relational databases are, how they relate to files, and why PostGIS has become the default for spatial data.
What's a relational database?
A relational database stores data in tables, with rows and columns, and lets you query across tables using SQL. Key properties:
- Schema — tables have defined columns and types.
- Constraints — keys, not-null, check constraints.
- Transactions — groups of changes succeed or fail atomically (ACID).
- Concurrency — many users can read and write simultaneously.
- Indexes — speed queries.
- Views — virtual tables defined by queries.
- Stored procedures / functions — custom logic in the database.
Files vs databases — when to switch
| Factor | Files (shp / GPKG) | Database |
|---|---|---|
| Users | 1–2 | Many |
| Data size | < 2 GB (shp), < 50 GB (GPKG) | Terabytes |
| Concurrency | Single writer | Many writers |
| Backups | Copy the file | Continuous, point-in-time |
| Cross-table queries | Limited | Rich SQL joins |
| Security | Filesystem | Row/column-level permissions |
Rule of thumb: a solo analyst on a small dataset is fine with files. A team, public-facing app, or > 50 GB dataset benefits from a database.
Normalisation
Good database design avoids duplicating data. The classic example:
Denormalised (bad):
| event_id | country_name | country_iso |
|---|---|---|
| 1 | Denmark | DK |
| 2 | Denmark | DK |
| 3 | Denmark | DK |
Normalised (good):
events(event_id, country_iso) + countries(country_iso, country_name).
Normalisation prevents update anomalies (changing Denmark's name requires one update, not many) and shrinks storage. Over-normalisation slows queries; there's a balance.
Keys again
A primary key uniquely identifies each row. A foreign key references a primary key in another table. The database enforces these — you can't insert an event for a country that doesn't exist.
Transactions
A transaction groups multiple statements. Either all succeed or all fail — no partial writes. Example:
1BEGIN;
2UPDATE accounts SET balance = balance - 100 WHERE id = 1;
3UPDATE accounts SET balance = balance + 100 WHERE id = 2;
4COMMIT;If the power cuts between the two updates, the database rolls back the first and you don't end up missing £100.
For GIS: if an edit changes both geometry and attributes across multiple tables, you want them all committed atomically.
Indexes, revisited
A spatial index (GiST in PostGIS, R-tree in SpatiaLite) accelerates spatial predicates (ST_Intersects, ST_DWithin). Create one per geometry column:
CREATE INDEX idx_parks_geom ON parks USING GIST (geom);Without this, a query scanning millions of features is O(N); with it, O(log N) for common queries.
PostgreSQL + PostGIS
PostgreSQL is a rock-solid open-source relational database. PostGIS is its spatial extension, adding:
- Geometry and geography types.
- ~300 spatial functions (
ST_*). - Spatial indexes.
- Coordinate transformation.
- Raster support (via
postgis_raster). - Topology extension.
PostGIS is the default spatial database for most professional GIS teams. It's free, scales to petabytes, and integrates with every major tool.
Commercial alternatives
- Oracle Spatial — enterprise, strong geodetic support.
- SQL Server Spatial — tight Windows/.NET integration.
- MySQL spatial — basic features; weaker than PostGIS.
- SpatiaLite — single-file, covered in Module 6.3.
- DuckDB spatial — analytics-focused, embeddable, growing fast.
Cloud spatial databases
- Amazon Aurora PostgreSQL + PostGIS.
- Google Cloud SQL for PostgreSQL + PostGIS.
- Azure Database for PostgreSQL + PostGIS.
- BigQuery GEOGRAPHY type — serverless, great for massive aggregations.
- Snowflake GEOGRAPHY / GEOMETRY — similar.
For serverless analytics on huge datasets, column-store warehouses (BigQuery, Snowflake) often beat row-store PostGIS.
A tiny PostGIS example
1-- Create a spatial table
2CREATE TABLE parks (
3 id SERIAL PRIMARY KEY,
4 name TEXT NOT NULL,
5 geom GEOMETRY(Polygon, 4326) NOT NULL
6);
7[object Object]
8[object Object]
9Four lines of SQL do a real spatial query — what used to require specialised GIS software.
Migrating from files to PostGIS
1# Bulk import a shapefile into PostGIS
2ogr2ogr -f PostgreSQL \
3 PG:"host=db.example.com user=gis dbname=gisdb password=XXX" \
4 parcels.shp \
5 -nln parcels -overwrite -lco GEOMETRY_NAME=geomOne line, shapefile → PostGIS. Run a CREATE INDEX, VACUUM, and you're ready to query.
Self-check exercises
1. You have a shapefile with 500 000 parcels and one analyst editing. Should you move to PostGIS?
Probably not yet. One analyst on 500 k features is comfortable with a GeoPackage; you don't need concurrent writers or ACID across tables. Move to PostGIS when you add a second editor, start integrating web apps that need the data live, or cross 10 M features / multi-GB sizes.
2. Why is a spatial index important for spatial queries?
Without it, a query like "which parks are within 500 m of this school" must scan every park feature — O(N). A GiST / R-tree index narrows the candidates by bounding-box pre-filter to a tiny fraction, making the query O(log N). For 1 M features, this is the difference between seconds and milliseconds.
3. What's the difference between ACID transactions and eventual consistency?
ACID (Atomicity, Consistency, Isolation, Durability) transactions guarantee all-or-nothing semantics and immediate visibility of committed data — PostGIS provides this. Eventual consistency (common in distributed / NoSQL systems) allows temporary divergence between nodes, with convergence over time; faster but harder to reason about for correctness-critical data like cadastral records.
Summary
- Files for solo work; databases for teams, scale, and concurrency.
- PostgreSQL + PostGIS is the default open-source spatial database.
- Normalisation, indexes, and transactions make databases reliable.
- Cloud databases and warehouses add elastic scale and serverless options.
Further reading
- PostGIS in Action, 3rd edition (Obe, Hsu).
- PostgreSQL documentation — tutorial chapters.
- Codd, E. F. — A Relational Model of Data for Large Shared Data Banks (1970).
- Designing Data-Intensive Applications (Kleppmann) for system-level context.