CoursesGIS Basics — A Complete Introduction8.2 Relational Databases for GIS
Module 8: Attribute Data & Databases

8.2 Relational Databases for GIS

Why GIS eventually moves from files to databases — and the principles that make the move worthwhile.

Lesson 39 of 100·18 min read

Key takeaways

  • Databases scale beyond single-user shapefile / GeoPackage workflows.
  • Tables, keys, normalisation, and transactions are the fundamentals every spatial database uses.
  • PostgreSQL + PostGIS is the de facto open-source spatial database; SQL Server Spatial and Oracle Spatial are commercial alternatives.

Introduction

When you outgrow single-user files, the next step is a relational database. This lesson covers the fundamentals — what relational databases are, how they relate to files, and why PostGIS has become the default for spatial data.

What's a relational database?

A relational database stores data in tables, with rows and columns, and lets you query across tables using SQL. Key properties:

  • Schema — tables have defined columns and types.
  • Constraints — keys, not-null, check constraints.
  • Transactions — groups of changes succeed or fail atomically (ACID).
  • Concurrency — many users can read and write simultaneously.
  • Indexes — speed queries.
  • Views — virtual tables defined by queries.
  • Stored procedures / functions — custom logic in the database.

Files vs databases — when to switch

FactorFiles (shp / GPKG)Database
Users1–2Many
Data size< 2 GB (shp), < 50 GB (GPKG)Terabytes
ConcurrencySingle writerMany writers
BackupsCopy the fileContinuous, point-in-time
Cross-table queriesLimitedRich SQL joins
SecurityFilesystemRow/column-level permissions

Rule of thumb: a solo analyst on a small dataset is fine with files. A team, public-facing app, or > 50 GB dataset benefits from a database.

Normalisation

Good database design avoids duplicating data. The classic example:

Denormalised (bad):

event_idcountry_namecountry_iso
1DenmarkDK
2DenmarkDK
3DenmarkDK

Normalised (good):

events(event_id, country_iso) + countries(country_iso, country_name).

Normalisation prevents update anomalies (changing Denmark's name requires one update, not many) and shrinks storage. Over-normalisation slows queries; there's a balance.

Keys again

A primary key uniquely identifies each row. A foreign key references a primary key in another table. The database enforces these — you can't insert an event for a country that doesn't exist.

Transactions

A transaction groups multiple statements. Either all succeed or all fail — no partial writes. Example:

SQL
1BEGIN;
2UPDATE accounts SET balance = balance - 100 WHERE id = 1;
3UPDATE accounts SET balance = balance + 100 WHERE id = 2;
4COMMIT;

If the power cuts between the two updates, the database rolls back the first and you don't end up missing £100.

For GIS: if an edit changes both geometry and attributes across multiple tables, you want them all committed atomically.

Indexes, revisited

A spatial index (GiST in PostGIS, R-tree in SpatiaLite) accelerates spatial predicates (ST_Intersects, ST_DWithin). Create one per geometry column:

SQL
CREATE INDEX idx_parks_geom ON parks USING GIST (geom);

Without this, a query scanning millions of features is O(N); with it, O(log N) for common queries.

PostgreSQL + PostGIS

PostgreSQL is a rock-solid open-source relational database. PostGIS is its spatial extension, adding:

  • Geometry and geography types.
  • ~300 spatial functions (ST_*).
  • Spatial indexes.
  • Coordinate transformation.
  • Raster support (via postgis_raster).
  • Topology extension.

PostGIS is the default spatial database for most professional GIS teams. It's free, scales to petabytes, and integrates with every major tool.

Commercial alternatives

  • Oracle Spatial — enterprise, strong geodetic support.
  • SQL Server Spatial — tight Windows/.NET integration.
  • MySQL spatial — basic features; weaker than PostGIS.
  • SpatiaLite — single-file, covered in Module 6.3.
  • DuckDB spatial — analytics-focused, embeddable, growing fast.

Cloud spatial databases

  • Amazon Aurora PostgreSQL + PostGIS.
  • Google Cloud SQL for PostgreSQL + PostGIS.
  • Azure Database for PostgreSQL + PostGIS.
  • BigQuery GEOGRAPHY type — serverless, great for massive aggregations.
  • Snowflake GEOGRAPHY / GEOMETRY — similar.

For serverless analytics on huge datasets, column-store warehouses (BigQuery, Snowflake) often beat row-store PostGIS.

A tiny PostGIS example

SQL
1-- Create a spatial table
2CREATE TABLE parks (
3  id SERIAL PRIMARY KEY,
4  name TEXT NOT NULL,
5  geom GEOMETRY(Polygon, 4326) NOT NULL
6);
7[object Object]
8[object Object]
9

Four lines of SQL do a real spatial query — what used to require specialised GIS software.

Migrating from files to PostGIS

Shell
1# Bulk import a shapefile into PostGIS
2ogr2ogr -f PostgreSQL \
3  PG:"host=db.example.com user=gis dbname=gisdb password=XXX" \
4  parcels.shp \
5  -nln parcels -overwrite -lco GEOMETRY_NAME=geom

One line, shapefile → PostGIS. Run a CREATE INDEX, VACUUM, and you're ready to query.

Self-check exercises

1. You have a shapefile with 500 000 parcels and one analyst editing. Should you move to PostGIS?

Probably not yet. One analyst on 500 k features is comfortable with a GeoPackage; you don't need concurrent writers or ACID across tables. Move to PostGIS when you add a second editor, start integrating web apps that need the data live, or cross 10 M features / multi-GB sizes.

2. Why is a spatial index important for spatial queries?

Without it, a query like "which parks are within 500 m of this school" must scan every park feature — O(N). A GiST / R-tree index narrows the candidates by bounding-box pre-filter to a tiny fraction, making the query O(log N). For 1 M features, this is the difference between seconds and milliseconds.

3. What's the difference between ACID transactions and eventual consistency?

ACID (Atomicity, Consistency, Isolation, Durability) transactions guarantee all-or-nothing semantics and immediate visibility of committed data — PostGIS provides this. Eventual consistency (common in distributed / NoSQL systems) allows temporary divergence between nodes, with convergence over time; faster but harder to reason about for correctness-critical data like cadastral records.

Summary

  • Files for solo work; databases for teams, scale, and concurrency.
  • PostgreSQL + PostGIS is the default open-source spatial database.
  • Normalisation, indexes, and transactions make databases reliable.
  • Cloud databases and warehouses add elastic scale and serverless options.

Further reading

  • PostGIS in Action, 3rd edition (Obe, Hsu).
  • PostgreSQL documentation — tutorial chapters.
  • Codd, E. F. — A Relational Model of Data for Large Shared Data Banks (1970).
  • Designing Data-Intensive Applications (Kleppmann) for system-level context.