8.3 SQL for GIS
The SQL every GIS analyst should know — filtering, joining, aggregating, with spatial flavour.
Key takeaways
- SQL is the lingua franca of GIS analytics once your data is in a database.
- The core is
SELECT ... FROM ... WHERE ... GROUP BY ... JOIN; PostGIS addsST_*spatial functions.- Writing readable SQL pays off — rebuilding a tangled query costs more than writing it carefully the first time.
Introduction
SQL has been the dominant data language since 1974. For GIS, it pairs naturally with PostGIS to handle every query type — attribute filter, join, aggregate, spatial predicate. This lesson is a fast tour for readers new to SQL, using examples that reflect real analyst workloads.
The five building blocks
SELECT — which columns?
SELECT name, population FROM cities;Use * to get everything; prefer explicit columns in production code (performance + readability).
WHERE — filter rows
1SELECT name FROM cities WHERE population > 100000;
2SELECT * FROM parks WHERE country = 'Denmark' AND area_ha >= 5;ORDER BY — sort
1SELECT name, population FROM cities
2ORDER BY population DESC
3LIMIT 10;GROUP BY + aggregate — summarise
1SELECT country, COUNT(*) AS n_cities, AVG(population) AS avg_pop
2FROM cities
3GROUP BY country
4ORDER BY n_cities DESC;Aggregates: COUNT, SUM, AVG, MIN, MAX, STDDEV, PERCENTILE_CONT.
JOIN — combine tables
1SELECT c.name, co.continent
2FROM cities c
3JOIN countries co ON c.iso3 = co.iso3
4WHERE co.continent = 'Africa';Types:
INNER JOIN— matching rows from both.LEFT JOIN— all from left, matching from right (NULL otherwise).RIGHT JOIN— mirror.FULL OUTER JOIN— all from both.CROSS JOIN— cartesian; dangerous on large tables.
Spatial flavour
PostGIS adds ~300 ST_* functions. The essentials:
Geometry constructors
1ST_GeomFromText('POINT(-73.96 40.78)', 4326)
2ST_MakePoint(-73.96, 40.78)
3ST_GeomFromGeoJSON('{"type":"Point","coordinates":[-73.96,40.78]}')Spatial predicates (return boolean)
1ST_Intersects(a.geom, b.geom)
2ST_Contains(a.geom, b.geom)
3ST_Within(a.geom, b.geom)
4ST_DWithin(a.geom, b.geom, 500) -- within 500 units
5ST_Touches(a.geom, b.geom)
6ST_Crosses(a.geom, b.geom)Geometry manipulations (return geometry)
1ST_Buffer(geom, 100)
2ST_Union(geom) -- aggregate dissolve
3ST_Intersection(a.geom, b.geom)
4ST_Difference(a.geom, b.geom)
5ST_Simplify(geom, 10)
6ST_Centroid(geom)Measurements (return numeric)
1ST_Area(geom::geography) -- m²
2ST_Length(geom::geography) -- metres
3ST_Distance(a.geom::geography, b.geom::geography)
4ST_Perimeter(geom::geography)Cast to ::geography for metre-accurate calculations on lat/lon data.
Practical queries
Points in polygons
1SELECT d.name, COUNT(*) AS n_trees
2FROM districts d
3LEFT JOIN trees t ON ST_Within(t.geom, d.geom)
4GROUP BY d.name
5ORDER BY n_trees DESC;Nearest neighbour
1-- Closest hospital to each school
2SELECT s.id, s.name AS school,
3 h.name AS nearest_hospital,
4 ST_Distance(s.geom::geography, h.geom::geography) AS metres
5FROM schools s
6CROSS JOIN LATERAL (
7 SELECT name, geom
8 FROM hospitals
9 ORDER BY s.geom <-> geom -- bounding-box distance (indexed)
10 LIMIT 1
11) h;Aggregate within a buffer
1SELECT s.name,
2 COUNT(c.id) AS cafes_within_500m
3FROM stations s
4LEFT JOIN cafes c
5 ON ST_DWithin(s.geom::geography, c.geom::geography, 500)
6GROUP BY s.id, s.name;Dissolve by category
1SELECT landuse_class, ST_Union(geom) AS geom
2FROM land_parcels
3GROUP BY landuse_class;Common table expressions (CTEs)
WITH clauses let you break queries into named steps:
1WITH risk_areas AS (
2 SELECT ST_Buffer(geom, 500) AS geom
3 FROM hazardous_waste_sites
4),
5risk_population AS (
6 SELECT SUM(population) AS pop
7 FROM census_blocks b
8 JOIN risk_areas r ON ST_Intersects(b.geom, r.geom)
9)
10SELECT pop FROM risk_population;Readable, debuggable, and often no slower than nested subqueries.
Window functions
Compute ranks, running sums, and moving averages:
1SELECT city,
2 population,
3 RANK() OVER (PARTITION BY country ORDER BY population DESC) AS rank_in_country
4FROM cities;Explain and optimise
Before running a long query, use EXPLAIN ANALYZE to see the plan:
1EXPLAIN ANALYZE
2SELECT * FROM parks WHERE ST_Intersects(geom, ST_MakeEnvelope(...));Look for Index Scan (good) vs Seq Scan (full scan — suspect on large tables).
Readability matters
A 50-line SQL query hidden in application code is painful. Conventions:
- UPPER CASE for SQL keywords.
- Columns in multi-line lists for readability.
- CTEs to name intermediate results.
- Comments where logic is non-obvious.
- One statement per file for complex ETL.
Future you will thank you.
Self-check exercises
1. Why cast to ::geography for distance calculations on lat/lon data?
Geometry operations on lat/lon interpret coordinates as planar and return degrees. Geography operations use spherical (ellipsoidal) distance algorithms and return metres. A query like ST_Distance(geom, geom2) without the cast returns degrees, which is almost never what you want. ::geography is the fix; for very high volumes, consider reprojecting to a metric CRS instead to avoid the per-query cost.
2. Write an SQL query that counts points within each polygon.
1SELECT d.id, d.name, COUNT(p.id) AS n_points
2FROM districts d
3LEFT JOIN points p ON ST_Within(p.geom, d.geom)
4GROUP BY d.id, d.name
5ORDER BY n_points DESC;Left join preserves polygons with zero points; inner join drops them.
3. Your query does a sequential scan instead of using a spatial index. What might be wrong?
Common causes: (1) no spatial index created — CREATE INDEX ... USING GIST (geom). (2) a function wraps the geometry (ST_Buffer(geom, ...)), preventing index use — move the buffer into a CTE. (3) statistics are outdated — run VACUUM ANALYZE. (4) query touches the whole dataset anyway; sometimes a seq scan is genuinely faster.
Summary
- SQL core: SELECT, WHERE, ORDER BY, GROUP BY, JOIN.
- PostGIS adds
ST_*for geometry, predicates, measurements. - Use CTEs for readable multi-step queries.
EXPLAIN ANALYZEdiagnoses performance.
Further reading
- PostGIS in Action, 3rd edition.
- PostgreSQL Tutorial (postgresqltutorial.com).
explain.depesz.comfor visualising query plans.- SQL Antipatterns (Karwin) — common mistakes and fixes.