Module 3: Spatial Data Models

3.4 Vector vs Raster — When to Use Which

A practical decision guide for choosing between vector and raster representations.

Lesson 13 of 100 · 18 min read

Key takeaways

Vector excels for discrete features with well-defined boundaries; raster excels for continuous phenomena.

The right choice depends on the question being asked, not just the data at hand.

Many real workflows mix both — convert thoughtfully, aware of what's lost.

Introduction

You'll hear the debate "vector vs raster" so often it starts to sound like Coke vs Pepsi. It isn't — they're tools for different jobs. This lesson gives you a fast decision framework, a comparison table, and guidance for the inevitable vector/raster conversions.

Quick comparison

Criterion	Vector	Raster
Represents	Discrete features	Continuous surfaces
Geometry	Points, lines, polygons	Grid of cells
Resolution	Defined by vertex precision	Defined by cell size
File size (small features)	Smaller	Larger (empty cells)
File size (full-coverage imagery)	Larger (many polygons)	Smaller (dense grid is natural)
Positional precision	Limited by coordinate precision and source accuracy	Limited by cell size
Topology	First-class	Implicit
Common operations	Overlays, buffers, spatial joins	Map algebra, focal stats
Cartographic strengths	Roads, parcels, points of interest	Hillshade, imagery, heatmaps
Typical formats	Shapefile, GeoJSON, GeoPackage	GeoTIFF, COG, NetCDF

When vector is the right choice

Features are discrete and have meaningful boundaries. Roads, buildings, parcels, administrative units.
Attributes vary per-feature, not per-location.
You need precise geometric queries (containment, adjacency, topology).
Data is sparse in space (a million points across a continent).
Storage must scale with feature complexity, not area.
You need to preserve exact coordinates from a survey or authoritative source.
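
As an illustration of the "precise geometric queries" point, here is a minimal, dependency-free sketch of a containment test using the ray-casting method. The function name and the parcel coordinates are hypothetical; in practice a library such as Shapely or a spatial database would do this for you, and would also handle holes and boundary cases.

```python
def point_in_polygon(x, y, ring):
    """Return True if (x, y) lies inside the polygon ring (ray-casting test).

    `ring` is a list of (x, y) vertices; the closing edge is implied.
    Points lying exactly on the boundary are not treated specially.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular parcel, coordinates in metres.
parcel = [(0.0, 0.0), (40.0, 0.0), (40.0, 25.0), (0.0, 25.0)]
print(point_in_polygon(12.5, 10.0, parcel))  # a point inside the parcel
print(point_in_polygon(55.0, 10.0, parcel))  # a point east of the parcel
```

The answer depends only on the stored vertices, not on any cell size, which is why vector suits this kind of query.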

When raster is the right choice

Data is a continuous surface — elevation, temperature, reflectance, population density.
Every cell has a value (or nodata).
You need fast cell-by-cell arithmetic (NDVI = (NIR − Red) / (NIR + Red)).
Resolution is uniform and known.
The analysis is inherently local-neighbourhood (slope, focal mean).
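
The NDVI formula above is a typical cell-by-cell operation. The sketch below applies it to two tiny grids held as plain Python lists so that it runs without dependencies; the function name, the sample reflectance values, and the use of None as nodata are assumptions for illustration. Real pipelines would normally use NumPy or Xarray arrays instead.

```python
def ndvi(nir, red, nodata=None):
    """Cell-by-cell NDVI for two equally shaped grids (lists of rows)."""
    out = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            # Propagate nodata and avoid dividing by zero.
            if n is None or r is None or (n + r) == 0:
                row.append(nodata)
            else:
                row.append((n - r) / (n + r))
        out.append(row)
    return out


# Hypothetical 2 x 2 reflectance grids; None marks a nodata cell.
nir = [[0.50, 0.40], [0.30, None]]
red = [[0.10, 0.20], [0.30, 0.05]]
result = ndvi(nir, red)
for row in result:
    print(row)
```

Each output cell depends only on the two input cells at the same position, which is what makes this kind of arithmetic fast and easy to parallelise.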

Grey areas

Some phenomena can be modelled either way. A land-cover map can be:

Vector — polygons with a class attribute. Compact for small areas, easy to edit manually, exact boundaries.
Raster — each cell carries a class code. Scales to large areas, fits well in raster-analytic pipelines.

Similarly, population can be stored as polygons (census tracts) or as a raster surface (a WorldPop-style 100 m grid). Pick the model that suits the downstream analysis.

Conversion pitfalls

Vector → Raster (rasterisation)

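Aliasing: every feature is forced onto the cell grid. With the default cell-centre rule, a polygon smaller or narrower than a cell can vanish entirely if it covers no cell centre, while line features are widened to at least one cell. The -at (all touched) option burns every cell a feature touches, at the cost of inflating areas. Always set a resolution fine enough for the smallest feature you care about.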

Shell

gdal_rasterize -a class -tr 10 10 -l landuse landuse.gpkg landuse.tif
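
To see why resolution matters, the sketch below burns a small rectangle into a grid using a cell-centre rule, which is how polygons are commonly rasterised by default. It is a simplified illustration rather than GDAL's implementation: the function names, the rectangular footprint, and the grid origin are all assumptions.

```python
def rasterize_rect(rect, origin_x, origin_y, cell, n_cols, n_rows):
    """Burn an axis-aligned rectangle into a grid using the cell-centre rule.

    rect is (xmin, ymin, xmax, ymax); the origin is the grid's top-left corner.
    A cell is set to 1 only if its centre falls inside the rectangle.
    """
    xmin, ymin, xmax, ymax = rect
    grid = []
    for row in range(n_rows):
        cy = origin_y - (row + 0.5) * cell  # y of this row's cell centres
        grid_row = []
        for col in range(n_cols):
            cx = origin_x + (col + 0.5) * cell  # x of this cell's centre
            hit = xmin <= cx <= xmax and ymin <= cy <= ymax
            grid_row.append(1 if hit else 0)
        grid.append(grid_row)
    return grid


def burned(grid):
    """Count the cells that were set to 1."""
    return sum(sum(row) for row in grid)


# Hypothetical 3 m x 3 m shed inside a 30 m x 30 m extent.
shed = (11.0, 11.0, 14.0, 14.0)
coarse = rasterize_rect(shed, 0.0, 30.0, 10.0, 3, 3)   # 10 m cells
fine = rasterize_rect(shed, 0.0, 30.0, 1.0, 30, 30)    # 1 m cells
print(burned(coarse), burned(fine))
```

At 10 m the shed covers no cell centre and disappears from the output; at 1 m it occupies a 3 × 3 block of cells.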

Raster → Vector (polygonisation)

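Contiguous same-valued cells become polygons. The result has cell-aligned (stair-step) boundaries, which can be unrealistic for natural features. Cleaning up after vectorisation is common: Douglas-Peucker simplification with a tolerance on the order of one cell removes the stair-step vertices, and a separate smoothing pass (for example Chaikin corner-cutting) can then round the remaining corners. Both steps move boundaries, so check that adjacent polygons still share edges afterwards.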

Shell

gdal_polygonize.py classified.tif landuse.gpkg
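
A dependency-free sketch of Douglas-Peucker applied to a stair-stepped boundary follows. It measures distance to the chord segment (a common variant) and works on an open line; the function names and the sample coordinates are illustrative, and production work would more likely call a library routine such as Shapely's simplify.

```python
import math


def _point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def douglas_peucker(points, tolerance):
    """Drop vertices that deviate from the simplified line by <= tolerance."""
    if len(points) < 3:
        return list(points)
    index, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_segment_distance(points[i], points[0], points[-1])
        if d > dmax:
            index, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest vertex and simplify both halves recursively.
    left = douglas_peucker(points[: index + 1], tolerance)
    right = douglas_peucker(points[index:], tolerance)
    return left[:-1] + right


# Stair-step edge as produced by polygonising 1-unit cells along a diagonal.
staircase = [(0, 0), (1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
simplified = douglas_peucker(staircase, tolerance=1.0)
print(simplified)
```

With a tolerance of one cell the staircase collapses to its two endpoints; with a tolerance well below the cell size the steps survive, so the tolerance needs to be chosen relative to the source raster's resolution.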

Performance considerations

Vector performance

Scales with feature count and vertex count, not with geographic extent.
A million point features is routine; a billion stretches even modern databases.
Spatial indexes (R-tree) make common queries such as bounding-box lookups roughly O(log N), plus the cost of returning the matches.

Raster performance

Scales with total cell count (width × height × bands × time steps).
At 10 m resolution, Europe (roughly 10 million km²) is ~100 billion cells per band. Plan storage carefully.
Use tiled/pyramided formats (COG), cloud-native access, and lazy computation (Dask, Xarray).
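
The cell-count scaling above is easy to check with a few lines of arithmetic. This is a rough sketch: the function name is hypothetical, and the 10 million km² figure for Europe is an assumed round number, not an authoritative area.

```python
def cell_count(area_km2, cell_size_m, bands=1, time_steps=1):
    """Approximate number of cells covering an area at a given resolution."""
    cells_per_km2 = (1000.0 / cell_size_m) ** 2
    return area_km2 * cells_per_km2 * bands * time_steps


EUROPE_KM2 = 10_000_000  # assumed round figure for illustration

per_band = cell_count(EUROPE_KM2, 10)
stack = cell_count(EUROPE_KM2, 10, bands=4, time_steps=12)
print(f"{per_band:.2e} cells per band")
print(f"{stack:.2e} cells for 4 bands x 12 time steps")
```

Halving the cell size quadruples the count, and every extra band or time step multiplies it again, which is why tiling and lazy computation matter at this scale.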

Storage: a rough cost comparison

A 1 km × 1 km area mapped at high fidelity:

Vector — 100 building polygons × ~20 vertices × 8 bytes ≈ 16 KB.
Raster (1 m cells, 8-bit) — 1 000 × 1 000 cells ≈ 1 MB per band.

For a 10 km × 10 km area:

Vector — ~10 000 buildings × 20 vertices × 8 bytes ≈ 1.6 MB.
Raster (1 m cells, 8-bit) — 10 000 × 10 000 cells ≈ 100 MB.

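Vector wins for sparse features; raster wins for true full-coverage surfaces. These are order-of-magnitude estimates: the vector figures assume 8 bytes per vertex (two 32-bit coordinates; double precision would double them), and neither side counts attributes, file headers, or compression.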
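
The comparison above can be reproduced with a small calculation. This is a back-of-the-envelope sketch, not a file-size predictor: the function names are hypothetical, and bytes_per_vertex is an assumption you can change (8 matches the figures in this lesson; 16 would correspond to two 64-bit coordinates).

```python
def vector_bytes(n_features, vertices_per_feature, bytes_per_vertex=8):
    """Raw coordinate storage for a set of polygons (no attributes or headers)."""
    return n_features * vertices_per_feature * bytes_per_vertex


def raster_bytes(width_m, height_m, cell_size_m, bytes_per_cell=1, bands=1):
    """Uncompressed storage for a full-coverage grid."""
    cols = round(width_m / cell_size_m)
    rows = round(height_m / cell_size_m)
    return cols * rows * bytes_per_cell * bands


# The two scenarios from the text: (side length in km, number of buildings).
for side_km, buildings in [(1, 100), (10, 10_000)]:
    side_m = side_km * 1000
    v = vector_bytes(buildings, 20)
    r = raster_bytes(side_m, side_m, cell_size_m=1)
    print(f"{side_km} km square: vector {v:,} bytes, raster {r:,} bytes")
```

Both totals grow a hundredfold between the two scenarios here, but only because the building count was scaled with the area; an empty 10 km square would cost the raster the same 100 MB and the vector layer almost nothing.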

Common analysis patterns mixing both

Zonal statistics. Vector polygons + raster values → per-polygon mean/sum/majority. (Module 10.3.)
Point sampling. Vector points + raster → sample raster at point coords.
Distance surfaces. Vector features → raster of distance to nearest feature.
Rasterised overlays. Convert vector categories to raster, then combine with other rasters via map algebra.
Classified imagery → vector polygons. Turn a land-cover classification into editable polygons for reporting.
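
Point sampling, the second pattern in the list, comes down to converting a map coordinate into a row and column. The sketch below assumes a north-up grid with square cells and a known top-left origin, and returns the value of the containing cell without interpolation; the function name, the tiny DEM, and the station coordinates are made up for illustration.

```python
import math


def sample_raster(grid, origin_x, origin_y, cell, x, y, nodata=None):
    """Return the value of the cell containing map coordinate (x, y).

    grid is a list of rows, north-up; (origin_x, origin_y) is the top-left corner.
    """
    col = math.floor((x - origin_x) / cell)
    row = math.floor((origin_y - y) / cell)  # rows count downwards from the top
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return nodata  # the point falls outside the raster extent


# Hypothetical 3 x 3 DEM with 10 m cells, top-left corner at (0, 30).
dem = [
    [120, 125, 131],
    [118, 122, 127],
    [115, 119, 124],
]
stations = {"A": (5.0, 25.0), "B": (27.0, 3.0), "C": (40.0, 10.0)}
samples = {
    name: sample_raster(dem, 0.0, 30.0, 10.0, x, y)
    for name, (x, y) in stations.items()
}
print(samples)
```

Station C lies outside the grid, so it gets the nodata value rather than an error; deciding how to treat such points is part of any sampling workflow.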

The best analysts move fluidly between models; the worst insist their favourite does everything.

Decision checklist

When you get a new question:

Is the phenomenon discrete or continuous? → discrete leans vector; continuous leans raster.
How big is the area? → small favours vector, continental favours raster.
What downstream analyses? → overlays lean vector; neighbourhood operations lean raster.
Do I need exact source coordinates? → vector preserves them; raster snaps to grid.
What are my performance constraints? → sparse features favour vector; dense uniform data favours raster.

Self-check exercises

1. Would you store a world land-cover dataset at 100 m resolution as vector or raster?

Raster. At 100 m, covering the Earth's roughly 510 million km² takes about 50 billion cells (around 15 billion for land alone): huge, but tractable with tiled COGs. Converting to vector would produce hundreds of millions of polygons with stair-stepped boundaries, which would be slower to query and bulkier despite being "simpler" in the abstract.

2. A drone produces a 2 cm resolution orthophoto of a 5 ha field. How much does that raster weigh (8-bit, 3 bands)?

5 ha = 50 000 m²; at 2 cm × 2 cm, that's 50 000 / 0.0004 = 125 million cells per band; × 3 bands × 1 byte = 375 MB uncompressed. Lossless compression (LZW or DEFLATE in GeoTIFF) often cuts this substantially, and lossy JPEG compression usually shrinks it much further, though the ratio depends on image content. Still large, so plan storage and cloud transfer accordingly.

3. You want to compute "average elevation per watershed" from a DEM and a watershed polygon layer. Which analysis pattern is that?

Zonal statistics — a classic vector/raster hybrid operation. Each polygon becomes a zone; the raster provides per-cell values; the result is one statistic per polygon. Available in QGIS, rasterstats (Python), and ST_SummaryStats (PostGIS raster).
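
To make the mechanics concrete, here is a minimal sketch of a zonal mean. It assumes the watershed polygons have already been rasterised onto the same grid as the DEM, so each cell carries a zone id; the function name, the sample grids, and the -9999 nodata value are assumptions. Tools such as rasterstats take the polygons directly and handle that alignment for you.

```python
from collections import defaultdict


def zonal_mean(zones, values, nodata=None):
    """Mean of `values` per zone id; both grids share shape and alignment."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for zone_row, value_row in zip(zones, values):
        for zone, value in zip(zone_row, value_row):
            # Skip cells outside every zone and cells without a valid value.
            if zone is None or value == nodata:
                continue
            totals[zone] += value
            counts[zone] += 1
    return {zone: totals[zone] / counts[zone] for zone in totals}


# Hypothetical watershed ids (None = outside all watersheds) and elevations.
watersheds = [
    [1, 1, 2],
    [1, 2, 2],
    [None, 2, 2],
]
dem = [
    [100.0, 110.0, 200.0],
    [120.0, 210.0, 220.0],
    [130.0, 230.0, -9999.0],
]
means = zonal_mean(watersheds, dem, nodata=-9999.0)
print(means)
```

The output has one value per watershed, which is the "one statistic per polygon" shape described above.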

Summary

Vector for discrete features; raster for continuous surfaces.
The question drives the model more than the data source does.
Conversions lose information — be explicit about what's lost and why.
Real workflows blend both; zonal statistics is the most common bridge.