3.2 The Raster Data Model
Cells, bands, nodata, and affine transforms — how GIS represents continuous surfaces.
Key takeaways
- A raster is a regular grid of cells, each carrying a numeric value.
- Geographic placement is encoded in an affine transform mapping pixel coordinates to world coordinates.
- Multiband rasters power remote sensing; nodata and bit depth choices affect everything downstream.
Introduction
Where vectors excel at discrete features with well-defined boundaries, rasters are the natural model for continuous phenomena — elevation, temperature, satellite reflectance, population density surfaces. A raster imposes a grid on the world and records a value per cell. Understanding how that grid relates to Earth coordinates is the key to working with rasters safely.
Anatomy of a raster
A single-band raster has four parts:
- Array of values — a 2D grid of numbers, one per cell (aka pixel).
- Affine transform — six numbers that map pixel (row, column) to world (x, y).
- CRS — the coordinate reference system of the world coordinates.
- Nodata value — a sentinel marker for cells where no data exists.
Multiband rasters add more bands — for example, a Sentinel-2 scene has 13 bands (blue, green, red, near-infrared, etc.), each an independent array sharing the same grid.
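The parts listed above can be bundled into a small container. The sketch below is illustrative only: `RasterSketch` and its field names are hypothetical, not a rasterio or GDAL type, and the CRS code and values are made up. The array is laid out as (bands, rows, columns), the order in which rasterio's `read()` returns data.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class RasterSketch:
    """Hypothetical container for the parts of a raster (not a library type)."""
    values: np.ndarray   # shape (bands, rows, cols); one 2D array per band
    transform: tuple     # six affine coefficients, in the a..f order of this chapter
    crs: str             # identifier of the coordinate reference system
    nodata: float        # sentinel marking cells with no data

    @property
    def band_count(self) -> int:
        return self.values.shape[0]

    @property
    def shape(self) -> tuple:
        """(rows, cols) of the grid shared by every band."""
        return self.values.shape[1:]


# A made-up 3-band, 4 x 5 raster in which every cell is still nodata.
raster = RasterSketch(
    values=np.full((3, 4, 5), -9999.0, dtype=np.float32),
    transform=(450000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0),
    crs="EPSG:32633",   # assumed for illustration
    nodata=-9999.0,
)
print(raster.band_count, raster.shape)
```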
The affine transform
Six numbers — a, b, c, d, e, f — define a linear mapping from pixel coordinates to world coordinates:
    world_x = a + b * col + c * row
    world_y = d + e * col + f * row

For north-up, unrotated rasters (the overwhelming majority):
- a = x of upper-left corner
- b = pixel width (cell x-size)
- c = 0 (no rotation)
- d = y of upper-left corner
- e = 0 (no rotation)
- f = −pixel height (negative because rows increase downward)
Example:
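    Affine(30.0, 0.0, 450000.0,
           0.0, -30.0, 4200000.0)

A 30 m Landsat-like raster whose upper-left corner is at UTM (450000, 4200000). Note that the Affine constructor (from the affine package that rasterio uses) takes its arguments in a different order from the a to f list above: pixel width, row rotation, upper-left x, column rotation, negative pixel height, upper-left y.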
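To make the mapping concrete, the sketch below applies the two formulas to the example raster. The helper names `pixel_to_world` and `world_to_pixel` are hypothetical, the coefficients are given in the a to f order defined above, and the inverse is valid only for an unrotated raster (c = e = 0).

```python
import math

# Coefficients in the a..f order defined in the text (north-up, unrotated).
A, B, C = 450000.0, 30.0, 0.0      # x origin, pixel width, rotation term
D, E, F = 4200000.0, 0.0, -30.0    # y origin, rotation term, negative pixel height


def pixel_to_world(col, row):
    """World coordinates of pixel position (col, row).

    Integer positions give the upper-left corner of a cell;
    add 0.5 to both to get the cell centre.
    """
    x = A + B * col + C * row
    y = D + E * col + F * row
    return x, y


def world_to_pixel(x, y):
    """Cell (col, row) containing a world point; assumes C == E == 0."""
    col = math.floor((x - A) / B)
    row = math.floor((y - D) / F)
    return col, row


print(pixel_to_world(0, 0))                   # upper-left corner of the raster
print(pixel_to_world(10.5, 5.5))              # centre of cell (col=10, row=5)
print(world_to_pixel(450315.0, 4199835.0))    # back to the containing cell
```

Rotated rasters need the full matrix inverse rather than the two independent divisions used here.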
Spatial resolution
Spatial resolution is the ground size of one cell: e.g., 30 m × 30 m for Landsat, 10 m × 10 m for Sentinel-2 RGB/NIR, 3 m for Planet, 31 cm for WorldView-3. Coarser resolution = bigger cells = less detail.
The resolution need not be square (1 m × 2 m is rare but valid). Always check with gdalinfo or the raster's metadata.
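Because the number of cells grows with the inverse square of the cell size, modest changes in resolution have a large effect on data volume. A quick arithmetic sketch (the helper name `cells_per_km2` is made up for illustration):

```python
def cells_per_km2(cell_width_m, cell_height_m=None):
    """Number of cells needed to cover one square kilometre."""
    if cell_height_m is None:
        cell_height_m = cell_width_m   # assume square cells unless told otherwise
    return 1_000_000 / (cell_width_m * cell_height_m)


for label, size in [("30 m", 30), ("10 m", 10), ("3 m", 3)]:
    print(f"{label}: {cells_per_km2(size):,.0f} cells per km^2")
```

Going from 30 m to 10 m cells multiplies the cell count by nine.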
Bit depth and data types
Each cell stores a value; how many bits per cell determines the range and memory cost:
| Data type | Range | Common uses |
|---|---|---|
| 8-bit unsigned (UInt8) | 0–255 | Natural-colour imagery, class maps |
| 16-bit unsigned (UInt16) | 0–65 535 | Sentinel-2 reflectance |
| 32-bit integer (Int32) | −2.1 × 10⁹ – +2.1 × 10⁹ | Identifier rasters |
| 32-bit float (Float32) | ≈ ±3.4 × 10³⁸ | DEMs, NDVI, temperature |
| 64-bit float (Float64) | ≈ ±1.8 × 10³⁰⁸ | Scientific computation |
A 10 000 × 10 000 Float32 raster takes ~400 MB in memory. Switching to Int16 halves that; to Int8 quarters it. When the physical quantity tolerates it, a smaller type saves storage, I/O, and computation.
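The memory figures above follow directly from the number of bytes per cell. The sketch below reproduces them with NumPy's dtype metadata; `raster_bytes` is a hypothetical helper, and the figures are for uncompressed data in memory, not file size on disk.

```python
import numpy as np


def raster_bytes(rows, cols, dtype, bands=1):
    """Uncompressed in-memory size of a raster, in bytes."""
    return rows * cols * bands * np.dtype(dtype).itemsize


for dtype in ("float32", "int16", "int8"):
    megabytes = raster_bytes(10_000, 10_000, dtype) / 1e6
    print(f"{dtype}: {megabytes:.0f} MB")
```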
Nodata
Real rasters have missing cells — outside the satellite swath, behind clouds, in the ocean when mapping land only. Rather than guess, GIS marks those cells with a reserved nodata value (e.g., -9999 for a DEM, 0 for imagery, or a dedicated mask band).
Always check a raster's nodata before running statistics. A mean elevation that quietly includes -9999 cells is wildly wrong.
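A small demonstration of how badly a sentinel can distort a statistic. The elevations are invented and the -9999 sentinel is an assumption; in practice, read the nodata value from the raster's metadata.

```python
import numpy as np

NODATA = -9999.0   # assumed sentinel; read the real one from the file's metadata

# Made-up elevations in metres, with two missing cells.
dem = np.array([[12.0, 15.0, NODATA],
                [NODATA, 9.0, 4.0]])

naive_mean = dem.mean()                   # sentinels are counted as elevations
valid_mean = dem[dem != NODATA].mean()    # only real measurements contribute

print(naive_mean, valid_mean)
```

If the nodata value is NaN, the `!=` comparison does not filter it out; `np.nanmean` is the usual alternative in that case.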
Multiband rasters and remote sensing
A Sentinel-2 tile has 13 bands at three resolutions:
| Band | Name | Wavelength | Native resolution |
|---|---|---|---|
| B02 | Blue | 490 nm | 10 m |
| B03 | Green | 560 nm | 10 m |
| B04 | Red | 665 nm | 10 m |
| B08 | NIR | 842 nm | 10 m |
| B05, B06, B07, B8A, B11, B12 | Red-edge, narrow NIR, SWIR | various | 20 m |
| B01, B09, B10 | Atmospheric | various | 60 m |
Combining bands with arithmetic produces indices (Module 14.4). Displaying bands 4, 3, 2 as red/green/blue gives a natural-colour image; displaying 8, 4, 3 gives a false-colour infrared composite.
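As a small instance of band arithmetic, the sketch below computes NDVI, (NIR − Red) / (NIR + Red), from two arrays standing in for B08 and B04. The reflectance values are invented, and writing NaN where the denominator is zero is one possible convention rather than a fixed rule.

```python
import numpy as np

# Invented reflectance values standing in for Sentinel-2 B04 (red) and B08 (NIR).
red = np.array([[0.10, 0.20], [0.05, 0.0]], dtype=np.float32)
nir = np.array([[0.50, 0.20], [0.45, 0.0]], dtype=np.float32)


def ndvi(nir, red):
    """Normalised difference vegetation index; NaN where NIR + Red is zero."""
    # Cast first: subtracting unsigned integer bands would wrap around.
    nir = nir.astype(np.float32)
    red = red.astype(np.float32)
    total = nir + red
    out = np.full(total.shape, np.nan, dtype=np.float32)
    # Divide only where the denominator is non-zero; other cells keep NaN.
    np.divide(nir - red, total, out=out, where=total != 0)
    return out


result = ndvi(nir, red)
print(result)
```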
Tiling, pyramids, and chunks
A large raster is stored in tiles (e.g., 512 × 512 pixel blocks). This lets software read a region of interest without loading the whole file. Downsampled overviews (pyramids) provide quick zoomed-out views. Cloud Optimised GeoTIFFs (Module 5.5) formalise this so a client can stream only the tiles it needs.
Chunking also enables parallel processing — each tile can be read, transformed, and written independently.
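The tiling idea can be sketched as a generator of pixel windows that together cover the raster. `tile_windows` is a hypothetical helper written for this example; real files define their own block layout, which rasterio exposes through `block_windows()`.

```python
def tile_windows(width, height, block=512):
    """Yield (col_off, row_off, win_width, win_height) for each tile."""
    for row_off in range(0, height, block):
        for col_off in range(0, width, block):
            yield (
                col_off,
                row_off,
                min(block, width - col_off),    # edge tiles may be narrower
                min(block, height - row_off),   # edge tiles may be shorter
            )


windows = list(tile_windows(width=1200, height=1000))
print(len(windows), windows[-1])
```

Each window can then be handed to a separate worker, since no two windows share a cell.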
Raster arithmetic
Because rasters are arrays, you operate on them with element-wise arithmetic — map algebra (Module 10).
    import rasterio
    import numpy as np

The result of an element-wise operation shares the same grid, CRS, and transform as its inputs (assuming you aligned them first).
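A minimal sketch of map algebra on two already-aligned single-band arrays, with invented values in place of arrays read from files. It uses NumPy masked arrays so that a cell missing in either input stays missing in the output; the shared nodata value of -9999 is an assumption.

```python
import numpy as np

NODATA = -9999.0   # assumed sentinel shared by both inputs

# Invented elevations for the same grid at two dates.
dem_before = np.array([[100.0, 102.0], [NODATA, 98.0]])
dem_after = np.array([[101.5, NODATA], [97.0, 98.0]])

# Mask the sentinel so it never takes part in the arithmetic.
before = np.ma.masked_equal(dem_before, NODATA)
after = np.ma.masked_equal(dem_after, NODATA)

change = after - before                  # element-wise; the masks are combined
change_filled = change.filled(NODATA)    # back to a plain array for writing

print(change_filled)
```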
Alignment — the silent bug factory
Two rasters that look similar on a map may not be cell-by-cell aligned. They might have:
- Different CRSs.
- Different origin coordinates.
- Different resolutions.
- Different nodata values.
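The checklist above can be turned into a quick pre-flight test. The sketch below is hypothetical: `Grid` and `alignment_problems` are made-up names, the grids are invented, and a real check would read these properties from each raster's metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Hypothetical description of a north-up raster grid."""
    crs: str
    origin_x: float
    origin_y: float
    cell_size: float
    nodata: float


def alignment_problems(a, b):
    """Return the reasons why two grids cannot be combined cell by cell."""
    problems = []
    if a.crs != b.crs:
        problems.append("different CRSs")
    if a.cell_size != b.cell_size:
        problems.append("different resolutions")
    if (a.origin_x, a.origin_y) != (b.origin_x, b.origin_y):
        problems.append("different origins")
    if a.nodata != b.nodata:
        problems.append("different nodata values")
    return problems


dem = Grid("EPSG:32633", 450000.0, 4200000.0, 30.0, -9999.0)
slope = Grid("EPSG:32633", 450015.0, 4200000.0, 30.0, -9999.0)   # shifted half a cell
print(alignment_problems(dem, slope))
```

Origins that differ by a whole number of cells share grid lines and only need cropping to a common extent; this sketch flags them anyway.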
Before combining, resample to a common grid. Nearest-neighbour resampling preserves categorical values (land-cover classes); bilinear or cubic resampling is smoother for continuous data (elevation, temperature).
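A typical command is gdalwarp -t_srs EPSG:3857 -te 0 0 1000 1000 -tr 10 10 -r bilinear input.tif output.tif, where -t_srs sets the target CRS, -te the target extent (xmin ymin xmax ymax), -tr the target resolution, and -r the resampling method (nearest neighbour if omitted).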
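The difference between resampling methods can be seen on a single row of cells. The sketch below is a one-dimensional illustration written from scratch, not GDAL's implementation; the class codes and sample positions are invented.

```python
import math


def resample_nearest(values, positions):
    """Take the value of the closest source cell for each position."""
    last = len(values) - 1
    return [values[min(last, max(0, math.floor(p + 0.5)))] for p in positions]


def resample_linear(values, positions):
    """Linearly interpolate between the two neighbouring source cells."""
    out = []
    for p in positions:
        left = min(len(values) - 2, max(0, int(p)))
        frac = p - left
        out.append(values[left] * (1 - frac) + values[left + 1] * frac)
    return out


land_cover = [1, 1, 5, 5]                      # invented codes: 1 = water, 5 = forest
positions = [0.0, 0.75, 1.5, 2.25, 3.0]        # new cell centres in source-cell units

print(resample_nearest(land_cover, positions))
print(resample_linear(land_cover, positions))
```

Linear interpolation yields 3.0 at position 1.5, a value that belongs to neither class, which is why nearest-neighbour is the safe choice for categorical rasters.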
Vector-raster interplay
Most real analyses mix both models:
- Rasterising vectors: burn a polygon into a raster (e.g., convert a land-use polygon layer to a categorical raster). Use gdal_rasterize.
- Vectorising rasters: polygon-ise contiguous cells with the same value (e.g., extract deforestation patches from a classified raster). Use gdal_polygonize.
The vector ↔ raster conversion is lossy in subtle ways: aliasing at cell edges, fragmenting along cell grids. Understand what's lost.
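To see the loss concretely, the sketch below burns a triangle into a 5 × 5 grid using a cell-centre rule and compares areas. It is a from-scratch illustration with hypothetical helper names; gdal_rasterize has its own rules and options, and the centre-of-cell test is only one common convention.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test (points exactly on the boundary are not handled specially)."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):   # this edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(polygon, origin_x, origin_y, cell, rows, cols, burn=1):
    """Burn `burn` into every cell whose centre falls inside the polygon."""
    grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            cx = origin_x + (c + 0.5) * cell
            cy = origin_y - (r + 0.5) * cell   # rows run downward from the top
            if point_in_polygon(cx, cy, polygon):
                grid[r][c] = burn
    return grid


triangle = [(0.0, 0.0), (4.5, 0.0), (0.0, 4.5)]   # true area: 10.125 square units
cell = 1.0
grid = rasterize(triangle, origin_x=0.0, origin_y=5.0, cell=cell, rows=5, cols=5)

for row in grid:
    print(row)

burned_area = sum(map(sum, grid)) * cell ** 2
print(burned_area)
```

The smooth hypotenuse becomes a staircase, and the burned area no longer equals the polygon's area; vectorising the result would return the staircase, not the original triangle.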
Self-check exercises
1. Why is the pixel height in an affine transform often negative?
Because raster row indices increase downward while world y-coordinates typically increase upward (north). A negative pixel height makes the transform flip the y-axis correctly, so row 0 corresponds to the northernmost row.
2. Two DEMs cover the same area but won't subtract cleanly. What are three likely reasons?
(1) Different CRSs — reproject one to the other's CRS. (2) Different resolutions — resample to a common cell size. (3) Different origins — even with the same CRS and resolution, offset origins cause misalignment. Use gdalwarp or rasterio.warp.reproject with target extent and resolution.
3. You compute a mean elevation from a DEM and get 4 300 m for a region you know is around sea level. What happened?
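Almost certainly the mean included nodata sentinels. A spuriously high result like this points to a large positive sentinel such as 32767; a negative one such as -9999 would drag the mean below zero instead. Always mask nodata before statistics. In Python: np.mean(arr[arr != nodata]) or use np.ma.masked_where. rasterio can do the masking for you when you read with masked=True.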
Summary
- A raster = array + affine transform + CRS + nodata.
- Resolution, bit depth, and nodata are design choices with big consequences.
- Multiband rasters power remote sensing; arithmetic across bands is the core of index calculations.
- Alignment (CRS, extent, resolution) is a precondition for combining rasters.
Further reading
- Rasterio documentation — conceptual and API reference.
- GDAL tutorial on the geotransform.
- Gorelick et al., Google Earth Engine: Planetary-scale geospatial analysis for everyone.
- Hengl, T. — Practical Guide to Geostatistical Mapping (open).