Module 3: Spatial Data Models

3.2 The Raster Data Model

Cells, bands, nodata, and affine transforms — how GIS represents continuous surfaces.

Lesson 11 of 100·22 min read

Key takeaways

  • A raster is a regular grid of cells, each carrying a numeric value.
  • Geographic placement is encoded in an affine transform mapping pixel coordinates to world coordinates.
  • Multiband rasters power remote sensing; nodata and bit depth choices affect everything downstream.

Introduction

Where vectors excel at discrete features with well-defined boundaries, rasters are the natural model for continuous phenomena — elevation, temperature, satellite reflectance, population density surfaces. A raster imposes a grid on the world and records a value per cell. Understanding how that grid relates to Earth coordinates is the key to working with rasters safely.

Anatomy of a raster

A single-band raster has four parts:

  1. Array of values — a 2D grid of numbers, one per cell (aka pixel).
  2. Affine transform — six numbers that map pixel (row, column) to world (x, y).
  3. CRS — the coordinate reference system of the world coordinates.
  4. Nodata value — a sentinel marker for cells where no data exists.

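Multiband rasters add more bands. For example, a Sentinel-2 scene has 13 bands (blue, green, red, near-infrared, etc.), each an independent array. Bands stored in one raster file share the same grid; Sentinel-2's bands come at three native resolutions, so they must be resampled to a common grid before they can be stacked.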

The affine transform

Six numbers (a, b, c, d, e, f) define an affine mapping from pixel coordinates to world coordinates:

Code
world_x = a + b * col + c * row
world_y = d + e * col + f * row

For north-up, unrotated rasters (the overwhelming majority):

  • a = x of upper-left corner
  • b = pixel width (cell x-size)
  • c = 0 (no rotation)
  • d = y of upper-left corner
  • e = 0 (no rotation)
  • f = −pixel height (negative because rows increase downward)

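Example, written with the Affine class that rasterio uses:

Code
Affine(30.0, 0.0, 450000.0,
       0.0, -30.0, 4200000.0)

A 30 m Landsat-like raster whose upper-left corner is at UTM (450000, 4200000). Note that the Affine constructor takes its arguments in a different order from the formula above: pixel width, 0, x of upper-left corner, 0, −pixel height, y of upper-left corner. In the notation of this section the same raster has a = 450000, b = 30, c = 0, d = 4200000, e = 0, f = −30, which is also the order GDAL uses for its geotransform tuple.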
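To make the mapping concrete, here is a minimal pure-Python sketch of the formula above. It uses the coefficient order a to f as defined in this section, not the argument order of any library constructor. The function names pixel_to_world and world_to_pixel are illustrative, and the inverse assumes a north-up, unrotated raster.

```python
import math


def pixel_to_world(col, row, a, b, c, d, e, f):
    """Map a pixel position (col, row) to world (x, y)."""
    x = a + b * col + c * row
    y = d + e * col + f * row
    return x, y


def world_to_pixel(x, y, a, b, c, d, e, f):
    """Return the (col, row) of the cell containing world point (x, y).

    Assumption: no rotation terms, so the two axes can be inverted separately.
    """
    if c != 0 or e != 0:
        raise ValueError("this sketch only handles unrotated rasters")
    col = math.floor((x - a) / b)
    row = math.floor((y - d) / f)
    return col, row


# 30 m cells, upper-left corner at (450000, 4200000)
coeffs = (450000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0)

upper_left = pixel_to_world(0, 0, *coeffs)          # corner of the first cell
cell_centre = pixel_to_world(10.5, 20.5, *coeffs)   # centre of column 10, row 20
containing_cell = world_to_pixel(450320.0, 4199380.0, *coeffs)
```

Adding 0.5 to the column and row gives the cell centre rather than its upper-left corner, which is usually the point you want when sampling values.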

Spatial resolution

Spatial resolution is the ground size of one cell: for example, 30 m × 30 m for Landsat, 10 m × 10 m for the Sentinel-2 visible and NIR bands, about 3 m for PlanetScope, and 31 cm for WorldView-3 panchromatic imagery. Coarser resolution means bigger cells and less detail.

Cells need not be square (1 m × 2 m is rare but valid). Always check with gdalinfo or the raster's metadata.
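As a rough illustration of what such a metadata check involves, the sketch below derives the cell size and bounding box from the six transform coefficients and the raster's shape. The helper name cell_size_and_bounds is hypothetical, and the code assumes a north-up raster (b positive, f negative, no rotation).

```python
def cell_size_and_bounds(coeffs, n_rows, n_cols):
    """Return ((cell_width, cell_height), (x_min, y_min, x_max, y_max)).

    coeffs = (a, b, c, d, e, f) in the order used in this lesson.
    """
    a, b, c, d, e, f = coeffs
    if c != 0 or e != 0:
        raise ValueError("this sketch only handles unrotated rasters")
    cell_width, cell_height = abs(b), abs(f)
    x_min, y_max = a, d            # upper-left corner
    x_max = a + b * n_cols         # right edge of the last column
    y_min = d + f * n_rows         # bottom edge of the last row
    return (cell_width, cell_height), (x_min, y_min, x_max, y_max)


# Square 30 m cells, 100 rows by 200 columns
square = cell_size_and_bounds((450000.0, 30.0, 0.0, 4200000.0, 0.0, -30.0), 100, 200)

# Non-square 1 m x 2 m cells, 5 rows by 4 columns
non_square = cell_size_and_bounds((0.0, 1.0, 0.0, 10.0, 0.0, -2.0), 5, 4)
```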

Bit depth and data types

Each cell stores a value; how many bits per cell determines the range and memory cost:

| Data type | Range | Common uses |
| --- | --- | --- |
| 8-bit unsigned (UInt8) | 0–255 | Natural-colour imagery, class maps |
| 16-bit unsigned (UInt16) | 0–65 535 | Sentinel-2 reflectance |
| 32-bit integer (Int32) | −2.1 × 10⁹ to +2.1 × 10⁹ | Identifier rasters |
| 32-bit float (Float32) | ≈ ±3.4 × 10³⁸ | DEMs, NDVI, temperature |
| 64-bit float (Float64) | ≈ ±1.8 × 10³⁰⁸ | Scientific computation |

A 10 000 × 10 000 Float32 raster takes ~400 MB in memory. Switching to Int16 halves that; to Int8 quarters it. When the physical quantity tolerates it, a smaller type saves storage, I/O, and computation.
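The arithmetic behind those figures is simply rows × columns × bytes per cell. A small sketch, assuming uncompressed in-memory storage and 1 MB = 10⁶ bytes (the helper name raster_megabytes is illustrative):

```python
# Bytes per cell for common raster data types
BYTES_PER_CELL = {
    "UInt8": 1,
    "Int16": 2,
    "UInt16": 2,
    "Int32": 4,
    "Float32": 4,
    "Float64": 8,
}


def raster_megabytes(n_rows, n_cols, dtype, n_bands=1):
    """Uncompressed in-memory size in MB (1 MB = 10**6 bytes)."""
    return n_rows * n_cols * n_bands * BYTES_PER_CELL[dtype] / 1e6


sizes = {
    dtype: raster_megabytes(10_000, 10_000, dtype)
    for dtype in ("Float32", "Int16", "UInt8")
}
```

Files on disk are often smaller than this because of compression, but the array you load for analysis occupies the full uncompressed size.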

Nodata

Real rasters have missing cells — outside the satellite swath, behind clouds, in the ocean when mapping land only. Rather than guess, GIS marks those cells with a reserved nodata value (e.g., -9999 for a DEM, 0 for imagery, or a dedicated mask band).

Always check a raster's nodata before running statistics. A mean elevation that quietly includes -9999 cells is wildly wrong.
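The effect is easy to reproduce with a toy array. The sketch below uses NumPy's masked arrays on made-up elevation values; the nodata value of -9999 is an assumption for the example, and a real raster's nodata should be read from its metadata.

```python
import numpy as np

nodata = -9999.0  # assumed sentinel for this example

# Made-up elevations in metres; two cells have no data
dem = np.array(
    [[12.0, 15.0, nodata],
     [11.0, nodata, 14.0]],
    dtype="float32",
)

# Naive mean: the sentinels are treated as real elevations
naive_mean = float(dem.mean())

# Masked mean: nodata cells are excluded from the statistic
masked_dem = np.ma.masked_equal(dem, nodata)
valid_cells = int(masked_dem.count())
true_mean = float(masked_dem.mean())
```

Here the masked mean is 13 m over four valid cells, while the naive mean is several thousand metres below sea level.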

Multiband rasters and remote sensing

A Sentinel-2 tile has 13 bands at three resolutions:

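| Band | Name | Wavelength | Native resolution |
| --- | --- | --- | --- |
| B02 | Blue | 490 nm | 10 m |
| B03 | Green | 560 nm | 10 m |
| B04 | Red | 665 nm | 10 m |
| B08 | NIR | 842 nm | 10 m |
| B05, B06, B07, B8A, B11, B12 | Red-edge, SWIR | various | 20 m |
| B01, B09, B10 | Atmospheric | various | 60 m |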

Combining bands with arithmetic produces indices (Module 14.4). Displaying bands 4, 3, 2 as red/green/blue gives a natural-colour image; displaying 8, 4, 3 gives a false-colour infrared composite.
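As a preview of band arithmetic, here is a minimal sketch of the widely used NDVI index, (NIR − Red) / (NIR + Red), which for Sentinel-2 uses B08 and B04. The reflectance values are invented for illustration, and cells where the denominator is zero are set to NaN as one possible convention.

```python
import numpy as np

# Invented 2 x 2 reflectance values on a shared grid
red = np.array([[0.10, 0.20], [0.05, 0.0]], dtype="float32")   # B04
nir = np.array([[0.50, 0.20], [0.45, 0.0]], dtype="float32")   # B08

total = nir + red

# Start from NaN so cells with a zero denominator stay undefined
ndvi = np.full(red.shape, np.nan, dtype="float32")
np.divide(nir - red, total, out=ndvi, where=total != 0)
```

Converting to a float type before dividing matters for real data: Sentinel-2 reflectance is typically stored as UInt16, where a subtraction that goes negative would wrap around.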

Tiling, pyramids, and chunks

A large raster is stored in tiles (e.g., 512 × 512 pixel blocks). This lets software read a region of interest without loading the whole file. Downsampled overviews (pyramids) provide quick zoomed-out views. Cloud Optimised GeoTIFFs (Module 5.5) formalise this so a client can stream only the tiles it needs.

Chunking also enables parallel processing — each tile can be read, transformed, and written independently.
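The idea of tile-by-tile access can be sketched without any GIS library. The generator below, with the hypothetical name tile_windows, yields the offset and size of each block in a raster; edge tiles are clipped so that together the tiles cover every cell exactly once.

```python
def tile_windows(height, width, tile=512):
    """Yield (row_off, col_off, n_rows, n_cols) for each tile of a raster."""
    for row_off in range(0, height, tile):
        for col_off in range(0, width, tile):
            n_rows = min(tile, height - row_off)   # clip at the bottom edge
            n_cols = min(tile, width - col_off)    # clip at the right edge
            yield row_off, col_off, n_rows, n_cols


# A 1200-row by 1000-column raster split into 512 x 512 tiles
windows = list(tile_windows(1200, 1000, tile=512))
covered_cells = sum(n_rows * n_cols for _, _, n_rows, n_cols in windows)
```

Each tuple could be handed to a separate worker, which reads only that block, processes it, and writes it back.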

Raster arithmetic

Because rasters are arrays, you operate on them with element-wise arithmetic — map algebra (Module 10).

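Python
import rasterio
import numpy as np

# Illustrative only: the file names are placeholders, and both rasters
# are assumed to share the same grid already.
with rasterio.open("dem_before.tif") as before, rasterio.open("dem_after.tif") as after:
    # masked=True hides nodata cells; float32 avoids integer overflow
    change = (after.read(1, masked=True).astype(np.float32)
              - before.read(1, masked=True).astype(np.float32))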

The result shares the same grid, CRS, and transform as its inputs (assuming you aligned them first).

Alignment — the silent bug factory

Two rasters that look similar on a map may not be cell-by-cell aligned. They might have:

  • Different CRSs.
  • Different origin coordinates.
  • Different resolutions.
  • Different nodata values.
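A pre-flight check for the items above might look like the following sketch. The Grid class and alignment_problems function are hypothetical helpers, not part of any library. The origin test assumes that origins offset by a whole number of cells still line up cell by cell, so only fractional offsets are flagged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grid:
    """Hypothetical summary of one raster's georeferencing."""
    crs: str            # e.g. "EPSG:32633"
    origin: tuple       # (x, y) of the upper-left corner
    cell_size: tuple    # (width, height), both positive
    nodata: float


def alignment_problems(g1, g2, tol=1e-9):
    """List the reasons two rasters cannot be combined cell by cell."""
    problems = []
    if g1.crs != g2.crs:
        problems.append("crs")
    same_size = all(
        abs(s1 - s2) <= tol for s1, s2 in zip(g1.cell_size, g2.cell_size)
    )
    if not same_size:
        problems.append("resolution")
    else:
        # Grids line up only if the origins differ by whole cells
        for o1, o2, size in zip(g1.origin, g2.origin, g1.cell_size):
            offset_in_cells = (o1 - o2) / size
            if abs(offset_in_cells - round(offset_in_cells)) > tol:
                problems.append("origin")
                break
    if g1.nodata != g2.nodata:
        problems.append("nodata")
    return problems


a = Grid("EPSG:32633", (450000.0, 4200000.0), (30.0, 30.0), -9999.0)
b = Grid("EPSG:32633", (450015.0, 4200000.0), (30.0, 30.0), 0.0)     # half-cell shift
c = Grid("EPSG:32633", (450300.0, 4199700.0), (30.0, 30.0), -9999.0)  # whole-cell shift
d = Grid("EPSG:4326", (10.0, 50.0), (0.001, 0.001), -9999.0)

problems_ab = alignment_problems(a, b)
problems_ac = alignment_problems(a, c)
problems_ad = alignment_problems(a, d)
```

Rasters a and c still need to be cropped to their overlapping extent before arithmetic, but no resampling is required.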

Before combining, resample to a common grid. Nearest-neighbour resampling preserves categorical values (land-cover classes); bilinear or cubic resampling is smoother for continuous data (elevation, temperature).

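A typical gdalwarp call sets the target CRS, extent, and resolution in one step: gdalwarp -t_srs EPSG:3857 -te 0 0 1000 1000 -tr 10 10 input.tif output.tif. Here -te takes xmin ymin xmax ymax and -tr takes the x and y cell size, both in units of the target CRS. The -r option selects the resampling method (near, bilinear, cubic, and others); if it is omitted, gdalwarp uses nearest-neighbour.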
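Why the resampling method matters for categorical data can be shown in one dimension. The sketch below resamples a single row of class codes with nearest-neighbour and with linear interpolation, the 1-D analogue of bilinear. The class codes and the helper resample_row are invented for the example.

```python
import numpy as np


def resample_row(values, positions, method):
    """Resample a 1-D row of cell values at fractional cell positions."""
    values = np.asarray(values, dtype="float64")
    positions = np.asarray(positions, dtype="float64")
    if method == "nearest":
        # Round to the closest source cell, clamped to the valid range
        idx = np.clip(np.floor(positions + 0.5).astype(int), 0, len(values) - 1)
        return values[idx]
    if method == "linear":
        return np.interp(positions, np.arange(len(values)), values)
    raise ValueError(f"unknown method: {method}")


land_cover = [1, 1, 3, 3]                 # invented codes: 1 = water, 3 = forest
targets = [0.0, 0.75, 1.5, 2.25, 3.0]     # new cell positions on a finer grid

nearest = resample_row(land_cover, targets, "nearest")
linear = resample_row(land_cover, targets, "linear")
```

Linear interpolation produces a value of 2 halfway between water and forest, a class code that never existed in the input, while nearest-neighbour only ever returns 1 or 3.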

Vector-raster interplay

Most real analyses mix both models:

  • Rasterising vectors — burn a polygon into a raster (e.g., convert a land-use polygon layer to a categorical raster). Use gdal_rasterize.
  • Vectorising rasters — polygon-ise contiguous cells with the same value (e.g., extract deforestation patches from a classified raster). Use gdal_polygonize.

The vector ↔ raster conversion is lossy in subtle ways: aliasing at cell edges, fragmenting along cell grids. Understand what's lost.
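A toy rasteriser makes the loss visible. The sketch below burns a triangle into a 4 × 4 grid using a cell-centre rule (a cell is set to 1 when its centre falls inside the polygon), which is one common convention among several. The function names are illustrative, and the point-in-polygon test is a plain ray-casting implementation.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise(polygon, x_min, y_max, cell, n_rows, n_cols):
    """Burn a polygon into a grid of 0/1 values using cell centres."""
    grid = []
    for row in range(n_rows):
        y = y_max - (row + 0.5) * cell          # rows run north to south
        grid.append([
            1 if point_in_polygon(x_min + (col + 0.5) * cell, y, polygon) else 0
            for col in range(n_cols)
        ])
    return grid


triangle = [(0.0, 0.0), (3.6, 0.0), (0.0, 3.6)]
true_area = 0.5 * 3.6 * 3.6                      # about 6.48 square units

cell = 1.0
grid = rasterise(triangle, x_min=0.0, y_max=4.0, cell=cell, n_rows=4, n_cols=4)
burned_area = sum(map(sum, grid)) * cell ** 2
```

The smooth hypotenuse becomes a staircase, and the burned area (6 square units) no longer matches the polygon's true area; vectorising the result would return the staircase, not the original triangle.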

Self-check exercises

1. Why is the pixel height in an affine transform often negative?

Because raster row indices increase downward while world y-coordinates typically increase upward (north). A negative pixel height makes the transform flip the y-axis correctly, so row 0 corresponds to the northernmost row.

2. Two DEMs cover the same area but won't subtract cleanly. What are three likely reasons?

(1) Different CRSs — reproject one to the other's CRS. (2) Different resolutions — resample to a common cell size. (3) Different origins — even with the same CRS and resolution, offset origins cause misalignment. Use gdalwarp or rasterio.warp.reproject with target extent and resolution.

3. You compute a mean elevation from a DEM and get 4 300 m for a region you know is around sea level. What happened?

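Almost certainly the mean included nodata sentinels. A positive result this large points to a positive sentinel such as 32767; a negative sentinel such as -9999 would drag the mean below zero instead. Always mask nodata before statistics. In Python: np.mean(arr[arr != nodata]) or np.ma.masked_where(arr == nodata, arr).mean(). With rasterio, reading a band with masked=True returns an array in which nodata cells are already masked.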

Summary

  • A raster = array + affine transform + CRS + nodata.
  • Resolution, bit depth, and nodata are design choices with big consequences.
  • Multiband rasters power remote sensing; arithmetic across bands is the core of index calculations.
  • Alignment (CRS, extent, resolution) is a precondition for combining rasters.

Further reading

  • Rasterio documentation — conceptual and API reference.
  • GDAL tutorial on the geotransform.
  • Gorelick et al., Google Earth Engine: Planetary-scale geospatial analysis for everyone.
  • Hengl, T. — Practical Guide to Geostatistical Mapping (open).