CoursesGIS Basics — A Complete Introduction18.2 Rasterio and NumPy for Raster Analysis
Module 18: Automation & Programming

18.2 Rasterio and NumPy for Raster Analysis

Python's raster toolkit — open, read, analyse, and write raster datasets.

Lesson 88 of 100·17 min read

Key takeaways

  • Rasterio wraps GDAL with a Python-friendly API.
  • NumPy arrays are the canonical in-memory raster representation.
  • Xarray + Zarr handle multi-dimensional and cloud-native rasters.

Introduction

If Module 18.1 handled vector, this lesson handles raster. Rasterio + NumPy is the default Python stack for raster analysis — fast, expressive, and compatible with the whole scientific Python ecosystem.

Rasterio basics

Python
1import rasterio
2import numpy as np
3[object Object]
4[object Object]
5

Reading a windowed region

Python
from rasterio.windows import Window

Efficient for huge rasters — only the bytes you need are read.

Reprojecting / warping

Python
1from rasterio.warp import calculate_default_transform, reproject, Resampling
2[object Object]
3[object Object]
4

Band math with NumPy

NDVI example:

Python
1with rasterio.open('red.tif') as r, rasterio.open('nir.tif') as n:
2    red = r.read(1).astype('float32')
3    nir = n.read(1).astype('float32')
4    profile = r.profile
5[object Object]
6

Masking and clipping

Python
1from rasterio.mask import mask
2import geopandas as gpd
3[object Object]
4[object Object]
5

Zonal statistics

Use rasterstats:

Python
from rasterstats import zonal_stats

Xarray — multi-dimensional rasters

For time-series or stacked bands, Xarray is preferred:

Python
1import xarray as xr
2import rioxarray
3[object Object]
4[object Object]
5[object Object]
6[object Object]
7

rioxarray bridges rasterio into xarray, giving CRS-aware indexing.

Zarr — cloud-native chunked arrays

For petabyte-scale data (climate reanalysis, MODIS cubes), Zarr stores chunks separately and allows parallel IO:

Python
1import xarray as xr
2ds = xr.open_zarr('s3://bucket/cube.zarr',
3                  storage_options={'anon': True})

Combined with Dask, this scales analysis across clusters.

Dask for large rasters

Python
1import xarray as xr
2[object Object]
3[object Object]
4

Dask handles larger-than-memory workloads by processing chunks in parallel.

Converting between CRSs with rioxarray

Python
import rioxarray

Memory considerations

  • Always use with rasterio.open(...) to close files.
  • Read in windows for huge data.
  • Cast to appropriate dtypes (float32 vs float64 halves memory).
  • Chunk with Dask / Xarray when data exceeds RAM.

Self-check exercises

1. When would you use Xarray + rioxarray instead of plain rasterio?

When you have multi-band, multi-time, or multi-dimensional data — satellite time series, climate reanalysis, stacks of images. Xarray gives you labelled dimensions (time, band, x, y) and operations that respect them. Plain rasterio is simpler for single-file 2D analysis; Xarray shines when you're manipulating datacubes.

2. Why cast to float32 before computing NDVI?

Sentinel-2 raw data is 16-bit unsigned integer. Integer arithmetic truncates: (nir - red) / (nir + red) in integer math produces 0 for most pixels because the magnitude comparisons lose precision. Casting to float32 gives real fractional values in [-1, 1]. float64 is overkill and doubles memory; float32 is the right compromise.

3. You want to process a 50 GB Sentinel-2 tile on a 16 GB laptop. Strategy?

Use Dask with Xarray: da = xr.open_dataarray(..., chunks={'x': 1024, 'y': 1024}). Operations become lazy; actual computation processes tiles one at a time, keeping RAM use low. Alternatively, open with rasterio windowed reads in a loop. For long-running jobs, process with cloud compute (Earth Engine, Planetary Computer) instead of the laptop.

Summary

  • Rasterio + NumPy is the core raster toolkit.
  • Windowed reads and reprojection are simple and fast.
  • Xarray and Dask handle multi-dimensional and cloud-scale data.
  • Zonal stats via rasterstats is the standard vector-raster bridge.

Further reading

  • Rasterio documentation (rasterio.readthedocs.io).
  • Xarray documentation (xarray.dev).
  • Dask documentation (dask.org).
  • Pangeo community — open science data cubes with Python.