Module 6: Spatial Data Formats

6.5 Raster Formats — GeoTIFF and Cloud-Optimised GeoTIFF

The raster format that runs the world, and its cloud-native variant that streams from object storage.

Lesson 31 of 100 · 20 min read

Key takeaways

  • GeoTIFF is a georeferenced variant of TIFF — the universal raster container.
  • Compression, tiling, and overviews are orthogonal choices that massively affect performance.
  • A Cloud-Optimised GeoTIFF (COG) rearranges the file so clients can stream only the bytes they need.

Introduction

If GeoJSON is the web's default vector format, GeoTIFF is the universal raster format. The cloud-optimised variant (COG) makes GeoTIFFs stream efficiently from S3-style object storage. Much of the imagery on AWS Open Data and Microsoft Planetary Computer is distributed as COGs, and Google Earth Engine can read COG-backed assets.

TIFF and GeoTIFF

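TIFF (Tagged Image File Format, 1986) is a container for raster data built around flexible tags: each image in the file carries a dictionary of metadata. GeoTIFF (1995) specifies which tags carry georeferencing, namely the CRS and the affine transform. The nodata value is not part of the GeoTIFF specification itself; GDAL stores it in its own GDAL_NODATA tag, which most GIS software honours.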

A GeoTIFF is still a valid TIFF; a non-spatial TIFF viewer ignores the geo tags. This backward-compatibility is why GeoTIFF won.
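
To make the "tags" idea concrete, here is a minimal sketch that walks the first image file directory (IFD) of a classic TIFF and reports which georeferencing tags are present. It assumes a classic (non-BigTIFF) file already loaded into memory; the function names are illustrative, not part of any library, and the sample bytes at the end are synthetic rather than a real image.

```python
import struct

# Tag IDs defined by the GeoTIFF specification, plus GDAL's own nodata tag.
GEO_TAGS = {
    33550: "ModelPixelScaleTag",
    33922: "ModelTiepointTag",
    34264: "ModelTransformationTag",
    34735: "GeoKeyDirectoryTag",
    34736: "GeoDoubleParamsTag",
    34737: "GeoAsciiParamsTag",
    42113: "GDAL_NODATA",
}


def list_first_ifd_tags(data):
    """Return the tag IDs found in the first IFD of a classic TIFF."""
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF (BigTIFF uses magic number 43)")
    (count,) = struct.unpack(order + "H", data[ifd_offset:ifd_offset + 2])
    tags = []
    for i in range(count):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes long
        (tag,) = struct.unpack(order + "H", data[entry:entry + 2])
        tags.append(tag)
    return tags


def geo_tags_present(data):
    """Names of the georeferencing tags found in the first IFD."""
    return [GEO_TAGS[t] for t in list_first_ifd_tags(data) if t in GEO_TAGS]


# Synthetic two-entry IFD: just enough structure to exercise the parser.
header = struct.pack("<2sHI", b"II", 42, 8)    # byte order, magic, IFD offset
ifd = struct.pack("<H", 2)                     # number of entries
ifd += struct.pack("<HHII", 256, 3, 1, 512)    # ImageWidth
ifd += struct.pack("<HHII", 34735, 3, 4, 0)    # GeoKeyDirectoryTag (dummy offset)
ifd += struct.pack("<I", 0)                    # no further IFDs
print(geo_tags_present(header + ifd))          # ['GeoKeyDirectoryTag']
```

A plain TIFF viewer runs the same kind of loop and simply skips tag IDs it does not recognise, which is why the geo tags do no harm to non-spatial software.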

Inspecting a GeoTIFF

Shell
gdalinfo -stats scene.tif

Returns:

  • Driver (should say "GTiff").
  • Size (width × height).
  • Band count and data type.
  • CRS (in WKT).
  • Affine transform / pixel size.
  • Nodata value.
  • Statistics per band.

Compression

GeoTIFF supports many compression algorithms:

Compression   Lossy?      Best for
None          n/a         Fastest reads; largest files.
LZW           No          General-purpose lossless.
DEFLATE       No          Slightly better compression than LZW, slightly slower.
ZSTD          No          Modern, fast lossless. Requires GDAL 2.3+ built with libzstd.
JPEG          Yes         8-bit natural-colour imagery.
WEBP          Yes or No   Web-optimised; small files.

Choose based on bit depth and use case:

  • 16-bit scientific data (Sentinel-2, Landsat surface reflectance) → ZSTD or DEFLATE.
  • 8-bit natural colour orthophotos → JPEG at 75–90 % quality.
  • Classified 8-bit land cover → LZW (keep exact class values).

Tiling vs stripes

Two internal layouts:

  • Stripes (called strips in the TIFF specification): rows or groups of rows stored contiguously. This is GDAL's default layout, and it is inefficient for windowed reads because every strip spans the full image width.
  • Tiles: square blocks, typically 256×256 or 512×512, stored independently. Enables fast region-of-interest reads.

Tiled is the right default for any analysis. Striped is fine for purely sequential reads (e.g., display).
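
A rough way to see the difference is to count how many blocks a small window overlaps under each layout. The sketch below assumes a 10,000-pixel-wide raster, one row per strip, and 512×512 tiles; it ignores compression and partial edge blocks, so treat the numbers as an illustration rather than a benchmark.

```python
def blocks_touched(col_off, row_off, width, height, block_w, block_h):
    """Count the storage blocks that a pixel window overlaps."""
    first_col = col_off // block_w
    last_col = (col_off + width - 1) // block_w
    first_row = row_off // block_h
    last_row = (row_off + height - 1) // block_h
    return (last_col - first_col + 1) * (last_row - first_row + 1)


RASTER_W = 10_000  # assumed raster width in pixels
window = dict(col_off=3_000, row_off=4_000, width=512, height=512)

# Striped layout: each block is one full-width row (an assumption for this demo).
strips = blocks_touched(**window, block_w=RASTER_W, block_h=1)
strip_pixels = strips * RASTER_W

# Tiled layout: 512×512 blocks.
tiles = blocks_touched(**window, block_w=512, block_h=512)
tile_pixels = tiles * 512 * 512

wanted = window["width"] * window["height"]
print(f"wanted pixels : {wanted}")                          # 262144
print(f"striped read  : {strips} strips, {strip_pixels}")   # 512 strips, 5120000
print(f"tiled read    : {tiles} tiles, {tile_pixels}")      # 4 tiles, 1048576
```

Under these assumptions the striped file decodes about twenty times more pixels than the window needs, while the tiled file decodes four times more, and only because the window straddles tile boundaries.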

Overviews (pyramids)

Large rasters benefit from pre-computed downsampled versions — overviews — that let viewers display low-zoom quickly without reading the full resolution. GDAL creates them:

Shell
gdaladdo -r average scene.tif 2 4 8 16 32

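This creates five overview levels (2×, 4×, 8×, 16×, 32× downsampled). The numbers are decimation factors relative to the full-resolution raster, so with powers of two each overview has half the linear resolution of the previous one. Averaging suits continuous data; for classified rasters, use -r nearest or -r mode so that overviews contain only valid class values.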

External overviews live in a sidecar .ovr file; internal overviews (recommended for COG) live inside the TIFF.
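
The arithmetic behind the overview factors can be sketched in a few lines. The example assumes a 10,980 × 10,980 raster (the size is illustrative) and rounds each dimension up, which is how overview sizes are commonly computed; it also estimates how many extra pixels the pyramid adds before compression.

```python
def overview_sizes(width, height, factors=(2, 4, 8, 16, 32)):
    """Pixel dimensions of each overview level, rounding up."""
    return [((width + f - 1) // f, (height + f - 1) // f) for f in factors]


WIDTH = HEIGHT = 10_980  # assumed full-resolution size
FACTORS = (2, 4, 8, 16, 32)

sizes = overview_sizes(WIDTH, HEIGHT, FACTORS)
for factor, (w, h) in zip(FACTORS, sizes):
    print(f"{factor:>2}x -> {w} x {h}")

# Extra pixels stored by the pyramid, relative to the base image.
overhead = sum(w * h for w, h in sizes) / (WIDTH * HEIGHT)
print(f"pyramid overhead: {overhead:.1%}")
```

With power-of-two factors the overhead converges towards one third of the base image, so overviews are cheap compared with the read time they save at low zoom.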

Cloud-Optimised GeoTIFF (COG)

A COG is a regular GeoTIFF with a specific internal layout:

  1. Header first — CRS, transform, IFDs at the start of the file.
  2. Internal tiles — not stripes.
  3. Internal overviews — not sidecar .ovr.
  4. Tile data ordered from the coarsest overview to full resolution, so a client can read a low-resolution preview first, then zoom in.

Crucially, a COG is still a valid GeoTIFF — every existing tool reads it without modification. But a cloud-aware client can fetch only the byte ranges it needs over HTTP range requests, skipping unneeded tiles entirely.
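
The byte-range mechanism itself is plain HTTP. The following sketch shows how a client might turn a tile's offset and length (which it learns from the header) into a Range request using only the standard library. The helper names and the example URL are hypothetical, and real clients such as GDAL add caching and request merging on top.

```python
import urllib.request


def range_header(offset, length):
    """HTTP Range header value for `length` bytes starting at `offset`."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"  # the end position is inclusive


def build_range_request(url, offset, length):
    """Prepare (but do not send) a GET request for one byte range."""
    return urllib.request.Request(url, headers={"Range": range_header(offset, length)})


def fetch_range(url, offset, length, timeout=30):
    """Fetch one byte range; a server that honours it replies 206 Partial Content."""
    request = build_range_request(url, offset, length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        if response.status != 206:
            raise RuntimeError(f"Range header ignored (status {response.status})")
        return response.read()


# Placeholder URL for illustration only; nothing is downloaded here.
req = build_range_request("https://example.com/scene_cog.tif", 1_048_576, 65_536)
print(req.get_header("Range"))  # bytes=1048576-1114111
```

A typical session is one small request for the start of the file to read the header and IFDs, followed by one request per tile (or per run of adjacent tiles) that the view actually needs.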

Create a COG:

Shell
# Requires GDAL 3.1 or later, which introduced the COG driver.
gdal_translate scene.tif scene_cog.tif \
  -of COG \
  -co COMPRESS=ZSTD \
  -co PREDICTOR=2 \
  -co BIGTIFF=IF_SAFER
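
The PREDICTOR=2 option deserves a word: it stores each sample as the difference from its left-hand neighbour before the compressor runs, which helps on smooth integer data. The sketch below illustrates the idea on one synthetic 16-bit row. It uses zlib (DEFLATE) from the standard library as a stand-in for ZSTD, and the data and helper names are invented for the demonstration.

```python
import struct
import zlib


def horizontal_difference(row):
    """Keep the first sample, then store each sample minus its left neighbour (mod 2**16)."""
    return [row[0]] + [(b - a) & 0xFFFF for a, b in zip(row, row[1:])]


def undo_difference(deltas):
    """Invert horizontal_difference with a running sum (mod 2**16)."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append((out[-1] + d) & 0xFFFF)
    return out


def pack_uint16(values):
    """Serialise samples as little-endian unsigned 16-bit integers."""
    return struct.pack(f"<{len(values)}H", *values)


# Synthetic smooth row: a gentle ramp with a small repeating ripple.
row = [1000 + 3 * i + (i % 7) for i in range(4096)]

plain = zlib.compress(pack_uint16(row))
predicted = zlib.compress(pack_uint16(horizontal_difference(row)))

print(f"raw bytes           : {len(pack_uint16(row))}")
print(f"DEFLATE             : {len(plain)}")
print(f"predictor + DEFLATE : {len(predicted)}")
```

The transform is lossless, since a running sum restores the original samples exactly. On noisy data the gain shrinks, so it is worth testing on a representative file.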

Validate:

Shell
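# validate_cloud_optimized_geotiff.py ships with GDAL's Python sample scripts;
# this form works when the script is in the current directory or on PYTHONPATH.
python -m validate_cloud_optimized_geotiff scene_cog.tif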

Reading COGs from the cloud

Python
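import rasterio
from rasterio.windows import Window
with rasterio.open("https://example.com/scene_cog.tif") as src:  # placeholder URL, not a real dataset
    tile = src.read(1, window=Window(0, 0, 512, 512))  # band 1, top-left 512×512 pixels only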

Rasterio accepts https:// and s3:// URLs directly; plain GDAL needs the /vsicurl/ or /vsis3/ prefix. Both then issue HTTP range requests transparently.

Other raster formats you'll meet

  • JP2 / JPEG 2000: wavelet compression, used for Sentinel-2 distribution. Patents are no longer a licensing concern.
  • NetCDF / HDF5: multi-dimensional scientific formats (time + variables); common in climate / oceans.
  • Zarr: cloud-native chunked N-dim arrays; the Python / Xarray ecosystem favourite.
  • ERDAS IMG (.img): ESRI-friendly raster; mostly legacy.
  • BIL / BIP / BSQ: band-interleaved formats for multispectral; legacy.

A practical workflow

  1. Receive data in whatever format the provider gives you (often uncompressed TIFF or JP2).
  2. Inspect with gdalinfo — record CRS, resolution, data type, nodata.
  3. Reproject if needed with gdalwarp.
  4. Convert to COG for cloud serving or analysis pipelines.
  5. Document in a sidecar metadata file.
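
The steps above can be sketched as a small script that assembles the GDAL command lines and a sidecar record. This is one possible arrangement, not a prescribed pipeline: the intermediate file name, the metadata fields, and the function names are all hypothetical, and nothing is executed until you pass each command to something like subprocess.run.

```python
import json
import shlex


def build_workflow(src, dst_cog, target_crs=None, warped="scene_warped.tif"):
    """Return GDAL command lines for inspect -> (optional reproject) -> COG."""
    commands = [["gdalinfo", "-json", src]]  # step 2: inspect
    cog_input = src
    if target_crs:
        # Step 3: reproject into a hypothetical intermediate file.
        commands.append(["gdalwarp", "-t_srs", target_crs, src, warped])
        cog_input = warped
    # Step 4: convert to COG, with the same options as the command shown earlier.
    commands.append([
        "gdal_translate", "-of", "COG",
        "-co", "COMPRESS=ZSTD", "-co", "PREDICTOR=2", "-co", "BIGTIFF=IF_SAFER",
        cog_input, dst_cog,
    ])
    return commands


def sidecar_metadata(src, dst_cog, commands):
    """Step 5: a minimal JSON record of what was done (the field names are made up)."""
    record = {
        "source": src,
        "output": dst_cog,
        "commands": [shlex.join(c) for c in commands],
    }
    return json.dumps(record, indent=2)


cmds = build_workflow("scene.tif", "scene_cog.tif", target_crs="EPSG:32633")
for cmd in cmds:
    print(shlex.join(cmd))
print(sidecar_metadata("scene.tif", "scene_cog.tif", cmds))
```

Keeping the commands as argument lists avoids shell-quoting problems when they are run, and recording them in the sidecar makes the conversion reproducible.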

Self-check exercises

1. What's the difference between a GeoTIFF and a Cloud-Optimised GeoTIFF?

Structurally, a COG is a GeoTIFF with internal tiles (not stripes), internal overviews, and IFDs arranged so clients can read the header plus only the specific tiles they need via HTTP range requests. A COG is still a valid TIFF; legacy tools read it unchanged, but cloud-aware tools can stream efficiently.

2. Your classified land-cover GeoTIFF has 7 classes. Should you use JPEG compression?

No. JPEG is lossy and would alter class values slightly, turning "urban" cells into "cropland" at boundaries. Use LZW or DEFLATE (lossless) to preserve exact values. For 8-bit natural-colour imagery where small colour shifts are acceptable, JPEG is fine.

3. You're serving a 10 GB satellite mosaic from S3. What three things should you verify before clients start requesting it?

(1) It's a valid COG (internal tiles, internal overviews, correct IFD order; validate with validate_cloud_optimized_geotiff). (2) If clients run in a browser, the bucket's CORS configuration allows the Range request header from their origins. (3) Public read permissions are set (or proper signed URLs are issued). With these in place, clients can stream only the tiles they need.

Summary

  • GeoTIFF is the universal raster format; COG is its cloud-native layout.
  • Compression, tiling, and overviews are orthogonal choices with big performance implications.
  • COGs enable bytes-on-demand streaming — the foundation of modern raster platforms.

Further reading

  • OGC — Cloud-Optimized GeoTIFF standard.
  • cogeo.org — curated COG resources.
  • gdal_translate -of COG documentation.
  • Planet and AWS open data documentation on COG publishing.