Module 3: Spatial Data Models · Hands-on Lab

3.5 Lab — Identifying Data Models in the Wild

Download three real-world datasets, identify their data model, and inspect their properties with free tools.

Lesson 14 of 100 · 35 min read

Key takeaways

  • You can tell a lot about a dataset by inspecting its extension and metadata before loading it.
  • Free tools (QGIS, gdalinfo, ogrinfo) report nearly everything you need: format, geometry type or band structure, CRS, extent, and statistics.
  • The same phenomenon is often available as both vector and raster — compare them side by side.

Introduction

This hands-on lab gets you comfortable identifying vector, raster, and other spatial data models from real sources. You'll download three public datasets, inspect their structure, and record observations that will recur across the rest of the course.

Prerequisites

  • QGIS 3.x installed (free, open source).
  • Optionally, the GDAL command-line tools (gdalinfo, ogrinfo) for the terminal steps. They come bundled with QGIS: to use them, open the OSGeo4W Shell (Windows) or a normal terminal (macOS / Linux).
  • About 45 minutes.

Dataset 1 — A vector dataset (Natural Earth Countries)

Step 1. Go to naturalearthdata.com and download ne_110m_admin_0_countries.zip. Unzip it — you should see .shp, .shx, .dbf, .prj, and .cpg files.
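
If you want to confirm the unzip produced a complete shapefile, the check can be scripted. The sketch below uses only the Python standard library; the function name missing_sidecars and the required / recommended split are illustrative choices, not part of GDAL or QGIS.

```python
from pathlib import Path

# Files every shapefile needs, plus sidecars that are optional but useful.
REQUIRED = (".shp", ".shx", ".dbf")
RECOMMENDED = (".prj", ".cpg")  # CRS definition and attribute text encoding


def missing_sidecars(shp_path):
    """Return (missing_required, missing_recommended) extensions for a shapefile."""
    base = Path(shp_path).with_suffix("")
    # Compare lower-cased suffixes so that .SHP and .shp are treated alike.
    present = {p.suffix.lower() for p in base.parent.glob(base.name + ".*")}
    missing_required = [ext for ext in REQUIRED if ext not in present]
    missing_recommended = [ext for ext in RECOMMENDED if ext not in present]
    return missing_required, missing_recommended
```

Calling missing_sidecars("ne_110m_admin_0_countries.shp") from the unzipped folder should return two empty lists if every file listed in Step 1 is present.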

Step 2. In a terminal, run:

Shell
ogrinfo -so -al ne_110m_admin_0_countries.shp

-so means "summary only"; -al means "all layers". Read the output carefully.

Step 3. Record answers to these questions in a notes file:

  • How many features does the layer contain?
  • What is the geometry type?
  • What is the CRS (look for GEOGCRS or PROJCRS; older GDAL versions print GEOGCS or PROJCS)?
  • How many fields (columns)? Name 5.
  • What is the bounding box?
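
Several of these answers are also stored in fixed-size file headers, which makes a useful cross-check on the ogrinfo output. The following is a minimal sketch, assuming the header layout of the published ESRI shapefile and dBASE formats; the function names are illustrative, and it reads only the headers, not the features themselves.

```python
import struct

SHAPE_TYPES = {0: "Null", 1: "Point", 3: "PolyLine", 5: "Polygon", 8: "MultiPoint",
               11: "PointZ", 13: "PolyLineZ", 15: "PolygonZ", 18: "MultiPointZ",
               21: "PointM", 23: "PolyLineM", 25: "PolygonM", 28: "MultiPointM",
               31: "MultiPatch"}


def parse_shp_header(header):
    """Decode the fixed 100-byte header at the start of a .shp file."""
    if len(header) < 100:
        raise ValueError("a .shp header is 100 bytes long")
    (file_code,) = struct.unpack(">i", header[0:4])        # big-endian magic number
    if file_code != 9994:
        raise ValueError("not a shapefile: unexpected file code")
    (shape_type,) = struct.unpack("<i", header[32:36])     # little-endian from here on
    xmin, ymin, xmax, ymax = struct.unpack("<4d", header[36:68])
    return {"geometry": SHAPE_TYPES.get(shape_type, "Unknown"),
            "bbox": (xmin, ymin, xmax, ymax)}


def parse_dbf_header(header):
    """Return (record_count, field_count) from the first 32 bytes of a .dbf file."""
    record_count, header_len = struct.unpack("<IH", header[4:10])
    # 32-byte file header + 32 bytes per field descriptor + 1 terminator byte
    field_count = (header_len - 33) // 32
    return record_count, field_count
```

To try it, read the first 100 bytes of the .shp file and the first 32 bytes of the .dbf file in binary mode and pass them in. The record count should match the feature count from ogrinfo, because a shapefile stores one attribute record per shape.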

Step 4. In QGIS, Layer → Add Layer → Vector and load the same file. Right-click → Open Attribute Table to see the tabular side.

Dataset 2 — A raster dataset (SRTM elevation)

Step 5. Download a 1 arc-second (~30 m) SRTM tile from USGS EarthExplorer or use the cached sample from OpenTopography. Alternatively, go to dwtkns.com/srtm30m/ and click a tile over a mountainous region (e.g., the Alps).
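
The "~30 m" figure for 1 arc-second can be checked with a little arithmetic. This sketch assumes a spherical Earth with a mean radius of 6,371 km, which is an approximation; the function name is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius; spherical approximation


def arcsecond_in_metres(latitude_deg):
    """Return (north_south, east_west) ground size of one arc-second in metres."""
    metres_per_degree = 2 * math.pi * EARTH_RADIUS_M / 360
    north_south = metres_per_degree / 3600
    # Meridians converge towards the poles, so east-west spacing shrinks with cos(latitude).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west
```

Under these assumptions a cell is about 30.9 m north to south everywhere, but only about 21.5 m east to west at 46° N (roughly the latitude of the Alps). That is why a tile stored in degrees does not have square cells on the ground.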

Step 6. Run:

Shell
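gdalinfo -stats N46E009.hgt

(Substitute your actual file name. The -stats flag computes the band minimum, maximum, and mean, which plain gdalinfo does not report.)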

Step 7. Record:

  • Driver (HGT? GeoTIFF?).
  • Size in pixels (width × height).
  • Band count and data type.
  • Pixel size (cell resolution).
  • Origin (upper-left corner coordinates).
  • CRS.
  • Minimum / maximum elevation values.
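
The .hgt format is simple enough that most of these properties can be derived by hand, which helps you understand what gdalinfo is reporting. A minimal sketch, assuming the usual SRTM conventions: a square grid of big-endian signed 16-bit samples, the south-west corner encoded in the file name, and -32768 marking voids. The function name describe_hgt is illustrative.

```python
import array
import math
import re
import sys
from pathlib import Path

VOID = -32768  # SRTM marker for missing samples


def describe_hgt(path, data):
    """Summarise an SRTM .hgt tile from its file name and raw bytes."""
    match = re.fullmatch(r"([NS])(\d{2})([EW])(\d{3})\.hgt", Path(path).name, re.IGNORECASE)
    if not match:
        raise ValueError("expected a name like N46E009.hgt")
    ns, lat, ew, lon = match.groups()
    south = int(lat) if ns.upper() == "N" else -int(lat)
    west = int(lon) if ew.upper() == "E" else -int(lon)

    side = math.isqrt(len(data) // 2)            # square grid, 2 bytes per sample
    if side < 2 or side * side * 2 != len(data):
        raise ValueError("file size does not match a square 16-bit grid")
    pixel = 1.0 / (side - 1)                     # a tile spans exactly one degree

    samples = array.array("h")                   # signed 16-bit integers
    samples.frombytes(data)
    if sys.byteorder == "little":
        samples.byteswap()                       # .hgt samples are big-endian
    return {
        "size": (side, side),
        "pixel_size_deg": pixel,
        # Samples sit on pixel centres, so the raster edge lies half a pixel outside.
        "origin": (west - pixel / 2, south + 1 + pixel / 2),
        "min": min((v for v in samples if v != VOID), default=None),
        "max": max((v for v in samples if v != VOID), default=None),
    }
```

For a 1 arc-second tile the file holds 3601 × 3601 samples, so the pixel size works out to 1/3600 of a degree. Compare the origin and the minimum / maximum with your gdalinfo output; small differences in how voids are treated are possible.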

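Step 8. In QGIS, load the raster and apply a hillshade (Raster → Analysis → Hillshade) to visualise the terrain. Because the tile's horizontal units are degrees while its elevations are metres, set the Scale parameter to about 111120 (metres per degree); otherwise the relief looks heavily exaggerated.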

Dataset 3 — A multiband raster (Sentinel-2)

Step 9. Use the free Copernicus Browser or the Google Earth Engine Data Catalog to find a Sentinel-2 scene over a vegetated area. Download band 4 (red) and band 8 (NIR) as GeoTIFFs.

Step 10. Run:

Shell
gdalinfo -stats B04.tif
gdalinfo -stats B08.tif

Step 11. Record:

  • Resolution (should be 10 m for both).
  • Data type and bit depth (typically UInt16, a 16-bit unsigned integer).
  • Nodata value (often 0 for S2 at edges).
  • CRS (typically a UTM zone).
  • Min / max / mean reflectance values.
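
If you would rather collect these values with a script than copy them by hand, gdalinfo can write its report as JSON (gdalinfo -json -stats B04.tif > b04.json). The sketch below picks out the Step 11 properties. The key names it reads (driverShortName, size, geoTransform, bands, type, noDataValue, minimum, maximum, mean) are assumptions about GDAL's JSON report, so check them against your own output; the function names are illustrative.

```python
import json


def load_report(path):
    """Load a report previously saved with: gdalinfo -json -stats FILE > path"""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarise_gdalinfo(report):
    """Pick the Step 11 properties out of a parsed gdalinfo JSON report."""
    gt = report.get("geoTransform") or [None] * 6
    band = (report.get("bands") or [{}])[0]      # first band only
    return {
        "driver": report.get("driverShortName"),
        "size": tuple(report.get("size", ())),
        # geoTransform: [origin_x, pixel_width, 0, origin_y, 0, -pixel_height]
        "pixel_size": (gt[1], abs(gt[5]) if gt[5] is not None else None),
        "origin": (gt[0], gt[3]),
        "data_type": band.get("type"),
        "nodata": band.get("noDataValue"),
        # Present only when gdalinfo was run with -stats.
        "stats": {k: band.get(k) for k in ("minimum", "maximum", "mean")},
    }
```

Missing keys come back as None rather than raising an error, so the same function can be pointed at the SRTM tile from Step 6 as well.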

Step 12 (bonus). Compute NDVI in QGIS (Raster → Raster Calculator) using ("B08@1" - "B04@1") / ("B08@1" + "B04@1"), adjusting the layer names to match yours. Visualise the result, which ranges from -1 to 1, with a red-to-green ramp. Healthy vegetation should appear as high NDVI (dark green) and bare ground or water as low NDVI.
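
The NDVI formula is easy to try on a few numbers before running it on a whole scene. A minimal pure-Python sketch, assuming 0 is the nodata value as noted in Step 11; the function names are illustrative and this is not how QGIS implements the calculator internally.

```python
def ndvi(red, nir, nodata=0):
    """NDVI for one pixel; None where either band is nodata or the sum is zero."""
    if red == nodata or nir == nodata:
        return None
    total = nir + red
    if total == 0:
        return None
    # Python's / is true division, so integer inputs still give a float result.
    return (nir - red) / total


def ndvi_grid(red_rows, nir_rows, nodata=0):
    """Apply ndvi() cell by cell to two equally sized lists of rows."""
    return [[ndvi(r, n, nodata) for r, n in zip(red_row, nir_row)]
            for red_row, nir_row in zip(red_rows, nir_rows)]
```

A pixel with red = 1000 and NIR = 3000 gives (3000 - 1000) / (3000 + 1000) = 0.5, a typical value for vegetation. Equal red and NIR values give 0.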

Part 4 — Compare the same phenomenon in both models

Step 13. Download the OpenStreetMap building footprints for a small area (using Overpass Turbo or Geofabrik extracts) and compare them to a satellite image of the same area.

  • Zoom in QGIS until you can see individual buildings.
  • How do OSM polygons compare to the actual footprints visible in the imagery?
  • What attributes do OSM buildings carry (height, levels, usage)?
  • If you wanted to compute total building rooftop area across a city, would you use the vectors or rasterise them first?

Record your answer with reasoning.
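
If you use Overpass Turbo for Step 13, the query can be generated rather than typed by hand. A minimal sketch that builds an Overpass QL query for building footprints inside a bounding box; the function name and the default timeout are illustrative, and the coordinates you pass in are your own choice. Note that Overpass expects the bounding box in south, west, north, east order.

```python
def building_query(south, west, north, east, timeout_s=25):
    """Build an Overpass QL query for buildings inside a bounding box."""
    if not (south < north and west < east):
        raise ValueError("expected south < north and west < east")
    bbox = f"{south},{west},{north},{east}"      # Overpass order: S, W, N, E
    return (
        f"[out:json][timeout:{timeout_s}];\n"
        "(\n"
        f'  way["building"]({bbox});\n'
        f'  relation["building"]({bbox});\n'
        ");\n"
        "out geom;"
    )
```

Paste the returned text into Overpass Turbo, run it, and export the result as GeoJSON to load in QGIS. Keep the box small, since dense urban areas can return many thousands of buildings.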

Part 5 — Troubleshooting checklist

  • gdalinfo: command not found — add GDAL to your PATH, or open the OSGeo4W Shell, or use Docker (docker run -it osgeo/gdal).
  • Raster won't display — check the stretch; a 16-bit Sentinel-2 file often looks black with default styling. Apply a contrast-enhancement stretch.
  • CRS mismatch warnings — learn to love them; they're usually real. See Module 4.
  • Shapefile missing parts — you need all of .shp, .shx, .dbf, and (for CRS) .prj. Losing .prj means losing the CRS.
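
The black-raster item in the checklist above comes down to arithmetic: a typical 16-bit value of 1,500 mapped linearly from 0–65,535 onto 0–255 lands at about 6, which is nearly black. A percentile stretch rescales between low and high cut-offs taken from the data instead. The sketch below assumes 2 % and 98 % cut-offs, integer pixel values, and 0 as nodata; it illustrates the idea and is not QGIS's own implementation.

```python
def percentile_stretch(values, low_pct=2, high_pct=98, nodata=0):
    """Return a function mapping raw pixel values to 0-255 display values."""
    valid = sorted(v for v in values if v != nodata)
    if not valid:
        raise ValueError("no valid pixels to stretch")
    last = len(valid) - 1
    low = valid[low_pct * last // 100]
    high = valid[high_pct * last // 100]
    span = max(high - low, 1)                    # avoid dividing by zero on flat rasters

    def to_display(value):
        scaled = (value - low) / span
        return round(255 * min(max(scaled, 0.0), 1.0))   # clamp, then scale to 8 bits

    return to_display
```

Values at or below the low cut-off display as 0 and values at or above the high cut-off as 255, so the bulk of the data uses the full brightness range.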

Self-check exercises

1. Which tool or command would you use first to identify a mystery file's data model?

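gdalinfo for rasters and ogrinfo for vectors. If you do not yet know which kind of file you have, try one and fall back to the other if it cannot open the file. Each reports the driver (which implies the format), the band structure or geometry type, the CRS, and the extent without requiring you to load the file in a GUI; gdalinfo also reports band statistics when run with -stats. Together they handle almost every geospatial format you'll meet.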
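
Before running either tool, the file extension already gives a first guess. The lookup below is a small, deliberately incomplete sketch; the table contents and the function name are illustrative, and the extension is only a hint that the tool output should confirm.

```python
from pathlib import Path

# Extend this table as you meet new formats.
LIKELY_MODEL = {
    ".shp": "vector", ".geojson": "vector", ".kml": "vector", ".gpx": "vector",
    ".tif": "raster", ".tiff": "raster", ".hgt": "raster", ".jp2": "raster",
    ".asc": "raster",
    ".gpkg": "vector or raster",    # GeoPackage can hold both
}
TOOL_FOR = {"vector": "ogrinfo", "raster": "gdalinfo"}


def first_guess(path):
    """Return (likely_model, tool_to_try_first) from the file extension alone."""
    model = LIKELY_MODEL.get(Path(path).suffix.lower(), "unknown")
    # For unknown or mixed formats, start with gdalinfo and fall back to ogrinfo.
    return model, TOOL_FOR.get(model, "gdalinfo")
```

For example, first_guess("N46E009.hgt") suggests a raster and gdalinfo, while an unrecognised extension returns "unknown" and you simply try both tools.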

2. What's the difference in downstream options between a building represented as a polygon and one represented as a group of raster cells?

The polygon gives you exact geometry, easy area / perimeter queries, topological operations, and rich attributes per building. The raster gives you fast neighbourhood operations (density, kernel-based analyses) and easy combination with other rasters (e.g., solar potential per cell), at the cost of stair-stepped boundaries and lost per-feature attributes.
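
The stair-stepping cost can be made concrete by computing one footprint's area both ways. This sketch uses the shoelace formula for the polygon and a cell-centre rule for the raster version; the rectangular footprint is hypothetical, coordinates are assumed to be in metres in a projected CRS, and real rasterisers may use a different inclusion rule.

```python
import math


def shoelace_area(ring):
    """Exact area of a simple polygon given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def point_in_ring(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > y) != (y2 > y):                 # edge crosses the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterised_area(ring, cell):
    """Area after rasterising: count cells whose centre falls inside the polygon."""
    xs, ys = zip(*ring)
    cols = range(math.floor(min(xs) / cell), math.ceil(max(xs) / cell))
    rows = range(math.floor(min(ys) / cell), math.ceil(max(ys) / cell))
    hits = sum(point_in_ring((c + 0.5) * cell, (r + 0.5) * cell, ring)
               for r in rows for c in cols)
    return hits * cell * cell


# Hypothetical 10.4 m x 7.3 m building footprint.
footprint = [(0.0, 0.0), (10.4, 0.0), (10.4, 7.3), (0.0, 7.3)]
```

For this footprint the exact area is 75.92 m², while rasterising gives 70 m² with 1 m cells and 78.75 m² with 0.5 m cells. The raster answer depends on cell size and grid alignment, which is one reason to compute areas from the vectors.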

3. Why is a Sentinel-2 scene typically stored as UInt16 instead of Float32?

Sentinel-2 products store reflectance as scaled integers (a reflectance of 1.0 is stored as roughly 10,000). UInt16, which holds 0–65,535, covers that range with headroom at half the storage and memory cost of Float32. Subsequent analyses may cast to float for calculations like NDVI, but storing the raw data as integers is efficient.
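
The storage argument can be put in numbers. This sketch assumes a full Sentinel-2 tile at 10 m resolution, which is 10,980 × 10,980 pixels, and ignores compression; a smaller download scales the figures down but keeps the 2:1 ratio.

```python
import struct

TILE_SIDE_PX = 10_980        # assumed: one full Sentinel-2 tile at 10 m resolution
UINT16_MAX = 2**16 - 1       # largest value a UInt16 can hold (65,535)


def band_megabytes(struct_format, side=TILE_SIDE_PX):
    """Uncompressed size in MB (10**6 bytes) of one square band."""
    return side * side * struct.calcsize(struct_format) / 1e6


uint16_mb = band_megabytes("<H")     # unsigned 16-bit integer, 2 bytes per pixel
float32_mb = band_megabytes("<f")    # 32-bit float, 4 bytes per pixel
```

Under these assumptions one band takes about 241 MB as UInt16 and about 482 MB as Float32, and the 0–10,000 reflectance scale fits well inside the UInt16 range.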

Summary

  • ogrinfo identifies vector; gdalinfo identifies raster.
  • CRS, extent, resolution, and data type are the first four properties to record for any dataset.
  • Matching the right model to the phenomenon makes every downstream analysis easier.
  • The same phenomenon (buildings, elevation) is often available in both models — pick based on use.

Further reading

  • GDAL documentation — gdalinfo, ogrinfo, gdalwarp references.
  • Natural Earth — free global vector datasets at three scales.
  • USGS EarthExplorer — authoritative source for Landsat, SRTM, and more.
  • Copernicus Data Space — Sentinel products.
Module test

Module 3: Spatial Data Models

Answer these quick multiple-choice questions to check your understanding before moving on.

1. Which data model represents discrete features like roads and parcels?
2. Which data model is usually best for continuous surfaces like elevation?
3. A TIN is especially useful for representing what?