CoursesGIS Basics — A Complete Introduction4.7 Lab — Reprojecting Data Correctly
Module 4: Coordinate Reference SystemsHands-on Lab

4.7 Lab — Reprojecting Data Correctly

A hands-on reprojection workflow covering three gotchas: datum shifts, axis order, and area calculations.

Lesson 21 of 100·40 min read

Key takeaways

  • Reprojecting is usually three lines of code, but correctness hinges on using the right tools and parameters.
  • Datum shifts require grid files for full accuracy; pyproj handles this automatically in recent versions.
  • Verify reprojection visually and by re-computing a known quantity.

Introduction

This hands-on lab has you reproject a real dataset through three common workflows: WGS 84 → UTM, WGS 84 → Web Mercator, and a legacy datum (NAD27) → modern (NAD83). You'll verify each reprojection visually and numerically, and debug one intentional bug.

Prerequisites

  • QGIS installed.
  • Python 3.10+ with geopandas, rasterio, pyproj, shapely installed (pip install geopandas rasterio pyproj shapely).
  • Optionally, GDAL CLI tools.
  • About 45 minutes.

Part 1 — Reproject a vector dataset (WGS84 → UTM)

Step 1. Download San Francisco city-boundaries GeoJSON from SF Open Data or any equivalent city-polygon file.

Step 2. In a Python notebook:

Python
import geopandas as gpd

Step 3. Reproject to UTM Zone 10N (EPSG:26910) and compute area again:

Python
1sf_utm = sf.to_crs('EPSG:26910')
2area_m2 = sf_utm.geometry.area.sum()
3print(f'{area_m2:,.0f} m² = {area_m2 / 1e6:.1f} km²')

You should see ~121 km² for the city of San Francisco.

Step 4. Save the UTM version:

Python
sf_utm.to_file('sf_utm.gpkg', driver='GPKG')

Part 2 — Reproject a raster (UTM → Web Mercator)

Step 5. Download a Sentinel-2 tile or any geotiff covering your area (see Lab 3.5 for sources).

Step 6. Reproject with gdalwarp:

Shell
1gdalwarp -t_srs EPSG:3857 \
2         -tr 10 10 \
3         -r bilinear \
4         s2_band4.tif s2_band4_webmercator.tif

Flag meanings:

  • -t_srs target CRS.
  • -tr target resolution (metres in the target CRS).
  • -r bilinear resampling method (use near for categorical data).

Step 7. Verify with:

Shell
gdalinfo s2_band4_webmercator.tif

The CRS should now be EPSG:3857 and pixel size ~10 m.

Part 3 — Datum shift (NAD27 → NAD83)

Step 8. Find any NAD27 dataset (legacy US survey data often uses it). Sample: USGS HUC-12 watersheds in NAD27.

Step 9. Reproject:

Python
1nad27 = gpd.read_file('huc250k_nad27.shp')
2nad83 = nad27.to_crs('EPSG:4269')
3# Compute mean coordinate shift
4diffs = (nad83.geometry.centroid.x - nad27.geometry.centroid.x).describe()
5print(diffs)   # typically a few seconds of longitude, ≈10-50 m

Behind the scenes, pyproj applies the NADCON grid shift. If your output looks identical to input, you might be missing grid-shift files — check with projinfo --area "Conterminous United States" EPSG:4267+4269.

Part 4 — Intentional bug

Step 10. Take a vector dataset and do this:

Python
1sf = gpd.read_file('sf_boundary.geojson')   # EPSG:4326
2sf.crs = 'EPSG:3857'                        # WRONG — relabels, doesn't reproject
3sf.to_file('sf_bug.geojson')

Step 11. Open the broken file in QGIS on top of an OSM base layer. You should see it plotted near (−180 m, −122 m) — essentially at the prime meridian near the equator — instead of California. This is because we relabeled degrees as metres.

Step 12. Fix it:

Python
1sf = gpd.read_file('sf_bug.geojson')
2# Despite the CRS tag, we know these are degrees
3sf.crs = 'EPSG:4326'                        # correct the label
4sf_webmerc = sf.to_crs('EPSG:3857')         # now actually reproject
5sf_webmerc.to_file('sf_fixed.geojson')

Confirm it now plots in the right place.

Part 5 — Reflection

Write in your notes:

  • What was the area of San Francisco in m², km²?
  • By roughly how many metres did the NAD27 → NAD83 datum shift move centroids in your dataset?
  • What error did the intentional bug produce visually, and why?

Troubleshooting

  • pyproj can't find grid shift files — run pyproj sync --area-of-use "USA" to download. Or upgrade to PROJ 8+.
  • Area is negative — your polygon ring is oriented the wrong way; reverse with .reverse() or make_valid.
  • Web Mercator output looks stretched near poles — expected; Web Mercator distorts at high latitude. For high-lat analyses, use Polar Stereographic (EPSG:3031 / 3995).
  • gdalwarp output has black borders — the source and target extents don't align; pass -te to control output extent.

Self-check exercises

1. Why does computing `sf.geometry.area.sum()` in EPSG:4326 give a meaningless number?

EPSG:4326 is angular (degrees). The area method computes planar area on the coordinate values, which in this case produces square degrees — a meaningless unit where 1 sq. degree represents very different physical areas at different latitudes. Reproject to a projected metric CRS or use a geodesic area function.

2. You run `gdalwarp` to reproject a classified raster. Which resampling method should you use?

near (nearest neighbour). Bilinear or cubic resampling averages neighbouring cells, which would invent new class values that don't exist in the legend. For continuous data (elevation, temperature, reflectance), bilinear or cubic is fine; for categorical data, always near.

3. What's the difference between setting `gdf.crs = 'EPSG:3857'` and `gdf.to_crs('EPSG:3857')`?

Setting gdf.crs only changes the metadata label — the underlying coordinates don't move. to_crs applies the mathematical transformation, so the coordinates end up in the new CRS. Using the former when you mean the latter is one of the most common CRS bugs in GIS.

Summary

  • to_crs reprojects; assigning crs just relabels. Use the right one.
  • gdalwarp with -t_srs reprojects rasters; choose resampling method per data type.
  • Datum shifts (NAD27 → NAD83, OSGB36 → WGS84) require grid shift files for sub-metre accuracy.
  • Always verify reprojection visually and by re-computing a known quantity.

Further reading

  • pyproj documentation — Transformer and Geod classes.
  • GDAL gdalwarp documentation.
  • NGS — NADCON and GEOCON grid shift resources.
  • PROJ grid-shift data repository.
Module test

Module 4: Coordinate Reference Systems

Answer these quick multiple-choice questions to check your understanding before moving on.

1. What does a coordinate reference system define?
2. Why are projected coordinate systems useful?
3. What is an EPSG code used for?