4.7 Lab — Reprojecting Data Correctly
A hands-on reprojection workflow covering three gotchas: datum shifts, axis order, and area calculations.
Key takeaways
- Reprojecting is usually three lines of code, but correctness hinges on using the right tools and parameters.
- Datum shifts require grid files for full accuracy;
pyprojhandles this automatically in recent versions.- Verify reprojection visually and by re-computing a known quantity.
Introduction
This hands-on lab has you reproject a real dataset through three common workflows: WGS 84 → UTM, WGS 84 → Web Mercator, and a legacy datum (NAD27) → modern (NAD83). You'll verify each reprojection visually and numerically, and debug one intentional bug.
Prerequisites
- QGIS installed.
- Python 3.10+ with
geopandas,rasterio,pyproj,shapelyinstalled (pip install geopandas rasterio pyproj shapely). - Optionally, GDAL CLI tools.
- About 45 minutes.
Part 1 — Reproject a vector dataset (WGS84 → UTM)
Step 1. Download San Francisco city-boundaries GeoJSON from SF Open Data or any equivalent city-polygon file.
Step 2. In a Python notebook:
import geopandas as gpdStep 3. Reproject to UTM Zone 10N (EPSG:26910) and compute area again:
1sf_utm = sf.to_crs('EPSG:26910')
2area_m2 = sf_utm.geometry.area.sum()
3print(f'{area_m2:,.0f} m² = {area_m2 / 1e6:.1f} km²')You should see ~121 km² for the city of San Francisco.
Step 4. Save the UTM version:
sf_utm.to_file('sf_utm.gpkg', driver='GPKG')Part 2 — Reproject a raster (UTM → Web Mercator)
Step 5. Download a Sentinel-2 tile or any geotiff covering your area (see Lab 3.5 for sources).
Step 6. Reproject with gdalwarp:
1gdalwarp -t_srs EPSG:3857 \
2 -tr 10 10 \
3 -r bilinear \
4 s2_band4.tif s2_band4_webmercator.tifFlag meanings:
-t_srstarget CRS.-trtarget resolution (metres in the target CRS).-r bilinearresampling method (usenearfor categorical data).
Step 7. Verify with:
gdalinfo s2_band4_webmercator.tifThe CRS should now be EPSG:3857 and pixel size ~10 m.
Part 3 — Datum shift (NAD27 → NAD83)
Step 8. Find any NAD27 dataset (legacy US survey data often uses it). Sample: USGS HUC-12 watersheds in NAD27.
Step 9. Reproject:
1nad27 = gpd.read_file('huc250k_nad27.shp')
2nad83 = nad27.to_crs('EPSG:4269')
3# Compute mean coordinate shift
4diffs = (nad83.geometry.centroid.x - nad27.geometry.centroid.x).describe()
5print(diffs) # typically a few seconds of longitude, ≈10-50 mBehind the scenes, pyproj applies the NADCON grid shift. If your output looks identical to input, you might be missing grid-shift files — check with projinfo --area "Conterminous United States" EPSG:4267+4269.
Part 4 — Intentional bug
Step 10. Take a vector dataset and do this:
1sf = gpd.read_file('sf_boundary.geojson') # EPSG:4326
2sf.crs = 'EPSG:3857' # WRONG — relabels, doesn't reproject
3sf.to_file('sf_bug.geojson')Step 11. Open the broken file in QGIS on top of an OSM base layer. You should see it plotted near (−180 m, −122 m) — essentially at the prime meridian near the equator — instead of California. This is because we relabeled degrees as metres.
Step 12. Fix it:
1sf = gpd.read_file('sf_bug.geojson')
2# Despite the CRS tag, we know these are degrees
3sf.crs = 'EPSG:4326' # correct the label
4sf_webmerc = sf.to_crs('EPSG:3857') # now actually reproject
5sf_webmerc.to_file('sf_fixed.geojson')Confirm it now plots in the right place.
Part 5 — Reflection
Write in your notes:
- What was the area of San Francisco in m², km²?
- By roughly how many metres did the NAD27 → NAD83 datum shift move centroids in your dataset?
- What error did the intentional bug produce visually, and why?
Troubleshooting
pyprojcan't find grid shift files — runpyproj sync --area-of-use "USA"to download. Or upgrade to PROJ 8+.- Area is negative — your polygon ring is oriented the wrong way; reverse with
.reverse()ormake_valid. - Web Mercator output looks stretched near poles — expected; Web Mercator distorts at high latitude. For high-lat analyses, use Polar Stereographic (EPSG:3031 / 3995).
gdalwarpoutput has black borders — the source and target extents don't align; pass-teto control output extent.
Self-check exercises
1. Why does computing `sf.geometry.area.sum()` in EPSG:4326 give a meaningless number?
EPSG:4326 is angular (degrees). The area method computes planar area on the coordinate values, which in this case produces square degrees — a meaningless unit where 1 sq. degree represents very different physical areas at different latitudes. Reproject to a projected metric CRS or use a geodesic area function.
2. You run `gdalwarp` to reproject a classified raster. Which resampling method should you use?
near (nearest neighbour). Bilinear or cubic resampling averages neighbouring cells, which would invent new class values that don't exist in the legend. For continuous data (elevation, temperature, reflectance), bilinear or cubic is fine; for categorical data, always near.
3. What's the difference between setting `gdf.crs = 'EPSG:3857'` and `gdf.to_crs('EPSG:3857')`?
Setting gdf.crs only changes the metadata label — the underlying coordinates don't move. to_crs applies the mathematical transformation, so the coordinates end up in the new CRS. Using the former when you mean the latter is one of the most common CRS bugs in GIS.
Summary
to_crsreprojects; assigningcrsjust relabels. Use the right one.gdalwarpwith-t_srsreprojects rasters; choose resampling method per data type.- Datum shifts (NAD27 → NAD83, OSGB36 → WGS84) require grid shift files for sub-metre accuracy.
- Always verify reprojection visually and by re-computing a known quantity.
Further reading
- pyproj documentation —
TransformerandGeodclasses. - GDAL
gdalwarpdocumentation. - NGS — NADCON and GEOCON grid shift resources.
- PROJ grid-shift data repository.
Module 4: Coordinate Reference Systems
Answer these quick multiple-choice questions to check your understanding before moving on.