CoursesGIS Basics — A Complete Introduction6.1 Shapefiles — Strengths, Quirks, and Limits
Module 6: Spatial Data Formats

6.1 Shapefiles — Strengths, Quirks, and Limits

The ubiquitous legacy format — how it works, why it persists, and why modern alternatives exist.

Lesson 27 of 100·18 min read

Key takeaways

  • A shapefile is actually 3–7 files sharing a base name; all must travel together.
  • It has hard limits (2 GB, 10-character field names, single geometry type) that cause real problems.
  • Modern replacements (GeoPackage, GeoJSON, FlatGeobuf) solve every one — but shapefile still dominates the wild.

Introduction

ESRI released the shapefile format in 1993 as a lightweight interchange format. Thirty-plus years later it's the most widespread spatial format on Earth, despite well-known limitations. Every GIS professional needs to read, write, and debug shapefiles — and know when to replace them with something better.

Anatomy: a shapefile is a set of files

A single "shapefile" is actually a bundle sharing a base name:

ExtensionContentsRequired?
.shpGeometryYes
.shxSpatial index of geometry offsetsYes
.dbfAttribute table (dBase III format)Yes
.prjCRS in WKTHighly recommended
.cpgCharacter encoding for .dbfIf using non-ASCII
.qix / .sbnSpatial index for performanceOptional
.shp.xmlMetadataOptional

Send any subset and the shapefile breaks. Zip them together before emailing.

Geometry

One shapefile holds one geometry type — all points, all polylines, or all polygons. You can't mix points and lines in the same file. Types (ShapeType):

  • 1 = Point
  • 3 = Polyline
  • 5 = Polygon
  • 8 = MultiPoint
  • 11–31 = variants with Z / M

Attributes

Stored in the .dbf (dBase III) table. Every row corresponds to a feature. Data types are limited:

  • Numeric (precision + scale up to 18 digits).
  • Character (up to 254 bytes).
  • Date (YYYYMMDD).
  • Logical (T/F).
  • Memo (rarely used; not portable).

Date + time in one field? Not supported. Lists or JSON? Not supported.

The 10-character field name limit

Because .dbf predates modern databases, field names are limited to 10 ASCII characters. population_density becomes populati_1 when you write a shapefile. This silently truncates many fields — worse, different tools truncate differently, causing join failures.

The 2 GB size limit

Each file in the bundle is limited to 2 GB. At 2 GB, a polygon dataset holds roughly 70 million vertices. Most large raster ↔ vector conversions hit this ceiling.

Character encoding

Historical shapefiles have no encoding metadata. Without .cpg, text is assumed to be the local code page of the creator — producing mojibake when opened elsewhere. The .cpg file fixes this; ensure your writer produces one (modern GDAL does by default).

M and Z values

Shapefiles support optional:

  • Z — elevation per vertex.
  • M — "measure" — typically linear reference along a road.

These are stored in specific shape types (11 for PointZ, 21 for MultiPointZ, etc.).

Topology? No.

Shapefiles store every polygon as an independent set of rings. There's no shared edge, no topology table. Neighbouring polygons can drift apart (slivers) or overlap, and the format won't stop you.

Writing and reading in Python

Python
1import geopandas as gpd
2[object Object]
3[object Object]
4[object Object]
5

Always specify encoding on write to avoid mojibake.

Strengths

Despite the limitations, shapefiles remain dominant because:

  • Universal support — every GIS tool reads them.
  • Simple — no database server required.
  • Fast for modest datasets — small files, quick reads.
  • Zero-config — drop the folder in any tool and it works.

Replacements

ReplacementWhy consider
GeoPackage (.gpkg)Single file, SQLite-backed, supports multiple geometry types and 2 GB+, no name truncation. Recommended default.
GeoJSON (.geojson)Text, web-native, easy to diff in git. WGS84 only; slow for very large data.
FlatGeobuf (.fgb)Binary, streamable, cloud-native, random-access. Great for web consumption of large vector data.
Parquet + GeoParquetColumnar, excellent compression, cloud-native, analytic workloads.

The Switch From Shapefile campaign (switchfromshapefile.org) lays out the case succinctly. For new work, default to GeoPackage.

Migrating gracefully

Your organisation can't just delete every shapefile overnight. Practical migration:

  • New work → write GeoPackage.
  • Shared exchange → still use shapefiles if recipients insist, but also provide GeoPackage.
  • Archives → convert when convenient; don't break consumers.

When you receive a zipped shapefile from someone else, try opening it in a browser GIS like Atlas before doing heavier analysis. A quick visual inspection catches missing .prj files, mojibake in attributes, unexpected geometry types, and empty layers early.

Shell
1# Convert shapefile to GeoPackage
2ogr2ogr -f GPKG parcels.gpkg parcels.shp

Self-check exercises

1. A colleague sends you `parcels.shp` and `parcels.dbf`. You open it and get no CRS. What's missing?

The .prj file. Without it the coordinates have no declared CRS. You need to either obtain the .prj (often kept alongside the shapefile) or deduce the CRS from context (coordinate ranges and the project's known region).

2. Why is "population_density" a problematic field name in a shapefile?

It exceeds the 10-character limit of dBase III field names. Different tools will truncate it differently (populati_1, populati_d, etc.), which breaks later joins and causes confusion. Shorter names or moving to a format without the limit (GeoPackage) fixes this.

3. What's the smallest realistic replacement for a shapefile in modern workflows?

GeoPackage (.gpkg). It's a single file, uses SQLite, supports multiple geometry types, rich data types, arbitrary field names, and scales past 2 GB. Every major GIS tool reads and writes it, making it a drop-in replacement for new work.

Summary

  • Shapefile = .shp + .shx + .dbf (+ .prj + others); all must travel together.
  • Hard limits: 2 GB, 10-character field names, one geometry type, no topology, encoding hell without .cpg.
  • GeoPackage is the modern default replacement.
  • Keep reading and writing shapefiles — they won't disappear this decade — but stop starting in shapefile.

Further reading

  • ESRI — Shapefile Technical Description (1998, still authoritative).
  • switchfromshapefile.org — rationale and alternatives.
  • OGC — GeoPackage Encoding Standard (the authoritative spec).
  • GDAL documentation — shapefile driver options and caveats.