6.1 Shapefiles — Strengths, Quirks, and Limits
The ubiquitous legacy format — how it works, why it persists, and why modern alternatives exist.
Key takeaways
- A shapefile is actually 3–7 files sharing a base name; all must travel together.
- It has hard limits (2 GB, 10-character field names, single geometry type) that cause real problems.
- Modern replacements (GeoPackage, GeoJSON, FlatGeobuf) solve every one — but shapefile still dominates the wild.
Introduction
ESRI released the shapefile format in 1993 as a lightweight interchange format. Thirty-plus years later it's the most widespread spatial format on Earth, despite well-known limitations. Every GIS professional needs to read, write, and debug shapefiles — and know when to replace them with something better.
Anatomy: a shapefile is a set of files
A single "shapefile" is actually a bundle sharing a base name:
| Extension | Contents | Required? |
|---|---|---|
.shp | Geometry | Yes |
.shx | Spatial index of geometry offsets | Yes |
.dbf | Attribute table (dBase III format) | Yes |
.prj | CRS in WKT | Highly recommended |
.cpg | Character encoding for .dbf | If using non-ASCII |
.qix / .sbn | Spatial index for performance | Optional |
.shp.xml | Metadata | Optional |
Send any subset and the shapefile breaks. Zip them together before emailing.
Geometry
One shapefile holds one geometry type — all points, all polylines, or all polygons. You can't mix points and lines in the same file. Types (ShapeType):
- 1 = Point
- 3 = Polyline
- 5 = Polygon
- 8 = MultiPoint
- 11–31 = variants with Z / M
Attributes
Stored in the .dbf (dBase III) table. Every row corresponds to a feature. Data types are limited:
- Numeric (precision + scale up to 18 digits).
- Character (up to 254 bytes).
- Date (
YYYYMMDD). - Logical (
T/F). - Memo (rarely used; not portable).
Date + time in one field? Not supported. Lists or JSON? Not supported.
The 10-character field name limit
Because .dbf predates modern databases, field names are limited to 10 ASCII characters. population_density becomes populati_1 when you write a shapefile. This silently truncates many fields — worse, different tools truncate differently, causing join failures.
The 2 GB size limit
Each file in the bundle is limited to 2 GB. At 2 GB, a polygon dataset holds roughly 70 million vertices. Most large raster ↔ vector conversions hit this ceiling.
Character encoding
Historical shapefiles have no encoding metadata. Without .cpg, text is assumed to be the local code page of the creator — producing mojibake when opened elsewhere. The .cpg file fixes this; ensure your writer produces one (modern GDAL does by default).
M and Z values
Shapefiles support optional:
- Z — elevation per vertex.
- M — "measure" — typically linear reference along a road.
These are stored in specific shape types (11 for PointZ, 21 for MultiPointZ, etc.).
Topology? No.
Shapefiles store every polygon as an independent set of rings. There's no shared edge, no topology table. Neighbouring polygons can drift apart (slivers) or overlap, and the format won't stop you.
Writing and reading in Python
1import geopandas as gpd
2[object Object]
3[object Object]
4[object Object]
5Always specify encoding on write to avoid mojibake.
Strengths
Despite the limitations, shapefiles remain dominant because:
- Universal support — every GIS tool reads them.
- Simple — no database server required.
- Fast for modest datasets — small files, quick reads.
- Zero-config — drop the folder in any tool and it works.
Replacements
| Replacement | Why consider |
|---|---|
GeoPackage (.gpkg) | Single file, SQLite-backed, supports multiple geometry types and 2 GB+, no name truncation. Recommended default. |
GeoJSON (.geojson) | Text, web-native, easy to diff in git. WGS84 only; slow for very large data. |
FlatGeobuf (.fgb) | Binary, streamable, cloud-native, random-access. Great for web consumption of large vector data. |
| Parquet + GeoParquet | Columnar, excellent compression, cloud-native, analytic workloads. |
The Switch From Shapefile campaign (switchfromshapefile.org) lays out the case succinctly. For new work, default to GeoPackage.
Migrating gracefully
Your organisation can't just delete every shapefile overnight. Practical migration:
- New work → write GeoPackage.
- Shared exchange → still use shapefiles if recipients insist, but also provide GeoPackage.
- Archives → convert when convenient; don't break consumers.
When you receive a zipped shapefile from someone else, try opening it in a browser GIS like Atlas before doing heavier analysis. A quick visual inspection catches missing .prj files, mojibake in attributes, unexpected geometry types, and empty layers early.
1# Convert shapefile to GeoPackage
2ogr2ogr -f GPKG parcels.gpkg parcels.shpSelf-check exercises
1. A colleague sends you `parcels.shp` and `parcels.dbf`. You open it and get no CRS. What's missing?
The .prj file. Without it the coordinates have no declared CRS. You need to either obtain the .prj (often kept alongside the shapefile) or deduce the CRS from context (coordinate ranges and the project's known region).
2. Why is "population_density" a problematic field name in a shapefile?
It exceeds the 10-character limit of dBase III field names. Different tools will truncate it differently (populati_1, populati_d, etc.), which breaks later joins and causes confusion. Shorter names or moving to a format without the limit (GeoPackage) fixes this.
3. What's the smallest realistic replacement for a shapefile in modern workflows?
GeoPackage (.gpkg). It's a single file, uses SQLite, supports multiple geometry types, rich data types, arbitrary field names, and scales past 2 GB. Every major GIS tool reads and writes it, making it a drop-in replacement for new work.
Summary
- Shapefile =
.shp+.shx+.dbf(+.prj+ others); all must travel together. - Hard limits: 2 GB, 10-character field names, one geometry type, no topology, encoding hell without
.cpg. - GeoPackage is the modern default replacement.
- Keep reading and writing shapefiles — they won't disappear this decade — but stop starting in shapefile.
Further reading
- ESRI — Shapefile Technical Description (1998, still authoritative).
- switchfromshapefile.org — rationale and alternatives.
- OGC — GeoPackage Encoding Standard (the authoritative spec).
- GDAL documentation — shapefile driver options and caveats.