Relations in GIS

When working with geographic data, relationships between datasets are often as important as the data itself. Imagine tracking tree inspections across a city, monitoring measurements across different acres of farmland, or analyzing water flow in reservoir systems. These scenarios require robust data relationships to provide meaningful insights. At Atlas, we've tackled this challenge head-on, implementing a powerful relations system that connects datasets while maintaining the performance our users expect.

The Challenge of Relating Geographic Data

Geographic data rarely exists in isolation. A tree in a park might have multiple inspections over time. An acre of farmland could have dozens of soil measurements. A water reservoir has inflow and outflow measurements that affect its levels. These connections between primary entities (trees, acres, reservoirs) and their related records (inspections, measurements, flow rates) form the backbone of advanced GIS analysis.

Some key use cases for relations include:

Connecting tree inspections to specific trees
Linking soil measurements to particular acres of land
Tracking water influx and outflow rates for reservoirs
Calculating aggregated values like average measurements per feature
Determining time-based insights like "days since last inspection"
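To make the last two use cases concrete, here is a minimal Python sketch of a per-feature rollup. It assumes a hypothetical `Inspection` record that points to its tree through a `tree_id` field and carries a numeric `health_score`; neither name comes from Atlas. The sketch groups related inspections by tree, then computes the average score and the number of days since the most recent inspection.

```python
from dataclasses import dataclass
from datetime import date
from statistics import mean


@dataclass
class Inspection:
    """Hypothetical related record: one inspection of one tree."""
    tree_id: str
    inspected_on: date
    health_score: float


def summarize_inspections(inspections, today):
    """Per tree: average health score and days since the last inspection."""
    by_tree = {}
    for inspection in inspections:
        # Group related records under the feature they reference
        by_tree.setdefault(inspection.tree_id, []).append(inspection)

    summary = {}
    for tree_id, records in by_tree.items():
        last = max(r.inspected_on for r in records)
        summary[tree_id] = {
            "average_score": mean(r.health_score for r in records),
            "days_since_last_inspection": (today - last).days,
        }
    return summary
```

In a database, the same result is a `GROUP BY` over the related table. The sketch only shows the shape of the calculation.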

While these relationships sound straightforward conceptually, implementing them in a high-performance GIS system presents significant technical challenges.

Water Reservoir Management: A Perfect Use Case

Water reservoir management presents an ideal scenario for demonstrating the power of relations in GIS. Water systems rely on understanding the balance between influx (water entering the system) and outflow (water leaving the system). This balance directly affects reservoir levels, water quality, and system stability.

With relations, water managers can:

Connect influx measurement points to specific reservoirs
Track outflow rates at multiple discharge points
Calculate net water balance for each reservoir
Monitor seasonal variations in water flow patterns
Identify potential leaks or unauthorized usage

By connecting measurement data to reservoirs through relations, water managers can create a comprehensive view of the entire water system, enabling better management decisions and more efficient resource allocation.
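The net water balance of a reservoir is its total influx minus its total outflow. A minimal sketch, assuming hypothetical flow records shaped as `(reservoir_id, direction, volume)` tuples with volumes in cubic metres:

```python
from collections import defaultdict


def net_balance(flows):
    """Total influx minus total outflow, per reservoir."""
    balance = defaultdict(float)
    for reservoir_id, direction, volume in flows:
        if direction == "influx":
            balance[reservoir_id] += volume
        elif direction == "outflow":
            balance[reservoir_id] -= volume
        else:
            raise ValueError(f"unknown direction: {direction!r}")
    return dict(balance)


# Hypothetical measurements related to two reservoirs
flows = [
    ("north", "influx", 1200.0),
    ("north", "outflow", 950.0),
    ("south", "influx", 400.0),
    ("south", "outflow", 480.0),
]
print(net_balance(flows))  # {'north': 250.0, 'south': -80.0}
```

A reservoir whose balance stays negative over a period with no planned releases would be a natural candidate for the leak and unauthorized-usage checks listed above.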

Exploring Implementation Options

Before diving into development, we explored four potential approaches to implementing relations:

Option 1: Propagate Updates

This approach involves duplicating related data across datasets. When an influx measurement is updated, the changes are automatically propagated to the related reservoir record.

Pros:

Fast read operations with no joins required
Simple querying and filtering on related fields
Straightforward implementation of lookup fields

Cons:

Complex update logic to propagate changes
Difficult to track which datasets contain lookup fields
Performance issues when updating records with many relationships
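A minimal sketch of the propagation pattern, using hypothetical in-memory dictionaries in place of database tables. The reservoir keeps a duplicated `lookup_flow_rate` field (an illustrative name) so reads need no join, which means every write to a measurement also has to copy the new value across:

```python
# Hypothetical in-memory "tables" keyed by ID
measurements = {"m1": {"reservoir_id": "r1", "flow_rate": 12.5}}
reservoirs = {"r1": {"name": "North", "lookup_flow_rate": 12.5}}  # duplicated field


def update_measurement(measurement_id, flow_rate):
    """Write the measurement, then copy the new value onto the related reservoir."""
    record = measurements[measurement_id]
    record["flow_rate"] = flow_rate
    # Propagation step: every record holding a copy must be rewritten too
    reservoirs[record["reservoir_id"]]["lookup_flow_rate"] = flow_rate


def reservoir_flow_rate(reservoir_id):
    """Read path: a single lookup, no join."""
    return reservoirs[reservoir_id]["lookup_flow_rate"]
```

The write path has to know every place a copy lives. With many lookup fields pointing at the same source, that bookkeeping is exactly the complexity listed in the cons.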

Option 2: Store Relation IDs

This method stores only the ID of related records, resolving the actual data at query time.

Pros:

Simple update operations
Always provides the most current data
Cleaner data model with less duplication

Cons:

Slower read operations requiring joins
Higher database load
More complex implementation for filtering and sorting
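For contrast, here is Option 2 under a similar hypothetical setup: each measurement stores only `reservoir_id`, and the reservoir name is resolved at read time, which is the job a SQL join does.

```python
# Hypothetical in-memory "tables"
reservoirs = {"r1": {"name": "North"}, "r2": {"name": "South"}}
# Each measurement stores only the ID of its reservoir
measurements = [
    {"id": "m1", "reservoir_id": "r1", "flow_rate": 12.5},
    {"id": "m2", "reservoir_id": "r2", "flow_rate": 4.0},
]


def measurements_with_reservoir_name(measurements, reservoirs):
    """Resolve reservoir_id at read time (the equivalent of a join)."""
    return [
        {**m, "reservoir_name": reservoirs[m["reservoir_id"]]["name"]}
        for m in measurements
    ]
```

Renaming a reservoir touches a single record, and reads always see the current name, but every read repeats the lookup for every row it returns.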

Option 3: Dedicated Relations Table

Creating a separate table to track relationships between datasets.

Pros:

Flexible and powerful for complex relationships
Excellent for rollup functions and aggregations
Clean separation of concerns

Cons:

Additional table to maintain
Increased database complexity
Potential performance impact on reads
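Because a relations table stores pairs of features, it supports many-to-many links and rollups naturally. The sketch below mirrors the fields of the `Relation` model described later in this post (dropping `id` for brevity) and averages a numeric field of the related features for each source feature, such as mean soil pH per acre. The function name and data are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Relation:
    source_dataset_id: str
    source_feature_id: str
    target_dataset_id: str
    target_feature_id: str


def rollup_average(relations, target_values, source_dataset_id, target_dataset_id):
    """Average a numeric field of related target features, per source feature."""
    grouped = defaultdict(list)
    for rel in relations:
        # Only follow relations between the two datasets of interest
        if (rel.source_dataset_id, rel.target_dataset_id) == (
            source_dataset_id,
            target_dataset_id,
        ):
            grouped[rel.source_feature_id].append(target_values[rel.target_feature_id])
    return {feature_id: mean(values) for feature_id, values in grouped.items()}


relations = [
    Relation("acres", "a1", "soil", "s1"),
    Relation("acres", "a1", "soil", "s2"),
    Relation("acres", "a2", "soil", "s3"),
]
soil_ph = {"s1": 6.0, "s2": 7.0, "s3": 5.5}
print(rollup_average(relations, soil_ph, "acres", "soil"))  # {'a1': 6.5, 'a2': 5.5}
```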

Option 4: Hybrid Approach

Combining a relations table with strategic data duplication.

Pros:

Balances read and write performance
Supports complex queries and relationships
Maintains data currency while optimizing common operations

Cons:

Most complex implementation
Requires careful management of duplicated data
Higher maintenance overhead

Our Implementation Journey

We initially gravitated toward Option 3 (Dedicated Relations Table) because it offered the cleanest data model with minimal write overhead. The implementation looked promising:

from dataclasses import dataclass

@dataclass
class Relation:
    id: str                 # Unique identifier for the relation
    source_dataset_id: str  # ID of the source dataset
    source_feature_id: str  # ID of the source feature
    target_dataset_id: str  # ID of the target dataset
    target_feature_id: str  # ID of the target feature

This model allowed us to track relationships between any two features across different datasets and to resolve them at query time, so the data shown to users was always current. However, when we tested this approach with real-world data volumes, we ran into performance issues: the additional joins required during tile generation added enough latency to make the map noticeably laggy.

The Performance Imperative

For a mapping application like Atlas, read performance is critical. When users pan and zoom around a map, they expect tiles to load instantly. Even small delays compound to create a sluggish experience.

Our application streams vector tiles directly from PostGIS, which means any additional processing during tile generation directly impacts the user experience. After extensive testing, we determined that Option 3, while elegant, couldn't meet our performance requirements.

Our Final Solution: The Hybrid Approach

We ultimately implemented Option 4, a hybrid approach that combines a relations table with strategic data duplication. This solution:

Maintains a dedicated relations table for tracking connections between datasets and features
Pre-calculates and stores lookup values
Uses the relations table for complex aggregations and less common queries
Implements efficient update propagation for changed records

The implementation required more complex code, particularly for managing updates: even a simple operation such as changing a property in the data table now has to be propagated to every related feature. However, these updates can run asynchronously in the background, and once they finish we notify the frontend with the new values.
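A minimal sketch of this asynchronous pattern using Python's `asyncio`, assuming hypothetical stores: a relations table of `(source_dataset, source_id, target_dataset, target_id)` tuples and a cache of duplicated reservoir names on sensor features. The write to the source record returns straight away, while a background task refreshes each cached copy and then calls a `notify` callback that stands in for the frontend notification.

```python
import asyncio

# Hypothetical relations table: (source_dataset, source_id, target_dataset, target_id)
relations = [
    ("reservoirs", "r1", "sensors", "s1"),
    ("reservoirs", "r1", "sensors", "s2"),
]
reservoir_names = {"r1": "North"}
# Duplicated lookup values that tile queries read directly, without a join
sensor_reservoir_name = {"s1": "North", "s2": "North"}


async def propagate_name(reservoir_id, notify):
    """Background job: refresh every cached copy of one reservoir's name."""
    new_name = reservoir_names[reservoir_id]
    for source_ds, source_id, target_ds, target_id in relations:
        if (source_ds, source_id, target_ds) == ("reservoirs", reservoir_id, "sensors"):
            sensor_reservoir_name[target_id] = new_name
            await asyncio.sleep(0)  # yield to the event loop between writes
    await notify(reservoir_id)


async def rename_reservoir(reservoir_id, name, notify):
    """Update the source record and schedule propagation without waiting for it."""
    reservoir_names[reservoir_id] = name
    return asyncio.create_task(propagate_name(reservoir_id, notify))
```

The caller should keep a reference to the returned task, as the `asyncio` documentation advises for background tasks, so it is not garbage-collected before it finishes.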

This asynchronous update pattern became crucial for maintaining performance while ensuring data consistency. When a user updates a property in a water reservoir record, for example, we need to recalculate all the related lookup fields across potentially dozens of related features. Doing this synchronously would create unacceptable delays in the user interface.

Our solution implements a background worker system that:

Captures relation update events in a queue
Processes updates in batches for efficiency
Intelligently prioritizes updates based on visibility and importance
Notifies the frontend when updates are complete

This approach allows users to continue working without waiting for all related updates to complete. When updates finish, the system notifies the frontend through WebSocket connections, enabling real-time updates without page refreshes.
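The queueing, batching, and prioritization steps above could look roughly like the following sketch. It is not Atlas's worker: `RelationUpdateQueue`, `run_worker`, and the convention that lower priority numbers (for example, features currently visible on the map) run first are all assumptions, and the `notify` callback stands in for the WebSocket push.

```python
import heapq
import itertools


class RelationUpdateQueue:
    """Priority queue of pending lookup-field recalculations (hypothetical)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker: FIFO within a priority
        self._pending = set()

    def enqueue(self, feature_id, priority):
        """Lower numbers run first, e.g. 0 for features visible on the map."""
        if feature_id in self._pending:
            return  # collapse duplicate events for the same feature
        self._pending.add(feature_id)
        heapq.heappush(self._heap, (priority, next(self._counter), feature_id))

    def next_batch(self, size):
        """Pop up to `size` feature IDs in priority order."""
        batch = []
        while self._heap and len(batch) < size:
            _, _, feature_id = heapq.heappop(self._heap)
            self._pending.discard(feature_id)
            batch.append(feature_id)
        return batch


def run_worker(queue, recalculate, notify, batch_size=100):
    """Drain the queue in batches, notifying the frontend after each batch."""
    while batch := queue.next_batch(batch_size):
        recalculate(batch)
        notify(batch)
```

Collapsing duplicate events means a burst of edits to one record triggers a single recalculation rather than one per edit.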

Auto-Matching Relations: Simplifying Data Migration

We understand that not all users create datasets from the ground up in Atlas. Many organizations already have extensive datasets with complex relationships established in other systems. For these customers, migrating to a new platform while preserving existing relationships can be a significant challenge.

To address this, we've developed an auto-matching feature for relations, ensuring that users can migrate all their data to Atlas while maintaining existing relationships between datasets. This feature automatically identifies potential relationships between datasets based on matching values in specified columns.

Here's how our auto-matching system works:

def batch_create_relations(source_dataset_id, target_dataset_id,
                           source_column_id, target_column_id):
    """Create a relation for every pair of rows whose key columns match."""
    # get_dataset_rows, extract_column_values, find_matching_values and
    # create_relation are internal helpers, not shown here.

    # Get data from the source and target datasets
    source_data = get_dataset_rows(source_dataset_id)
    target_data = get_dataset_rows(target_dataset_id)

    # Extract column values for matching
    source_values = extract_column_values(source_data, source_column_id)
    target_values = extract_column_values(target_data, target_column_id)

    # Find matches between datasets
    matches = find_matching_values(source_values, target_values)

    # Create a relation for each match
    relations_created = 0
    for match in matches:
        create_relation(
            source_dataset_id=source_dataset_id,
            source_feature_id=match.source_id,
            target_dataset_id=target_dataset_id,
            target_feature_id=match.target_id,
        )
        relations_created += 1

    return relations_created

This auto-matching system handles various data formats, including:

Simple text or numeric matches
Multi-value fields (arrays or delimited strings)
Case-insensitive matching
Partial matches based on configurable thresholds
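Matching rules like these can be sketched with the standard library alone. The helpers below are hypothetical stand-ins for the matching step, not the Atlas implementation: they split delimited strings or lists into multiple values, compare them case-insensitively, and treat a `difflib.SequenceMatcher` similarity ratio at or above a configurable threshold as a partial match.

```python
from difflib import SequenceMatcher


def normalize_values(raw, delimiter=","):
    """Split a multi-value field and normalize case and whitespace."""
    parts = raw if isinstance(raw, (list, tuple)) else str(raw).split(delimiter)
    cleaned = (str(part).strip().casefold() for part in parts)
    return {value for value in cleaned if value}


def values_match(a, b, threshold=1.0):
    """True if any value of a matches any value of b.

    Exact matches are checked first; otherwise SequenceMatcher.ratio()
    must reach the threshold (1.0 means exact matches only).
    """
    for x in normalize_values(a):
        for y in normalize_values(b):
            if x == y or SequenceMatcher(None, x, y).ratio() >= threshold:
                return True
    return False
```

The default threshold of 1.0 accepts only exact matches after normalization. Lowering it trades precision for recall, so partial matches are worth reviewing before relations are created from them.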

For example, a water utility company migrating to Atlas could automatically connect their reservoir dataset with their flow measurement dataset by matching reservoir IDs, preserving the critical relationships needed for water management analysis.

The Results: Performance Where It Matters

The hybrid approach delivered the performance we needed for our mapping application. Tile generation remained fast, with lookup fields available without expensive joins. At the same time, we maintained the flexibility to perform complex aggregations and rollups when needed.

Key benefits of our implementation include:

Vector tiles load quickly with pre-calculated relationship data
Updates propagate efficiently to maintain data currency
Complex queries like "average water influx per reservoir" perform well
The system scales effectively with increasing data volumes

Visualizing Relations in the Frontend

The implementation of relations in our GIS system wouldn't be complete without an intuitive way to visualize these connections. At Atlas, we've developed a powerful visualization system that helps users instantly understand the relationships between geographic features.

When a user selects a feature in the map that has relations, our system automatically visualizes these connections using arc layers and secondary colors. These curved lines connect the source feature to all its related features, creating an immediate visual understanding of the relationships.

The image provided showcases an example of how relations can be visualized effectively. It displays an organization's operational reach across multiple countries, using arc layers to connect the organization's headquarters to various countries where it operates.

Key Visualization Features

Arc Layers
Color Coding
Automatic Bounding Box Adjustment
Interactive Elements
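The arc and bounding-box behaviour can be sketched independently of any rendering library. Assuming hypothetical `Point` features in longitude/latitude, the code below builds one arc per related feature (the kind of from/to pairs an arc layer typically draws) and computes a padded bounding box that keeps the selected feature and all of its relations in view. It deliberately ignores features that straddle the antimeridian.

```python
from dataclasses import dataclass


@dataclass
class Point:
    lon: float
    lat: float


def build_arcs(source, related):
    """One arc per relation, from the selected feature to each related feature."""
    return [{"from": (source.lon, source.lat), "to": (r.lon, r.lat)} for r in related]


def fit_bounds(source, related, padding=0.1):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) around all endpoints,
    expanded by a fraction of its size so arcs do not touch the map edge."""
    lons = [source.lon] + [r.lon for r in related]
    lats = [source.lat] + [r.lat for r in related]
    pad_lon = (max(lons) - min(lons)) * padding
    pad_lat = (max(lats) - min(lats)) * padding
    return (min(lons) - pad_lon, min(lats) - pad_lat,
            max(lons) + pad_lon, max(lats) + pad_lat)
```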

Benefits of This Approach

Instant Understanding: Users can immediately grasp complex spatial relationships without the need for lengthy explanations or data tables.
Scalability: The visualization works well for both small and large numbers of related features, adapting to different data scenarios.
Enhanced Data Exploration: By visually connecting related features, users are encouraged to explore the data further and discover patterns or insights that might not be apparent in tabular form.

Lessons Learned

Our journey to implement relations taught us several valuable lessons:

Read vs. Write Tradeoffs: In a mapping application, optimizing for read performance often outweighs write efficiency
User Experience Drives Architecture: Even small performance impacts can significantly affect user perception
Test with Realistic Data: Performance characteristics change dramatically at scale
Be Willing to Adapt: Our initial preference (Option 3) wasn't the right solution for our specific needs

By deeply understanding the technical challenges of implementing relations in a GIS context, we've created a solution that balances performance, flexibility, and usability, making geographic data analysis more powerful and accessible for everyone.

Whether you're tracking tree inspections, monitoring soil measurements, or managing water reservoir systems, Atlas's relations feature provides the foundation for sophisticated spatial analysis while maintaining the performance users expect from a modern GIS platform.
