When working with geographic data, relationships between datasets are often as important as the data itself. Imagine tracking tree inspections across a city, monitoring measurements across different acres of farmland, or analyzing water flow in reservoir systems. These scenarios require robust data relationships to provide meaningful insights. At Atlas, we've tackled this challenge head-on, implementing a powerful relations system that connects datasets while maintaining the performance our users expect.
The Challenge of Relating Geographic Data
Geographic data rarely exists in isolation. A tree in a park might have multiple inspections over time. An acre of farmland could have dozens of soil measurements. A water reservoir has inflow and outflow measurements that affect its levels. These connections between primary entities (trees, acres, reservoirs) and their related records (inspections, measurements, flow rates) form the backbone of advanced GIS analysis.
Some key use cases for relations include:
- Connecting tree inspections to specific trees
- Linking soil measurements to particular acres of land
- Tracking water influx and outflow rates for reservoirs
- Calculating aggregated values like average measurements per feature
- Determining time-based insights like "days since last inspection"
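The last two use cases are simple aggregations over related records. Below is a minimal sketch. It assumes related records arrive as hypothetical `(feature_id, observed_on, value)` tuples. The function names and sample data are illustrative, not Atlas's schema.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical related records: (feature_id, observed_on, value)
MEASUREMENTS = [
    ("acre-1", date(2024, 5, 1), 6.5),
    ("acre-1", date(2024, 6, 1), 7.1),
    ("acre-2", date(2024, 5, 15), 5.9),
]

def average_per_feature(records):
    """Average the values of all records related to each feature."""
    values = defaultdict(list)
    for feature_id, _, value in records:
        values[feature_id].append(value)
    return {fid: mean(vals) for fid, vals in values.items()}

def days_since_last(records, today):
    """Days elapsed since the most recent related record of each feature."""
    latest = {}
    for feature_id, observed_on, _ in records:
        if feature_id not in latest or observed_on > latest[feature_id]:
            latest[feature_id] = observed_on
    return {fid: (today - last).days for fid, last in latest.items()}
```

For example, `days_since_last(MEASUREMENTS, date(2024, 6, 11))` returns `{'acre-1': 10, 'acre-2': 27}`.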
While these relationships sound straightforward conceptually, implementing them in a high-performance GIS system presents significant technical challenges.
Water Reservoir Management: A Perfect Use Case
Water reservoir management presents an ideal scenario for demonstrating the power of relations in GIS. Water systems rely on understanding the balance between influx (water entering the system) and outflow (water leaving the system). This balance directly affects reservoir levels, water quality, and system stability.
With relations, water managers can:
- Connect influx measurement points to specific reservoirs
- Track outflow rates at multiple discharge points
- Calculate net water balance for each reservoir
- Monitor seasonal variations in water flow patterns
- Identify potential leaks or unauthorized usage
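Net water balance per reservoir is total influx minus total outflow over the same period. Below is a minimal sketch. It assumes hypothetical flow records, each tagged with a reservoir ID and a direction, with volumes in cubic metres.

```python
from collections import defaultdict

# Hypothetical flow measurements related to reservoirs:
# (reservoir_id, direction, volume in cubic metres over the same period)
FLOWS = [
    ("res-A", "influx", 1200.0),
    ("res-A", "outflow", 900.0),
    ("res-A", "outflow", 150.0),
    ("res-B", "influx", 400.0),
]

def net_balance(flows):
    """Sum influx and subtract outflow for every reservoir."""
    balance = defaultdict(float)
    for reservoir_id, direction, volume in flows:
        if direction == "influx":
            balance[reservoir_id] += volume
        elif direction == "outflow":
            balance[reservoir_id] -= volume
        else:
            raise ValueError(f"unknown direction: {direction}")
    return dict(balance)
```

Here `net_balance(FLOWS)` returns `{'res-A': 150.0, 'res-B': 400.0}`. You could then compare that figure with the observed change in stored volume. A persistent gap between the two is one possible signal of the leaks or unauthorized usage mentioned above.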
By connecting measurement data to reservoirs through relations, water managers can create a comprehensive view of the entire water system, enabling better management decisions and more efficient resource allocation.
Exploring Implementation Options
Before diving into development, we explored four potential approaches to implementing relations:
Option 1: Propagate Updates
This approach involves duplicating related data across datasets. When an influx measurement is updated, the changes are automatically propagated to the related reservoir record.
Pros:
- Fast read operations with no joins required
- Simple querying and filtering on related fields
- Straightforward implementation of lookup fields
Cons:
- Complex update logic to propagate changes
- Difficult to track which datasets contain lookup fields
- Performance issues when updating records with many relationships
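Here is a toy, in-memory sketch of Option 1 that makes the tradeoff concrete. The dictionaries stand in for database tables, and all names are hypothetical. The reservoir row carries a copy of the related influx value, so reads need no join. In exchange, every write must also rewrite that copy.

```python
# Hypothetical in-memory stores illustrating Option 1 (propagate updates).
measurements = {"m1": {"reservoir_id": "res-A", "influx": 1200.0}}
# Each reservoir row keeps a duplicated copy of its related measurement's influx.
reservoirs = {"res-A": {"name": "North", "influx": 1200.0}}

def update_measurement(measurement_id, influx):
    record = measurements[measurement_id]
    record["influx"] = influx
    # Propagation step: every duplicated copy must be found and rewritten.
    reservoirs[record["reservoir_id"]]["influx"] = influx

def read_reservoir(reservoir_id):
    # Reads are a single lookup; no join is needed.
    return reservoirs[reservoir_id]
```

With a single copy this is trivial. Once a value is duplicated into many records or several datasets, the propagation step has to track every copy, and that is where the drawbacks listed above come from.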
Option 2: Store Relation IDs
This method stores only the ID of related records, resolving the actual data at query time.
Pros:
- Simple update operations
- Always provides the most current data
- Cleaner data model with less duplication
Cons:
- Slower read operations requiring joins
- Higher database load
- More complex implementation for filtering and sorting
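Here is the same toy setup under Option 2. It stores only IDs and resolves the values when the record is read. All names are again hypothetical.

```python
measurements = {"m1": {"influx": 1200.0}, "m2": {"influx": 800.0}}
# Option 2: the reservoir stores only the IDs of its related measurements.
reservoirs = {"res-A": {"name": "North", "measurement_ids": ["m1", "m2"]}}

def resolve(reservoir_id):
    """Look up the related records at read time, so values are always current."""
    row = reservoirs[reservoir_id]
    related = [measurements[mid] for mid in row["measurement_ids"]]
    return {**row, "influx_values": [m["influx"] for m in related]}
```

Updating a measurement touches only `measurements`, and the next `resolve` call sees the change. The cost moves to reads, since every read repeats the lookups.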
Option 3: Dedicated Relations Table
This approach creates a separate table that records each relationship between features in different datasets.
Pros:
- Flexible and powerful for complex relationships
- Excellent for rollup functions and aggregations
- Clean separation of concerns
Cons:
- Additional table to maintain
- Increased database complexity
- Potential performance impact on reads
Option 4: Hybrid Approach
This approach combines a relations table with strategic duplication of frequently read data.
Pros:
- Balances read and write performance
- Supports complex queries and relationships
- Maintains data currency while optimizing common operations
Cons:
- Most complex implementation
- Requires careful management of duplicated data
- Higher maintenance overhead
Our Implementation Journey
We initially gravitated toward Option 3 (Dedicated Relations Table) because it offered the cleanest data model with minimal write overhead. The implementation looked promising:
from dataclasses import dataclass

@dataclass
class Relation:
    # ID types are shown as int for illustration
    id: int                 # Unique identifier for the relation
    source_dataset_id: int  # ID of the source dataset
    source_feature_id: int  # ID of the source feature
    target_dataset_id: int  # ID of the target dataset
    target_feature_id: int  # ID of the target feature
This model allowed us to track relationships between any two features across different datasets. We could resolve these relationships at query time, so users would always see the latest data. However, when we tested this approach with real-world data volumes, we ran into performance issues. The additional joins required during tile generation added latency, and users noticed the lag.
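The query-time resolution step looks roughly like the sketch below. It uses SQLite from Python's standard library so it runs anywhere, whereas Atlas itself runs on PostGIS. The table layout, the `reservoirs` and `measurements` tables, and the dataset IDs `1` and `2` are all assumptions for illustration. Notice the two joins that every read has to pay for.

```python
import sqlite3

SCHEMA = """
CREATE TABLE reservoirs (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE measurements (id INTEGER PRIMARY KEY, influx REAL);
CREATE TABLE relations (
    id INTEGER PRIMARY KEY,
    source_dataset_id INTEGER, source_feature_id INTEGER,
    target_dataset_id INTEGER, target_feature_id INTEGER
);
-- Hypothetical dataset IDs: 1 = reservoirs, 2 = measurements.
INSERT INTO reservoirs VALUES (1, 'North');
INSERT INTO measurements VALUES (10, 1200.0), (11, 800.0);
INSERT INTO relations VALUES (1, 1, 1, 2, 10), (2, 1, 1, 2, 11);
"""

# Resolving relations at read time: reservoirs -> relations -> measurements.
QUERY = """
SELECT r.name, AVG(m.influx)
FROM reservoirs AS r
JOIN relations AS rel
  ON rel.source_dataset_id = 1
 AND rel.source_feature_id = r.id
 AND rel.target_dataset_id = 2
JOIN measurements AS m ON m.id = rel.target_feature_id
GROUP BY r.id, r.name
"""

def average_influx_per_reservoir():
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Calling `average_influx_per_reservoir()` returns `[('North', 1000.0)]`. In a tile pipeline, this kind of resolution has to happen for the features in every requested tile.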
The Performance Imperative
For a mapping application like Atlas, read performance is critical. When users pan and zoom around a map, they expect tiles to load instantly. Even small delays compound to create a sluggish experience.
Our application streams vector tiles directly from PostGIS, which means any additional processing during tile generation directly impacts the user experience. After extensive testing, we determined that Option 3, while elegant, couldn't meet our performance requirements.
Our Final Solution: The Hybrid Approach
We ultimately implemented Option 4, a hybrid approach that combines a relations table with strategic data duplication. This solution:
- Maintains a dedicated relations table for tracking connections between datasets and features
- Pre-calculates and stores lookup values
- Uses the relations table for complex aggregations and less common queries
- Implements efficient update propagation for changed records
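Below is a minimal sketch of the hybrid idea. It assumes a single precomputed lookup (average influx per reservoir) and uses plain Python containers in place of database tables. The `Store` class and its methods are hypothetical. Reads use `lookups` directly. The relations list is consulted only to find which stored values an update has made stale.

```python
from dataclasses import dataclass, field

@dataclass
class Store:
    measurements: dict = field(default_factory=dict)  # measurement_id -> influx
    relations: list = field(default_factory=list)     # (reservoir_id, measurement_id)
    lookups: dict = field(default_factory=dict)       # reservoir_id -> stored average influx

    def reservoirs_related_to(self, measurement_id):
        """Use the relations table to find every feature affected by a change."""
        return {res for res, m in self.relations if m == measurement_id}

    def recompute(self, reservoir_id):
        """Recalculate and store the lookup value for one reservoir."""
        values = [self.measurements[m] for res, m in self.relations if res == reservoir_id]
        self.lookups[reservoir_id] = sum(values) / len(values) if values else None

    def update_measurement(self, measurement_id, influx):
        self.measurements[measurement_id] = influx
        # Propagate: refresh only the stored lookups this change affects.
        for reservoir_id in self.reservoirs_related_to(measurement_id):
            self.recompute(reservoir_id)
```

For example, with measurements `{10: 1200.0, 11: 800.0}` both related to reservoir `1`, the stored lookup is `1000.0`. After `update_measurement(10, 1000.0)` it becomes `900.0`, and readers never join at all.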
The implementation required more complex code, particularly for managing updates. Even a simple operation such as changing a property in the data table now has to be propagated to every related feature. However, these updates can run asynchronously in the background, and once they finish we notify the frontend of the new values.
This asynchronous update pattern became crucial for maintaining performance while ensuring data consistency. When a user updates a property in a water reservoir record, for example, we need to recalculate all the related lookup fields across potentially dozens of related features. Doing this synchronously would create unacceptable delays in the user interface.
Our solution implements a background worker system that:
- Captures relation update events in a queue
- Processes updates in batches for efficiency
- Intelligently prioritizes updates based on visibility and importance
- Notifies the frontend when updates are complete
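The worker behaviour described above can be sketched as a priority queue drained in batches. This is an illustrative, single-threaded sketch, not Atlas's worker. `recompute` and `notify` are hypothetical callables standing in for the lookup recalculation and the WebSocket push. Visibility is reduced to a boolean flag.

```python
import heapq
import itertools

class RelationUpdateQueue:
    """Hypothetical in-process queue for relation update events."""

    def __init__(self, recompute, notify, batch_size=100):
        self._heap = []
        self._order = itertools.count()  # tie-breaker: FIFO within the same priority
        self._recompute = recompute      # callable(list of feature IDs) -> {feature_id: new value}
        self._notify = notify            # callable(dict); stands in for a WebSocket push
        self._batch_size = batch_size

    def enqueue(self, feature_id, visible=False):
        # Lower number = higher priority, so features currently on screen go first.
        priority = 0 if visible else 1
        heapq.heappush(self._heap, (priority, next(self._order), feature_id))

    def process_batch(self):
        """Recompute up to batch_size queued features, notify once, return the count."""
        batch = []
        while self._heap and len(batch) < self._batch_size:
            _, _, feature_id = heapq.heappop(self._heap)
            batch.append(feature_id)
        if batch:
            self._notify(self._recompute(batch))
        return len(batch)
```

In production, this loop would typically run in a separate worker process, with events held in a durable queue. The sketch shows only the ordering and batching logic.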
This approach allows users to continue working without waiting for all related updates to complete. When updates finish, the system notifies the frontend through WebSocket connections, enabling real-time updates without page refreshes.
Auto-Matching Relations: Simplifying Data Migration
We understand that not all users create datasets from the ground up in Atlas. Many organizations already have extensive datasets with complex relationships established in other systems. For these customers, migrating to a new platform while preserving existing relationships can be a significant challenge.
To address this, we've developed an auto-matching feature for relations, ensuring that users can migrate all their data to Atlas while maintaining existing relationships between datasets. This feature automatically identifies potential relationships between datasets based on matching values in specified columns.
Here's how our auto-matching system works:
def batch_create_relations(source_dataset_id, target_dataset_id, source_column_id, target_column_id):
    # Get data from the source dataset
    source_data = get_dataset_rows(source_dataset_id)
    # Get data from the target dataset
    target_data = get_dataset_rows(target_dataset_id)
    # Extract column values for matching
    source_values = extract_column_values(source_data, source_column_id)
    target_values = extract_column_values(target_data, target_column_id)
    # Find matches between datasets
    matches = find_matching_values(source_values, target_values)
    # Create a relation for each match
    relations_created = 0
    for match in matches:
        create_relation(
            source_dataset_id=source_dataset_id,
            source_feature_id=match.source_id,
            target_dataset_id=target_dataset_id,
            target_feature_id=match.target_id,
        )
        relations_created += 1
    return relations_created
This auto-matching system handles various data formats, including:
- Simple text or numeric matches
- Multi-value fields (arrays or delimited strings)
- Case-insensitive matching
- Partial matches based on configurable thresholds
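Here is one way the matching rules listed above could back the `find_matching_values` step. It is a sketch under several assumptions. Rows are hypothetical `(feature_id, raw_value)` pairs. Multi-value cells are either comma-delimited strings or lists. Partial matching uses `difflib.SequenceMatcher` from Python's standard library as a stand-in similarity measure, which may differ from what Atlas uses.

```python
from difflib import SequenceMatcher

def split_values(raw, delimiter=","):
    """Turn a cell into a set of normalized tokens (multi-value, case-insensitive)."""
    if raw is None:
        return set()
    items = raw if isinstance(raw, (list, tuple)) else str(raw).split(delimiter)
    return {str(item).strip().casefold() for item in items if str(item).strip()}

def values_match(a, b, threshold=1.0):
    """Exact match when threshold is 1.0, otherwise a fuzzy similarity check."""
    if a == b:
        return True
    return threshold < 1.0 and SequenceMatcher(None, a, b).ratio() >= threshold

def find_matches(source_rows, target_rows, threshold=1.0):
    """Yield (source_id, target_id) pairs whose column values match."""
    for source_id, source_raw in source_rows:
        source_tokens = split_values(source_raw)
        for target_id, target_raw in target_rows:
            target_tokens = split_values(target_raw)
            if any(values_match(s, t, threshold) for s in source_tokens for t in target_tokens):
                yield source_id, target_id
```

With sources `[(1, "RES-01"), (2, "res-02, res-03")]` and targets `[(10, "res-01"), (11, "Res-03")]`, the exact pass yields `(1, 10)` and `(2, 11)`. The nested loop compares every source row with every target row, which is fine for a sketch. For exact matching on large datasets, indexing the target tokens in a dictionary first would avoid the quadratic cost.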
For example, a water utility company migrating to Atlas could automatically connect their reservoir dataset with their flow measurement dataset by matching reservoir IDs, preserving the critical relationships needed for water management analysis.
The Results: Performance Where It Matters
The hybrid approach delivered the performance we needed for our mapping application. Tile generation remained fast, with lookup fields available without expensive joins. At the same time, we maintained the flexibility to perform complex aggregations and rollups when needed.
Key benefits of our implementation include:
- Vector tiles load quickly with pre-calculated relationship data
- Updates propagate efficiently to maintain data currency
- Complex queries like "average water influx per reservoir" perform well
- The system scales effectively with increasing data volumes
Visualizing Relations in the Frontend
The implementation of relations in our GIS system wouldn't be complete without an intuitive way to visualize these connections. At Atlas, we've developed a powerful visualization system that helps users instantly understand the relationships between geographic features.
When a user selects a feature in the map that has relations, our system automatically visualizes these connections using arc layers and secondary colors. These curved lines connect the source feature to all its related features, creating an immediate visual understanding of the relationships.
The accompanying image shows an example of how relations can be visualized effectively. It displays an organization's operational reach across multiple countries, using arc layers to connect the organization's headquarters to each country where it operates.
Key Visualization Features
- Arc Layers
- Color Coding
- Automatic Bounding Box Adjustment
- Interactive Elements
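The arc layer and bounding-box features rest on simple geometry: one arc per related feature, and a viewport that fits the selected feature plus everything it relates to. Below is a minimal sketch that assumes features are `(lon, lat)` points. The `source`/`target` dictionary shape is a hypothetical input for an arc renderer such as deck.gl's `ArcLayer`, which reads positions through accessors. Antimeridian crossing is ignored.

```python
def arc_segments(source, targets):
    """One arc per related feature, starting at the selected feature's position."""
    return [{"source": source, "target": target} for target in targets]

def bounding_box(points, padding=0.1):
    """[min_lon, min_lat, max_lon, max_lat] around all points, padded by a fraction of its size."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    min_lon, max_lon = min(lons), max(lons)
    min_lat, max_lat = min(lats), max(lats)
    pad_lon = (max_lon - min_lon) * padding
    pad_lat = (max_lat - min_lat) * padding
    return [min_lon - pad_lon, min_lat - pad_lat, max_lon + pad_lon, max_lat + pad_lat]
```

For a selected feature at `(0.0, 0.0)` related to `(10.0, 5.0)` and `(-10.0, -5.0)`, the padded box is `[-12.0, -6.0, 12.0, 6.0]`. The map would then fit its view to that extent.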
Benefits of This Approach
- Instant Understanding: Users can immediately grasp complex spatial relationships without the need for lengthy explanations or data tables.
- Scalability: The visualization works well for both small and large numbers of related features, adapting to different data scenarios.
- Enhanced Data Exploration: By visually connecting related features, users are encouraged to explore the data further and discover patterns or insights that might not be apparent in tabular form.
Lessons Learned
Our journey to implement relations taught us several valuable lessons:
- Read vs. Write Tradeoffs: In a mapping application, optimizing for read performance often outweighs write efficiency
- User Experience Drives Architecture: Even small performance impacts can significantly affect user perception
- Test with Realistic Data: Performance characteristics change dramatically at scale
- Be Willing to Adapt: Our initial preference (Option 3) wasn't the right solution for our specific needs
By deeply understanding the technical challenges of implementing relations in a GIS context, we've created a solution that balances performance, flexibility, and usability—making geographic data analysis more powerful and accessible for everyone.
Whether you're tracking tree inspections, monitoring soil measurements, or managing water reservoir systems, Atlas's relations feature provides the foundation for sophisticated spatial analysis while maintaining the performance users expect from a modern GIS platform.