Fuel Stations Gap Filling - Technical Guide
This document covers the technical implementation of fuel station gap-filling using the OpenStreetMap Overpass API.
Problem Analysis
Initial Data
- Source: Loclie export
- Count: 3,074 stations
- Coverage: Town-based only
- Gap: Missing highway/rural stations between towns
Analysis Results
- 100% of stations had `town_id` in `raw_data`: every station fell within a town boundary
- Missing critical highway stations (e.g., roadhouses on Stuart Highway, Eyre Highway)
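The coverage figure can be reproduced with two head-only count queries against the JSONB column. A minimal sketch, assuming a supabase-js v2 client, that `town_id` is a top-level key of `raw_data`, and illustrative environment variable names:

import { createClient } from '@supabase/supabase-js';

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_ROLE_KEY!);

// Count all stations, then count those whose raw_data carries a town_id
const { count: total } = await supabase
  .from('fuel_stations')
  .select('*', { count: 'exact', head: true });

const { count: withTownId } = await supabase
  .from('fuel_stations')
  .select('*', { count: 'exact', head: true })
  .not('raw_data->town_id', 'is', null); // JSONB path filter

console.log(`${withTownId}/${total} stations carry a town_id`);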
Strategy
Phase 1: Query OSM for All Australian Fuel Stations
Query the OpenStreetMap Overpass API state by state to avoid request timeouts:
const AUSTRALIAN_REGIONS = [
  { name: 'NSW', bbox: { south: -37.5, west: 141.0, north: -28.0, east: 154.0 } },
  { name: 'VIC', bbox: { south: -39.2, west: 140.9, north: -33.9, east: 150.0 } },
  { name: 'QLD', bbox: { south: -29.0, west: 138.0, north: -10.0, east: 154.0 } },
  // ... other states
];

// `bbox` is the current region's bounding box from AUSTRALIAN_REGIONS
const query = `
  [out:json][timeout:120];
  (
    node["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
    way["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
  );
  out center;
`;
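Each region's query is sent as a plain HTTP POST. A minimal sketch of the per-region fetch, assuming the public interpreter endpoint at https://overpass-api.de/api/interpreter (the actual script may target a different mirror), with the query inlined so the function is self-contained:

async function fetchFuelStations(bbox: { south: number; west: number; north: number; east: number }) {
  const query = `
    [out:json][timeout:120];
    (
      node["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
      way["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
    );
    out center;
  `;
  const response = await fetch('https://overpass-api.de/api/interpreter', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: `data=${encodeURIComponent(query)}`,
  });
  if (!response.ok) throw new Error(`Overpass HTTP ${response.status}`);
  const json = await response.json();
  return json.elements; // ways carry a `center` because of `out center`
}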
Phase 2: Deduplication
Key field: osm_id (format: node/XXXXX or way/XXXXX)
Critical Bug Fixed:
Supabase has a default 1,000 row limit even without explicit .limit().
❌ Wrong Approach (First Attempt)
// Only gets 1,000 rows!
const { data } = await supabase
  .from('fuel_stations')
  .select('osm_id');
const existingIds = new Set(data.map(s => s.osm_id));
// Result: Only checked against 1,000 stations, missed 2,074+
✅ Correct Approach (Fixed)
// Paginate to get ALL existing stations
const existingIds = new Set<string>();
let page = 0;
const PAGE_SIZE = 1000;
while (true) {
  const { data } = await supabase
    .from('fuel_stations')
    .select('osm_id')
    .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1);
  if (!data || data.length === 0) break;
  data.forEach(s => {
    if (s.osm_id) existingIds.add(s.osm_id);
  });
  if (data.length < PAGE_SIZE) break;
  page++;
}
// Result: Fetched all 3,074+ existing stations
Phase 3: Within-Batch Deduplication
Critical Bug Fixed: OSM data can contain duplicates (overlapping bounding boxes, duplicate entries).
❌ Wrong Approach (First Attempt)
for (const element of allElements) {
  const station = parseOSMElement(element);
  // Only checked against database
  if (existingIds.has(station.osm_id)) {
    skippedDuplicates++;
    continue;
  }
  // BUG: Didn't check for duplicates within this batch!
  newStations.push(station);
}
// Result: 94 duplicates within OSM data itself
✅ Correct Approach (Fixed)
const newStationIds = new Set<string>();
for (const element of allElements) {
  const station = parseOSMElement(element);
  if (!station) continue; // parser returns null when coordinates are missing
  // Check against database
  if (existingIds.has(station.osm_id)) {
    skippedDuplicates++;
    continue;
  }
  // Check for duplicates within this import batch
  if (newStationIds.has(station.osm_id)) {
    skippedWithinBatch++;
    continue;
  }
  newStationIds.add(station.osm_id);
  newStations.push(station);
}
// Result: Zero duplicates imported
Data Mapping
OSM to Database Schema
| OSM Tag | Database Column | Notes |
|---|---|---|
| `name` | `name` | Fallback to `brand` if no name |
| `brand` or `operator` | `brand` | Brand/operator name |
| `lat`/`lon` | `geometry` | PostGIS Point (SRID 4326) |
| `addr:*` | `address` | Constructed from components |
| `toilets=yes` | `has_toilets` | Amenity flag |
| `shop=convenience` | `has_food` | Food availability |
| `hgv=yes` | `has_truck_access` | Truck accessibility |
| `phone` | `phone` | Contact phone |
| `website` | `website` | Website URL |
| `fuel:diesel` | `raw_data` | Stored in JSONB |
| `fuel:lpg` | `raw_data` | Stored in JSONB |
| `opening_hours` | `raw_data` | Stored in JSONB |
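The address column is assembled from the `addr:*` components. A possible helper for that step (the name buildAddress is illustrative, not necessarily what the script uses):

function buildAddress(tags: Record<string, string>): string | null {
  // Combine house number + street, then append city, state, and postcode
  const parts = [
    [tags['addr:housenumber'], tags['addr:street']].filter(Boolean).join(' '),
    tags['addr:city'],
    tags['addr:state'],
    tags['addr:postcode'],
  ].filter(Boolean);
  return parts.length > 0 ? parts.join(', ') : null;
}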
Parsing Logic
function parseOSMElement(element: OSMElement) {
  const tags = element.tags || {};

  // Get coordinates
  let lat: number, lon: number;
  if (element.type === 'node') {
    lat = element.lat;
    lon = element.lon;
  } else if (element.center) {
    lat = element.center.lat;
    lon = element.center.lon;
  } else {
    return null; // No valid coordinates
  }

  // Get name (fallback to brand)
  const name = tags.name || tags.brand || 'Fuel Station';

  // Determine facilities
  const has_toilets = tags['toilets'] === 'yes';
  const has_food = tags['shop'] === 'convenience';
  const has_truck_access = tags['hgv'] === 'yes' || tags['hgv'] === 'designated';

  // Build OSM ID
  const osm_id = `${element.type}/${element.id}`;

  return {
    geometry: `POINT(${lon} ${lat})`,
    name,
    brand: tags.brand || tags.operator || null,
    has_toilets,
    has_food,
    has_truck_access,
    osm_id,
    data_source: 'openstreetmap_overpass',
    raw_data: {
      osm_tags: tags,
      opening_hours: tags.opening_hours,
      fuel_types: {
        diesel: tags['fuel:diesel'] === 'yes',
        lpg: tags['fuel:lpg'] === 'yes',
      },
    },
  };
}
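Because the parser returns null for ways without a computed center, callers should drop those before deduplication. For example, assuming allElements: OSMElement[] collected in Phase 1:

// Parse every element, discarding those without usable coordinates
const stations = allElements
  .map(parseOSMElement)
  .filter((s): s is NonNullable<ReturnType<typeof parseOSMElement>> => s !== null);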
Import Process
Script: import-osm-fuel-stations-gap-fill.ts
Steps:
1. Fetch ALL existing `osm_id`s (with pagination)
2. Query the OSM Overpass API for each state
3. Parse OSM data into the database format
4. Deduplicate against the database AND within the batch
5. Batch insert (500 per batch)
6. Log statistics
Rate Limiting:
// Wait 2 seconds between state queries
await new Promise(resolve => setTimeout(resolve, 2000));
Batch Insertion:
const BATCH_SIZE = 500;
for (let i = 0; i < newStations.length; i += BATCH_SIZE) {
  const batch = newStations.slice(i, i + BATCH_SIZE);
  const { error } = await supabase
    .from('fuel_stations')
    .insert(batch);
  if (error) {
    console.error(`Batch error: ${error.message}`);
  }
}
Lessons Learned
1. Supabase Default Limits
Problem: Supabase has a default 1,000 row limit even when .limit() is not specified.
Impact: Only checked 1,000 existing stations when there were 3,074+, causing 3,799 duplicate errors.
Solution: Always paginate with .range() for large datasets.
Takeaway: Never assume unlimited results from Supabase queries.
2. Multi-Source Deduplication
Problem: OSM data can have duplicates from:
- Overlapping bounding boxes (state queries)
- Duplicate entries in OSM itself
- Same physical station with multiple representations
Impact: 94 duplicate stations within OSM data would have caused insertion errors.
Solution: Use a Set to track IDs seen in both the database AND the current import batch.
Takeaway: When importing from multiple sources/queries, deduplicate at multiple levels.
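A complementary safety net, not used in this import but worth noting: with a UNIQUE constraint on osm_id, the database itself can reject duplicates at insert time. A sketch using supabase-js upsert, assuming such a constraint exists on fuel_stations:

// Rows whose osm_id already exists are skipped instead of raising an error
const { error } = await supabase
  .from('fuel_stations')
  .upsert(batch, { onConflict: 'osm_id', ignoreDuplicates: true });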
3. Cleanup Strategy
Problem: First buggy import created 1,342 duplicate records.
Solution:
- Find all OSM IDs that appear more than once
- Sort by `created_at` to identify the oldest record
- Keep the first (oldest) record, delete the rest
- Process in batches (500 per batch)
- Verify after cleanup to catch multi-level duplicates
Script: cleanup-duplicate-fuel-stations.ts
// Find duplicates: map each osm_id to the station ids that share it.
// Populated from a paginated fetch ordered by created_at ascending,
// so the first id recorded per osm_id is the oldest record.
const osmIdMap = new Map<string, string[]>(); // osm_id -> [station_ids]

// Keep first (oldest), delete the rest
const idsToDelete: string[] = [];
const duplicates = Array.from(osmIdMap.entries())
  .filter(([_, ids]) => ids.length > 1);
duplicates.forEach(([osmId, ids]) => {
  const [keepId, ...deleteIds] = ids; // first id is the oldest
  idsToDelete.push(...deleteIds);
});

// Batch delete
const DELETE_BATCH_SIZE = 500;
for (let i = 0; i < idsToDelete.length; i += DELETE_BATCH_SIZE) {
  const batch = idsToDelete.slice(i, i + DELETE_BATCH_SIZE);
  await supabase.from('fuel_stations').delete().in('id', batch);
}
Takeaway: Have a cleanup strategy ready for when things go wrong. Keep the oldest records, as they are more likely to come from primary sources.
4. Rate Limiting External APIs
Implementation: 2-second delay between OSM Overpass API requests.
Reason:
- OSM Overpass API has rate limits
- Avoid HTTP 429 (Too Many Requests) errors
- Be a good API citizen
Takeaway: Always implement rate limiting when querying external APIs.
Results
Second Import (Fixed Version)
📋 Step 1: Fetching existing fuel stations...
📄 Fetched page 1 (1000 stations)...
📄 Fetched page 2 (1000 stations)...
📄 Fetched page 3 (1000 stations)...
📄 Fetched page 4 (1000 stations)...
📄 Fetched page 5 (1000 stations)...
📄 Fetched page 6 (1000 stations)...
📄 Fetched page 7 (574 stations)...
✅ Found 5,238 existing stations in database
📡 Step 2: Querying OpenStreetMap Overpass API...
🌏 NSW: 2,763 stations
🌏 VIC: 2,004 stations
🌏 QLD: 1,783 stations
🌏 SA: 600 stations
🌏 WA: 862 stations
🌏 TAS: 225 stations
🌏 NT: 154 stations
🌏 ACT: 88 stations
📊 Total from OSM: 8,479 stations
🔍 Step 3: Parsing and filtering...
✅ Parsed: 2,193
⚠️ Skipped (already in database): 6,192
⚠️ Skipped (duplicates within OSM data): 94
📦 New stations to import: 2,193
💾 Step 4: Importing to database...
📦 Batch 1/5 (500 stations)... ✅
📦 Batch 2/5 (500 stations)... ✅
📦 Batch 3/5 (500 stations)... ✅
📦 Batch 4/5 (500 stations)... ✅
📦 Batch 5/5 (193 stations)... ✅
Successfully imported: 2,193
📊 Total in database: 8,767 (before cleanup)
After Cleanup
🗑️ Cleanup Complete!
Deleted: 1,342 duplicate records
Remaining: 7,429 unique fuel stations
✅ No duplicate OSM IDs
Verification
Script: verify-fuel-station-coverage.ts
Comprehensive verification checks:
📊 Total Coverage: 7,429 stations
📦 Breakdown by Data Source:
openstreetmap_overpass: 4,355 (58.6%)
openstreetmap: 3,074 (41.4%)
🏘️ Town vs Highway/Rural:
Town-based (Loclie): 3,074 (41.4%)
Highway/Rural (OSM): 4,355 (58.6%)
🔍 Duplicate Detection:
✅ No duplicate OSM IDs found
✨ Data Quality Metrics:
Stations with brand: 6,608 (88.9%)
Stations with website: 1,715 (23.1%)
Stations with phone: 804 (10.8%)
Stations with toilets: 141 (1.9%)
Stations with food: 190 (2.6%)
Stations with truck access: 9 (0.1%)
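Each metric above can be derived from a head-only count query. A sketch for two of them, assuming supabase-js v2:

// Total coverage
const { count: total } = await supabase
  .from('fuel_stations')
  .select('*', { count: 'exact', head: true });

// Quality metric: stations with a brand
const { count: withBrand } = await supabase
  .from('fuel_stations')
  .select('*', { count: 'exact', head: true })
  .not('brand', 'is', null);

console.log(`Stations with brand: ${withBrand}/${total}`);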
Best Practices
1. Always Paginate Supabase Queries
// Don't assume unlimited results
const PAGE_SIZE = 1000;
let page = 0;
while (true) {
  const { data } = await supabase
    .from('my_table') // any large table
    .select('*')
    .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1);
  if (!data || data.length === 0) break;
  // ... process this page ...
  if (data.length < PAGE_SIZE) break; // last (partial) page already processed above
  page++;
}
2. Multi-Level Deduplication
// Check against database
if (existingIds.has(id)) continue;
// Check within current batch
if (newIds.has(id)) continue;
newIds.add(id);
3. Rate Limit External APIs
for (const region of regions) {
  await queryAPI(region);
  await new Promise(resolve => setTimeout(resolve, 2000));
}
4. Batch Operations
// Insert/delete in batches
for (let i = 0; i < items.length; i += BATCH_SIZE) {
  const batch = items.slice(i, i + BATCH_SIZE);
  await supabase.from('table').insert(batch);
}
5. Comprehensive Verification
Always verify after import:
- Total count
- Data source breakdown
- Duplicate detection
- Data quality metrics
Commands Reference
# Import Loclie town-based stations
npm run db:import-fuel-stations-loclie
# Gap-fill with OSM (fixed version)
npm run db:import-fuel-stations-osm
# Analyze coverage
npm run db:analyze-fuel-coverage
# Verify data quality
npm run db:verify-fuel-coverage
# Clean up duplicates
npm run db:cleanup-fuel-duplicates
Related Documents
- Overview - Implementation summary and results
- Database Overview - Main database documentation