
Fuel Stations Gap Filling - Technical Guide

This document covers the technical implementation of fuel station gap-filling using the OpenStreetMap Overpass API.

Problem Analysis

Initial Data

  • Source: Loclie export
  • Count: 3,074 stations
  • Coverage: Town-based only
  • Gap: Missing highway/rural stations between towns

Analysis Results

  • 100% of stations had town_id in raw_data
  • All stations were within town boundaries
  • Missing critical highway stations (e.g., roadhouses on Stuart Highway, Eyre Highway)

Strategy

Phase 1: Query OSM for All Australian Fuel Stations

Use the OpenStreetMap Overpass API, querying one state at a time to avoid timeouts:

const AUSTRALIAN_REGIONS = [
  { name: 'NSW', bbox: { south: -37.5, west: 141.0, north: -28.0, east: 154.0 } },
  { name: 'VIC', bbox: { south: -39.2, west: 140.9, north: -33.9, east: 150.0 } },
  { name: 'QLD', bbox: { south: -29.0, west: 138.0, north: -10.0, east: 154.0 } },
  // ... other states
];

const query = `
[out:json][timeout:120];
(
  node["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
  way["amenity"="fuel"](${bbox.south},${bbox.west},${bbox.north},${bbox.east});
);
out center;
`;
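The query template above can be packaged as a small helper. This is an illustrative sketch, not code from the import script — the `BBox` interface and `buildFuelQuery` name are assumptions:

```typescript
// Illustrative helper (not from the actual script): builds the Overpass QL
// query for one region's bounding box. "out center;" makes way elements
// return a computed centre point, so both nodes and ways yield coordinates.
interface BBox {
  south: number;
  west: number;
  north: number;
  east: number;
}

function buildFuelQuery(bbox: BBox): string {
  const b = `${bbox.south},${bbox.west},${bbox.north},${bbox.east}`;
  return [
    '[out:json][timeout:120];',
    '(',
    `  node["amenity"="fuel"](${b});`,
    `  way["amenity"="fuel"](${b});`,
    ');',
    'out center;',
  ].join('\n');
}
```

The resulting string can be sent as the request body to a public Overpass endpoint such as https://overpass-api.de/api/interpreter.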

Phase 2: Deduplication

Key field: osm_id (format: node/XXXXX or way/XXXXX)

Critical Bug Fixed: Supabase applies a default 1,000-row limit even without an explicit .limit().

❌ Wrong Approach (First Attempt)

// Only gets 1,000 rows!
const { data } = await supabase
  .from('fuel_stations')
  .select('osm_id');

const existingIds = new Set(data.map(s => s.osm_id));
// Result: Only checked against 1,000 stations, missed 2,074+

✅ Correct Approach (Fixed)

// Paginate to get ALL existing stations
const existingIds = new Set<string>();
let page = 0;
const PAGE_SIZE = 1000;

while (true) {
  const { data } = await supabase
    .from('fuel_stations')
    .select('osm_id')
    .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1);

  if (!data || data.length === 0) break;

  data.forEach(s => {
    if (s.osm_id) existingIds.add(s.osm_id);
  });

  if (data.length < PAGE_SIZE) break;
  page++;
}
// Result: Fetched all 3,074+ existing stations

Phase 3: Within-Batch Deduplication

Critical Bug Fixed: OSM data can contain duplicates (overlapping bounding boxes, duplicate entries).

❌ Wrong Approach (First Attempt)

for (const element of allElements) {
  const station = parseOSMElement(element);

  // Only checked against database
  if (existingIds.has(station.osm_id)) {
    skippedDuplicates++;
    continue;
  }

  // BUG: Didn't check for duplicates within this batch!
  newStations.push(station);
}
// Result: 94 duplicates within OSM data itself

✅ Correct Approach (Fixed)

const newStationIds = new Set<string>();

for (const element of allElements) {
  const station = parseOSMElement(element);

  // Check against database
  if (existingIds.has(station.osm_id)) {
    skippedDuplicates++;
    continue;
  }

  // Check for duplicates within this import batch
  if (newStationIds.has(station.osm_id)) {
    skippedWithinBatch++;
    continue;
  }

  newStationIds.add(station.osm_id);
  newStations.push(station);
}
// Result: Zero duplicates imported

Data Mapping

OSM to Database Schema

| OSM Tag | Database Column | Notes |
| --- | --- | --- |
| name | name | Fallback to brand if no name |
| brand or operator | brand | Brand/operator name |
| lat/lon | geometry | PostGIS Point (SRID 4326) |
| addr:* | address | Constructed from components |
| toilets=yes | has_toilets | Amenity flag |
| shop=convenience | has_food | Food availability |
| hgv=yes | has_truck_access | Truck accessibility |
| phone | phone | Contact phone |
| website | website | Website URL |
| fuel:diesel | raw_data | Stored in JSONB |
| fuel:lpg | raw_data | Stored in JSONB |
| opening_hours | raw_data | Stored in JSONB |

Parsing Logic

function parseOSMElement(element: OSMElement) {
  const tags = element.tags || {};

  // Get coordinates
  let lat: number, lon: number;
  if (element.type === 'node') {
    lat = element.lat;
    lon = element.lon;
  } else if (element.center) {
    lat = element.center.lat;
    lon = element.center.lon;
  } else {
    return null; // No valid coordinates
  }

  // Get name (fallback to brand)
  const name = tags.name || tags.brand || 'Fuel Station';

  // Determine facilities
  const has_toilets = tags['toilets'] === 'yes';
  const has_food = tags['shop'] === 'convenience';
  const has_truck_access = tags['hgv'] === 'yes' || tags['hgv'] === 'designated';

  // Build OSM ID
  const osm_id = `${element.type}/${element.id}`;

  return {
    geometry: `POINT(${lon} ${lat})`,
    name,
    brand: tags.brand || tags.operator || null,
    has_toilets,
    has_food,
    has_truck_access,
    osm_id,
    data_source: 'openstreetmap_overpass',
    raw_data: {
      osm_tags: tags,
      opening_hours: tags.opening_hours,
      fuel_types: {
        diesel: tags['fuel:diesel'] === 'yes',
        lpg: tags['fuel:lpg'] === 'yes',
      },
    },
  };
}
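The mapping table notes that the address column is constructed from addr:* components, a step parseOSMElement above does not show. The sketch below is hypothetical — the helper name, component order, and separators are assumptions, not taken from the actual script:

```typescript
// Hypothetical sketch: join the common addr:* components into one string.
// Component order and separators are assumptions; the real script may differ.
function buildAddress(tags: Record<string, string | undefined>): string | null {
  // "12 Stuart Highway" from housenumber + street, when present
  const streetLine = [tags['addr:housenumber'], tags['addr:street']]
    .filter(Boolean)
    .join(' ');
  // Drop empty/missing components, then join with commas
  const parts = [streetLine, tags['addr:city'], tags['addr:state'], tags['addr:postcode']]
    .filter(Boolean);
  return parts.length > 0 ? parts.join(', ') : null;
}
```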

Import Process

Script: import-osm-fuel-stations-gap-fill.ts

Steps:

  1. Fetch ALL existing osm_ids (with pagination)
  2. Query OSM Overpass API for each state
  3. Parse OSM data to database format
  4. Deduplicate against database AND within batch
  5. Batch insert (500 per batch)
  6. Log statistics

Rate Limiting:

// Wait 2 seconds between state queries
await new Promise(resolve => setTimeout(resolve, 2000));

Batch Insertion:

const BATCH_SIZE = 500;
for (let i = 0; i < newStations.length; i += BATCH_SIZE) {
  const batch = newStations.slice(i, i + BATCH_SIZE);

  const { error } = await supabase
    .from('fuel_stations')
    .insert(batch);

  if (error) {
    console.error(`Batch error: ${error.message}`);
  }
}

Lessons Learned

1. Supabase Default Limits

Problem: Supabase has a default 1,000 row limit even when .limit() is not specified.

Impact: Only checked 1,000 existing stations when there were 3,074+, causing 3,799 duplicate errors.

Solution: Always paginate with .range() for large datasets.

Takeaway: Never assume unlimited results from Supabase queries.

2. Multi-Source Deduplication

Problem: OSM data can have duplicates from:

  • Overlapping bounding boxes (state queries)
  • Duplicate entries in OSM itself
  • Same physical station with multiple representations

Impact: 94 duplicate stations within OSM data would have caused insertion errors.

Solution: Use Set to track IDs both in database AND current batch.

Takeaway: When importing from multiple sources/queries, deduplicate at multiple levels.
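The two-level check can be factored into a small generic helper. This is an illustrative refactor of the Phase 3 loop, not code from the actual script:

```typescript
// Keep only items whose key is neither already in the database set
// (existing) nor seen earlier in this batch. Illustrative refactor of
// the Phase 3 dedup loop.
function dedupe<T>(
  items: T[],
  key: (item: T) => string,
  existing: Set<string>,
): T[] {
  const seen = new Set<string>();
  const fresh: T[] = [];
  for (const item of items) {
    const k = key(item);
    if (existing.has(k) || seen.has(k)) continue; // database or within-batch duplicate
    seen.add(k);
    fresh.push(item);
  }
  return fresh;
}
```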

3. Cleanup Strategy

Problem: First buggy import created 1,342 duplicate records.

Solution:

  • Find all OSM IDs that appear more than once
  • Sort by created_at to identify oldest record
  • Keep first (oldest) record, delete rest
  • Process in batches (500 per batch)
  • Verify after cleanup to catch multi-level duplicates

Script: cleanup-duplicate-fuel-stations.ts

// Find duplicates: map each osm_id to every station row id sharing it.
// allStations holds { id, osm_id } rows fetched with pagination, ordered by created_at.
const osmIdMap = new Map<string, string[]>(); // osm_id -> [station_ids], oldest first
for (const station of allStations) {
  const ids = osmIdMap.get(station.osm_id) ?? [];
  ids.push(station.id);
  osmIdMap.set(station.osm_id, ids);
}

// Keep first (oldest), delete the rest
const idsToDelete: string[] = [];
const duplicates = Array.from(osmIdMap.entries())
  .filter(([_, ids]) => ids.length > 1);

duplicates.forEach(([osmId, ids]) => {
  const [keepId, ...deleteIds] = ids; // keep the oldest record
  idsToDelete.push(...deleteIds);
});

// Batch delete
const DELETE_BATCH_SIZE = 500;
for (let i = 0; i < idsToDelete.length; i += DELETE_BATCH_SIZE) {
  const batch = idsToDelete.slice(i, i + DELETE_BATCH_SIZE);
  await supabase.from('fuel_stations').delete().in('id', batch);
}

Takeaway: Have a cleanup strategy ready when things go wrong. Keep oldest records as they're more likely from primary sources.

4. Rate Limiting External APIs

Implementation: 2-second delay between OSM Overpass API requests.

Reason:

  • OSM Overpass API has rate limits
  • Avoid HTTP 429 (Too Many Requests) errors
  • Be a good API citizen

Takeaway: Always implement rate limiting when querying external APIs.

Results

Second Import (Fixed Version)

📋 Step 1: Fetching existing fuel stations...
📄 Fetched page 1 (1000 stations)...
📄 Fetched page 2 (1000 stations)...
📄 Fetched page 3 (1000 stations)...
📄 Fetched page 4 (1000 stations)...
📄 Fetched page 5 (1000 stations)...
📄 Fetched page 6 (1000 stations)...
📄 Fetched page 7 (574 stations)...
✅ Found 5,238 existing stations in database

📡 Step 2: Querying OpenStreetMap Overpass API...
🌏 NSW: 2,763 stations
🌏 VIC: 2,004 stations
🌏 QLD: 1,783 stations
🌏 SA: 600 stations
🌏 WA: 862 stations
🌏 TAS: 225 stations
🌏 NT: 154 stations
🌏 ACT: 88 stations
📊 Total from OSM: 8,479 stations

🔍 Step 3: Parsing and filtering...
✅ Parsed: 2,193
⚠️ Skipped (already in database): 6,192
⚠️ Skipped (duplicates within OSM data): 94
📦 New stations to import: 2,193

💾 Step 4: Importing to database...
📦 Batch 1/5 (500 stations)... ✅
📦 Batch 2/5 (500 stations)... ✅
📦 Batch 3/5 (500 stations)... ✅
📦 Batch 4/5 (500 stations)... ✅
📦 Batch 5/5 (193 stations)... ✅

Successfully imported: 2,193
📊 Total in database: 8,767 (before cleanup)

After Cleanup

🗑️  Cleanup Complete!
Deleted: 1,342 duplicate records
Remaining: 7,429 unique fuel stations

✅ No duplicate OSM IDs

Verification

Script: verify-fuel-station-coverage.ts

Comprehensive verification checks:

📊 Total Coverage: 7,429 stations

📦 Breakdown by Data Source:
openstreetmap_overpass: 4,355 (58.6%)
openstreetmap: 3,074 (41.4%)

🏘️ Town vs Highway/Rural:
Town-based (Loclie): 3,074 (41.4%)
Highway/Rural (OSM): 4,355 (58.6%)

🔍 Duplicate Detection:
✅ No duplicate OSM IDs found

✨ Data Quality Metrics:
Stations with brand: 6,608 (88.9%)
Stations with website: 1,715 (23.1%)
Stations with phone: 804 (10.8%)
Stations with toilets: 141 (1.9%)
Stations with food: 190 (2.6%)
Stations with truck access: 9 (0.1%)
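Each metric above is a simple per-column ratio. A sketch of how such a figure can be computed — the function name is illustrative, not from the verification script:

```typescript
// Count rows with a non-empty value in `field` and format the result
// like the report above (percentage to one decimal place; the report's
// thousands separators are added at display time).
function fieldCoverage(
  rows: Array<Record<string, unknown>>,
  field: string,
): string {
  const withValue = rows.filter((r) => r[field] != null && r[field] !== '').length;
  const pct = ((withValue / rows.length) * 100).toFixed(1);
  return `${withValue} (${pct}%)`;
}
```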

Best Practices

1. Always Paginate Supabase Queries

// Don't assume unlimited results — paginate with .range()
const allRows: any[] = [];
let page = 0;
const PAGE_SIZE = 1000;

while (true) {
  const { data } = await supabase
    .from('table')
    .select('*')
    .range(page * PAGE_SIZE, (page + 1) * PAGE_SIZE - 1);

  if (!data || data.length === 0) break;
  allRows.push(...data);
  if (data.length < PAGE_SIZE) break;
  page++;
}

2. Multi-Level Deduplication

// Check against database
if (existingIds.has(id)) continue;

// Check within current batch
if (newIds.has(id)) continue;

newIds.add(id);

3. Rate Limit External APIs

for (const region of regions) {
  await queryAPI(region);
  await new Promise(resolve => setTimeout(resolve, 2000));
}

4. Batch Operations

// Insert/delete in batches
for (let i = 0; i < items.length; i += BATCH_SIZE) {
const batch = items.slice(i, i + BATCH_SIZE);
await supabase.from('table').insert(batch);
}

5. Comprehensive Verification

Always verify after import:

  • Total count
  • Data source breakdown
  • Duplicate detection
  • Data quality metrics

Commands Reference

# Import Loclie town-based stations
npm run db:import-fuel-stations-loclie

# Gap-fill with OSM (fixed version)
npm run db:import-fuel-stations-osm

# Analyze coverage
npm run db:analyze-fuel-coverage

# Verify data quality
npm run db:verify-fuel-coverage

# Clean up duplicates
npm run db:cleanup-fuel-duplicates