Skip to main content

Data Quality Improvements

Date: October 15, 2025 Phase: Post-Import Enhancement

Following the successful import of 116,668+ bridge restrictions, three major data quality improvements were implemented to extract hidden data, identify gaps, and establish validation processes.

1. NHVR Property Extraction

Objective

Extract structured dimensional data (height, width, weight) from the nhvr_properties JSONB field where it was stored but not properly parsed during initial import.

Implementation

Discovery: Analyzed 100 sample NHVR records to identify extractable fields:

  • MinHeightClearance - Height data in meters
  • StateTerritory - State/territory codes
  • RoadName - Road names
  • RestrictionType - Restriction categories

Extraction Logic:

// Extract height clearance
if (!record.max_height_meters && props.MinHeightClearance) {
const height = parseFloat(props.MinHeightClearance);
if (!isNaN(height) && height > 0 && height < 50) {
update.max_height_meters = height;

// Auto-flag caravan hazards
if (height < 3.5) update.affects_caravans = true;

// Set severity levels
if (height < 3.0) update.severity = 'danger';
else if (height < 3.5) update.severity = 'caution';
}
}

Results

Processing Status:

  • Total NHVR records: 101,333
  • Records updated: 5,600+ (processing complete)
  • Update rate: ~600-1,000 records per batch with extractable data

Impact:

  • Significant increase in max_height_meters populated fields
  • Improved affects_caravans and severity flagging
  • Better state/territory categorization
  • Enhanced road name coverage

Script: scripts/extract-nhvr-properties.ts

Command:

npm run db:extract-nhvr-properties

2. Victoria Missing Heights Analysis

Objective

Investigate why 4,626 Victorian bridges (27.8%) lack height clearance data and identify strategies to fill gaps.

Findings

Overall Statistics:

  • Total VIC bridges: 16,620
  • With height data: 11,994 (72.2%)
  • Missing height data: 4,626 (27.8%)

Missing Heights by Structure Type:

  • Unknown: Majority of missing data
  • Culverts: Many lack clearance (underground structures)
  • Utility structures: Often don't require clearance measurements
  • Road over road: Some highway overpasses missing data
  • Rail over road: ⚠️ 45 critical overpasses without clearance data

Critical Data Gaps

High Priority:

  • 45 rail overpasses missing height clearances
  • Critical for caravan routing safety
  • Rail overpasses typically have lower clearances (3.0-4.5m range)

Low Priority:

  • Culverts and utility structures (underground, non-blocking)
  • Pedestrian bridges (not vehicle restrictions)
  • Historic structures (may not be on active routes)

Recommendations

  1. Data Source Investigation:

    • VicRoads Bridge Management System
    • Transport for Victoria structure inspections
    • OpenStreetMap maxheight tags (cross-reference available)
    • Manual survey for 45 critical rail locations
  2. Priority Actions:

    • Filter for "RAIL OVER ROAD" bridges without heights
    • Cross-reference with OSM data
    • Mark low-priority structures as non-critical
    • Schedule field verification for critical gaps

Script: scripts/analyze-vic-missing-heights.ts

Command:

npm run db:analyze-vic-gaps

See VIC Rail Overpasses for detailed action plan.

3. OpenStreetMap Validation

Objective

Cross-reference bridge height data with OpenStreetMap's community-maintained maxheight tags to validate accuracy and identify supplementary data sources.

Implementation

Query Design: Uses Overpass API to query OSM data within geographic bounds.

Height Parsing: Handles multiple OSM maxheight formats:

  • Metric: "3.8" → 3.8m
  • Feet: "12'6"" → 3.81m
  • Mixed: "4.5m" → 4.5m

Matching Logic:

  • Geographic proximity: ±0.001° bounding box (~111m radius)
  • Discrepancy threshold: >0.2m difference flagged
  • Returns closest match within radius

Test Results

Test Region: Melbourne Metro Area

  • 617 bridges with maxheight tags found in Melbourne
  • Strong community coverage in urban areas
  • Potential supplementary data source

Recommendations

OSM as Supplementary Source:

Advantages:

  • Extensive community coverage (617+ bridges in Melbourne alone)
  • Regular updates by local contributors
  • Covers local roads not in state datasets

Limitations:

  • Community-maintained (varying accuracy)
  • May be outdated (no inspection dates)
  • Should be flagged as lower confidence than official sources

Use Cases:

  • Fill gaps for VIC's 45 missing rail overpasses
  • Supplement local road coverage
  • Validate suspicious official data

Discrepancy Handling:

  1. Prefer official government sources for state highways
  2. Investigate differences >0.5m
  3. Flag uncertain data with data_source: 'osm_unverified'
  4. Cross-reference multiple sources when available

Script: scripts/validate-osm-crossref.ts

Command:

npm run db:validate-osm

See OSM Integration for implementation guide.

Summary Statistics

Before Data Quality Improvements

  • Total restrictions: 116,668
  • NHVR records with structured height: ~0%
  • VIC rail overpasses unidentified: Unknown
  • OSM validation: Not performed

After Data Quality Improvements

  • Total restrictions: 116,668 (unchanged)
  • NHVR records with extracted height: 5,600+
  • VIC critical gaps identified: 45 rail overpasses
  • OSM bridges available: 617+ (Melbourne alone)

Estimated Coverage Improvement

Current:

  • Additional height clearances extracted: 5,000-10,000
  • Improved caravan safety flagging: 2,000-5,000 records
  • Identified priority data collection targets: 45 structures

With OSM Integration (Future):

  • Potential additional bridges: 10,000+ nationwide
  • Enhanced local road coverage
  • Validation dataset for quality assurance

Next Steps

Immediate Actions

  1. ✅ Complete NHVR property extraction
  2. ⚠️ Focus data collection on 45 VIC rail overpasses
  3. 🔄 Consider OSM import pilot for VIC gaps

Short-term Enhancements (1-2 weeks)

  1. Add data_confidence field to schema
  2. Implement OSM import script for missing VIC rail bridges
  3. Create validation reports comparing OSM vs official data
  4. Add data source tracking for all records

Long-term Strategy (1-3 months)

  1. Schedule regular OSM sync (monthly)
  2. Implement user-reported data corrections
  3. Partner with state authorities for official data updates
  4. Create data quality dashboard showing coverage by region