Using AI/ML to Estimate Waterflow When Sensors Fail
Saving Lives and Reducing Infrastructure Costs
Emergency response teams need accurate, real-time waterflow data to determine which areas are at flood risk during storms and floods. However, the existing infrastructure for collecting this critical data faces significant operational and financial challenges.
The U.S. government has invested $20 billion to install stream gages across the country. Each gage costs approximately $50,000, and operating and maintaining the network costs an additional $200 million per year.
Gages break frequently, especially during the flood and storm events when they're needed most. Floating trees and debris in the river often damage or destroy the sensors.
This creates a dangerous gap: the moments when response teams most urgently need waterflow data are exactly the moments when gages are most likely to fail or be damaged.
Predictive maintenance for gages is not appropriate: they fail suddenly and unpredictably in extreme conditions. Forecasting waterflow directly is extremely difficult due to complex dependencies on weather, snow, temperature, and dam operations.
The problem isn't predicting when gages will fail or forecasting waterflow independently. Instead, it's maintaining situational awareness of waterflow even when individual sensors fail. The solution: use machine learning to estimate waterflow at broken gages based on data from nearby, functional gages.
Rather than trying to predict gage failures or absolute waterflow, we use machine learning to leverage correlations between nearby gages. When a gage fails, we can accurately estimate its waterflow using data from correlated gages in the same geographic cluster.
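The idea can be sketched in a few lines of Python. This is a minimal illustration rather than the production model: it assumes a single correlated neighbor and a roughly linear relationship between the two gages, and the readings and names (`neighbor_history`, `target_history`, `estimate_target`) are hypothetical.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical flows recorded while both gages were still working.
neighbor_history = [100.0, 150.0, 200.0, 250.0, 300.0]
target_history = [210.0, 310.0, 410.0, 510.0, 610.0]

slope, intercept = fit_line(neighbor_history, target_history)


def estimate_target(neighbor_flow):
    """Estimate flow at the failed gage from the working neighbor's reading."""
    return slope * neighbor_flow + intercept


# The target gage has failed; the neighbor currently reads 400.
estimate = estimate_target(400.0)
print(f"Estimated flow at failed gage: {estimate:.1f}")
```

In practice several neighbors from the same cluster would feed the model, but the principle is the same: learn the relationship while both gages work, then apply it when one goes offline.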
Government agencies obtain waterflow estimates that are over 90% accurate even when gages fail, giving them the reliable data they need for emergency response decisions.
Organizations save substantial money on emergency repairs and operational costs by maintaining system visibility without relying on perfect sensor availability.
Emergency responders can make informed decisions about flood risk even when sensors fail. During dangerous periods, no team needs to risk lives replacing broken gages.
The solution doesn't require replacing or maintaining all gages perfectly. It works with the gages you have, mitigating failures through intelligent estimation.
In Alaska and other flood-prone areas where gages break frequently during winter and storm seasons, teams no longer need to risk lives replacing sensors during dangerous conditions. The ML system estimates waterflow reliably, allowing emergency response decisions to proceed even when gages are offline.
The approach to building an effective ML model depends critically on the patterns visible in your data. Clear patterns require simple models; subtle patterns demand more sophisticated approaches.
Data Characteristics: Clear, distinct patterns are visible
Requirements: Need less data, fewer features, simple models
Expected Outcome: Highly accurate results with straightforward approaches
Example: Strong seasonal waterflow patterns with clear peaks and valleys
Data Characteristics: Patterns exist but are subtle or multi-layered
Requirements: Need more historical data, additional features, complex models
Expected Outcome: Good accuracy with sophisticated modeling approaches
Example: Waterflow influenced by multiple correlated factors (weather, temperature, reservoir levels)
Data Characteristics: Patterns are highly complex or non-linear
Requirements: Different algorithms, many features, complex or deep learning models
Expected Outcome: Moderate accuracy; inherent complexity limits predictability
Example: Extreme events, unusual weather combinations with unprecedented impacts
The complexity of your patterns determines your approach. Don't assume you need deep learning or complex models. Start with data exploration to understand pattern complexity. Obvious patterns yield to simple, interpretable models. More complex patterns may require sophisticated approaches, but may also have inherent limits to predictability. Understand your data before choosing your model.
The ML model architecture consists of four integrated phases: data gathering with correlation analysis, machine learning model training, handling flooding events, and real-time prediction. This systematic approach ensures accuracy and reliability when it matters most.
Compute once per year: Analyze stream data from NWIS (National Water Information System). Create clusters of gages based on time series similarity. Identify gages that show similar surge or decline patterns in waterflow. Highlight geographic areas where clustering doesn't exist (potential coverage gaps).
Output: Gage clusters with strong correlations ready for modeling
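A rough sketch of this clustering step follows. The text does not say which similarity metric or clustering algorithm is used, so this example assumes Pearson correlation and groups gages that are linked by a correlation at or above a hypothetical threshold of 0.9. The gage names and readings are invented for illustration.

```python
from math import sqrt
from statistics import mean


def pearson(a, b):
    """Pearson correlation between two equal-length series."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def cluster_gages(series, threshold=0.9):
    """Group gages into connected components of the
    'correlation >= threshold' graph."""
    unassigned = set(series)
    clusters = []
    while unassigned:
        seed = min(unassigned)  # deterministic starting point
        unassigned.remove(seed)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [g for g in sorted(unassigned)
                      if pearson(series[current], series[g]) >= threshold]
            for g in linked:
                unassigned.remove(g)
                cluster.append(g)
                frontier.append(g)
        clusters.append(sorted(cluster))
    return clusters


# Hypothetical flow series for four gages over the same six time steps.
flows = {
    "gage_a": [10, 12, 30, 55, 40, 20],
    "gage_b": [21, 25, 58, 112, 79, 41],   # surges and declines with gage_a
    "gage_c": [50, 48, 30, 12, 25, 45],    # moves opposite to gage_a
    "gage_d": [5, 5, 6, 5, 30, 6],         # one isolated spike
}

clusters = cluster_gages(flows)
# Single-gage clusters mark potential coverage gaps.
isolated = [c[0] for c in clusters if len(c) == 1]
print(clusters, isolated)
```

Gages that end up alone in a cluster have no correlated neighbor to estimate them from, which is how the coverage gaps mentioned above would surface.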
Compute once per year: Build machine learning models for all gages using SHEM (Streamgage Hydrologic Estimate Using Machine Learning). Identify which gages are easy to predict (strong correlations) versus difficult to predict (weak correlations, isolated locations). Train reliability models on historical data and re-test on multiple datasets.
Output: Trained models with confidence scores for each gage
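One way to produce a per-gage confidence score is to fit each model on earlier readings and score it on held-out later readings. The sketch below is an assumption about how that could look, not a description of SHEM itself: it uses a one-neighbor linear fit, R-squared on the holdout as the score, and a hypothetical cutoff of 0.9 to separate easy from difficult gages.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(actual, predicted):
    """Coefficient of determination; can be negative for very poor fits."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def score_gage(neighbor, target, holdout=3):
    """Fit on the earlier readings, score on the last `holdout` readings."""
    slope, intercept = fit_line(neighbor[:-holdout], target[:-holdout])
    predicted = [slope * x + intercept for x in neighbor[-holdout:]]
    return r_squared(target[-holdout:], predicted)


# Hypothetical data: one neighbor series and two target gages.
neighbor = [10, 20, 30, 40, 50, 60, 70, 80]
targets = {
    "easy_gage": [35, 65, 95, 125, 155, 185, 215, 245],  # tracks the neighbor
    "hard_gage": [50, 20, 90, 30, 70, 10, 80, 40],       # weakly related
}

scores = {name: score_gage(neighbor, series) for name, series in targets.items()}
labels = {name: "easy" if s >= 0.9 else "difficult" for name, s in scores.items()}
print(scores, labels)
```

Scoring on readings the model never saw gives a more honest picture of how it will behave when a gage actually fails.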
Real-time execution: When a stream gage fails, identify working gages in the same cluster. If needed, retrain the model on the fly using recent data. This ensures predictions adapt to current conditions rather than relying solely on historical patterns.
Output: Adaptive models responsive to current conditions
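On-the-fly retraining can be as simple as refitting on a recent window of readings. The sketch below assumes a one-neighbor linear model and a hypothetical window of the last four readings before the failure. The data is invented so that the relationship between the gages shifts partway through the record, as it might during a flood.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def refit_on_recent(neighbor, target, window=4):
    """Refit using only the most recent `window` readings before the failure."""
    return fit_line(neighbor[-window:], target[-window:])


# Hypothetical record: the target tracks 2x the neighbor at first,
# then 3x the neighbor in the most recent readings.
neighbor = [10, 20, 30, 40, 50, 60, 70, 80]
target = [20, 40, 60, 80, 150, 180, 210, 240]

hist_slope, hist_intercept = fit_line(neighbor, target)
recent_slope, recent_intercept = refit_on_recent(neighbor, target)

neighbor_now = 90  # neighbor reading after the target gage has failed
historical_estimate = hist_slope * neighbor_now + hist_intercept
recent_estimate = recent_slope * neighbor_now + recent_intercept
print(f"historical fit: {historical_estimate:.1f}, recent fit: {recent_estimate:.1f}")
```

The window length is a trade-off: a short window adapts quickly to current conditions but is more sensitive to noise in the few readings it uses.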
On-demand execution: Predict waterflow for broken gages using working gages in the cluster. Generate valuable insights about which additional models or features should be built to improve coverage and accuracy. Continuously improve the system based on prediction performance.
Output: Real-time waterflow estimates for emergency response teams
The ML model produces varying levels of accuracy across different gages, with clear patterns explaining why some locations are easier to predict than others. Understanding these patterns guides further model improvement.
Performance: Excellent correlation between predicted and actual waterflow
Explanation: Strong geographic/hydrologic correlations with nearby gages enable accurate estimation
Characteristics: Gages with multiple similar neighbors in the cluster
Use Case: Safe to use for emergency response decisions
Performance: Accurate predictions for gradual changes, struggles with sharp declines
Explanation: Models capture general trends but miss rapid state changes
Characteristics: Normal conditions well-modeled, extreme events less accurate
Use Case: Use with caution during extreme weather; supplement with additional features
Performance: Moderate accuracy overall
Explanation: Limited training data, weak correlations with nearby gages, or unusual local conditions
Improvement Path: Candidate for deep learning approaches or additional feature engineering
Action Items: Collect more historical data, identify additional predictive features (weather, elevation, vegetation)
Even with moderate accuracy on some gages, the system provides decision support for emergency teams. Accuracy above 90% on easily predictable gages covers many critical locations. For difficult-to-predict locations, the system flags risk levels and uncertainty, enabling teams to take appropriate precautions. Combined with manual inspection where possible, the ML estimates significantly improve situational awareness during floods and storms.
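Flagging uncertainty could work along the following lines. This is a sketch under stated assumptions: the band is the estimate plus or minus two standard deviations of the training residuals (a rough band, not a formal prediction interval), and the 25% relative-width cutoff for the "high uncertainty" flag is a hypothetical choice. All readings are invented.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y ~ slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_with_band(neighbor_hist, target_hist, neighbor_now,
                       k=2.0, max_rel_width=0.25):
    """Return (estimate, low, high, flag) for the failed gage."""
    slope, intercept = fit_line(neighbor_hist, target_hist)
    residuals = [t - (slope * n + intercept)
                 for n, t in zip(neighbor_hist, target_hist)]
    spread = stdev(residuals)
    estimate = slope * neighbor_now + intercept
    low, high = estimate - k * spread, estimate + k * spread
    # Flag estimates whose band is wide relative to the estimate itself.
    wide = (high - low) / estimate > max_rel_width
    return estimate, low, high, "high uncertainty" if wide else "reliable"


neighbor_hist = [100.0, 150.0, 200.0, 250.0, 300.0]
steady_target = [212.0, 308.0, 411.0, 509.0, 610.0]   # tracks the neighbor closely
noisy_target = [150.0, 420.0, 300.0, 650.0, 520.0]    # loosely related

steady = estimate_with_band(neighbor_hist, steady_target, 400.0)
noisy = estimate_with_band(neighbor_hist, noisy_target, 400.0)
print(steady[3], noisy[3])
```

Showing the band and the flag next to each estimate lets responders see at a glance which numbers they can act on directly and which call for extra caution.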
Implementing a predictive monitoring system requires a systematic approach combining technical development, operational integration, and continuous improvement.
Understand your data. Analyze historical waterflow patterns at each gage. Identify clusters of gages with similar patterns. Assess pattern complexity (obvious vs. subtle). This exploratory phase determines whether simple or complex models are appropriate.
Use time series similarity metrics to identify gage clusters. Compute correlations between gages within each cluster. Identify gaps where clustering is weak or missing. Document which gages will be easy vs. difficult to predict.
Start simple. Begin with straightforward regression or time series models for easily predictable gages. Use more complex models only where justified by data complexity. Train models annually on full historical dataset. Test on holdout data to estimate real-world performance.
Integrate with emergency response systems. Provide real-time waterflow estimates when gages fail. Display confidence scores to help teams understand reliability. Set up alerts when key gages become unavailable. Train emergency response teams on using estimates.
Monitor prediction accuracy in production. Identify systematically mispredicted cases. Collect additional features that might improve accuracy. Iteratively enhance models. Expand to difficult-to-predict locations with targeted data collection and feature engineering.
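Production monitoring might compare the estimates against real readings, for example by replaying a period when the gage was actually working. The sketch below assumes accuracy is defined as 100% minus the mean absolute percentage error, which is one plausible reading of the accuracy figures quoted in this article, not a confirmed definition. It also reports the mean signed error, since a value far from zero points to systematic over- or underestimation. The readings are hypothetical.

```python
from statistics import mean


def evaluate_estimates(actual, estimated):
    """Return (accuracy, bias) for a set of estimates.

    accuracy: 1 - mean absolute percentage error (assumed definition)
    bias: mean signed error; negative means the model underestimates
    """
    ape = [abs(e - a) / a for a, e in zip(actual, estimated)]
    signed = [e - a for a, e in zip(actual, estimated)]
    return 1.0 - mean(ape), mean(signed)


# Hypothetical readings from a gage and the model's estimates for the same times.
actual = [100.0, 200.0, 400.0, 500.0]
estimated = [110.0, 190.0, 420.0, 450.0]

accuracy, bias = evaluate_estimates(actual, estimated)
print(f"accuracy: {accuracy:.1%}, bias: {bias:+.1f}")
```

Tracking both numbers per gage over time helps separate gages that are merely noisy from gages the model consistently gets wrong in one direction.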
Start with Data Understanding: Don't assume you need complex models. Explore your data first.
Leverage Correlations: Geographic and hydrologic correlations are more predictable than absolute values.
Identify Easy vs Hard: Segment gages by predictability. Use simpler approaches where they work.
Build Iteratively: Start with 80% of locations, improve to 90%, then tackle difficult remaining cases.
Integrate Operationally: The best model is only valuable if it's actually used by emergency responders.
Predictive monitoring for stream gages delivers substantial value across financial, operational, and human dimensions.
For a hypothetical system with 1000 gages: ~10-15% experience failures annually (~100-150 gages). Emergency replacement costs: ~$50K each = $5-7.5M annually. ML system development cost: ~$2-3M. Payback period: <1 year. Plus: Lives saved through improved emergency response (priceless).
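The arithmetic behind these figures can be checked directly. The sketch below uses only the numbers given above and makes one simplifying assumption that the payback claim implies: the ML estimates let the operator avoid all emergency replacement spending.

```python
gages = 1000
failure_pct = (10, 15)                       # low and high annual failure rates, in percent
replacement_cost = 50_000                    # dollars per emergency replacement
development_cost = (2_000_000, 3_000_000)    # low and high ML development cost, dollars

# Annual emergency replacement cost at the low and high failure rates.
annual_cost = tuple(gages * pct // 100 * replacement_cost for pct in failure_pct)

# Worst case: highest development cost against the lowest annual savings.
worst_payback_years = max(development_cost) / min(annual_cost)
# Best case: lowest development cost against the highest annual savings.
best_payback_years = min(development_cost) / max(annual_cost)

print(annual_cost, worst_payback_years, best_payback_years)
```

Even in the worst case in this hypothetical, the development cost is recovered within the first year, which is consistent with the payback period stated above.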
Predictive monitoring demonstrates how ML solves real operational problems. The challenge isn't forecasting the future perfectly; it's maintaining situational awareness despite imperfect sensors. By leveraging correlations between nearby sensors, you can estimate missing data with high accuracy.
The key to success is pragmatism. Start with data exploration. Understand what patterns exist in your data and whether they're obvious or subtle. Build simple models where they suffice. Only adopt complex approaches when justified. Integrate with operations to ensure the predictions are actually used.
The value is substantial and multifaceted. Reduce costs by avoiding emergency repairs. Improve safety by eliminating risky sensor replacement. Enhance emergency response with reliable data when it matters most. And perhaps most importantly: save lives by ensuring decision-makers have the information they need during critical events.
Predictive monitoring isn't theoretical; it's practical, deployable, and impactful today. Whether you're managing water systems, power grids, infrastructure networks, or any distributed monitoring system, these principles apply. The pattern holds: use ML to estimate missing sensor data from correlated neighbors, maintain situational awareness despite failures, and deliver value when it's needed most.