AI Twin for Data Center Switchgear: Predict remaining useful life (RUL) before failures happen
Dataknobs built an AI Twin on operational switchgear telemetry to convert noisy power signals into an actionable Health Index and then estimate Remaining Useful Life. The approach is designed for high‑reliability assets where historical failures are rare—making classic “predict failure” models ineffective.
Predictive maintenance without run-to-failure data
Switchgear in modern data centers is engineered to be highly reliable. That’s great for uptime—but it creates a data science problem: there aren’t enough failures to train a standard supervised failure model. Teams still need a way to identify early degradation, prioritize interventions, and reduce unplanned outages.
Pivot from failure prediction to degradation prediction
Instead of predicting the rare event (failure), the AI Twin predicts a proxy for degradation: a composite Health Index that summarizes stability, anomalies, and efficiency. The system then converts Health Index trajectories into an RUL estimate using transparent rules.
Domain-first: focus on signals that correlate with health
Key telemetry streams used for condition assessment include:
- Current: average and per-phase currents
- Power: active (kW), reactive (kVAR), and apparent (kVA)
- Power factor: total and per-phase (phase imbalance)
- Voltage: line-to-line and phase-to-neutral
- Uptime: availability trends over time
Engineering hypotheses guide feature selection—e.g., low/declining power factor and sustained phase imbalance can be early indicators of inefficiency or emerging faults.
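As an illustration of how two of these signals could be derived from raw telemetry, here is a minimal sketch. It assumes the standard power-triangle relationship for power factor and a "maximum deviation from the mean" definition of phase imbalance; the function names and the choice of imbalance formula are assumptions, not the definitions used in the Dataknobs pipeline.

```python
import math
from typing import Optional, Sequence


def apparent_power_kva(active_kw: float, reactive_kvar: float) -> float:
    """Apparent power from the power triangle: S = sqrt(P^2 + Q^2)."""
    return math.hypot(active_kw, reactive_kvar)


def power_factor(active_kw: float, reactive_kvar: float) -> Optional[float]:
    """Power factor P / S, or None when there is no load to measure."""
    s = apparent_power_kva(active_kw, reactive_kvar)
    if s == 0:
        return None
    return active_kw / s


def phase_imbalance_pct(phase_values: Sequence[float]) -> float:
    """Largest deviation from the per-phase mean, as a percent of the mean.

    Assumed definition for illustration; works for per-phase currents,
    voltages, or power factors.
    """
    mean = sum(phase_values) / len(phase_values)
    if mean == 0:
        return 0.0
    return 100.0 * max(abs(v - mean) for v in phase_values) / mean
```

For example, 80 kW of active power with 60 kVAR of reactive power gives 100 kVA apparent power and a power factor of 0.8, and phase currents of 90 A, 100 A, and 110 A give an imbalance of 10%.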
A “virtual sensor” for degradation (1–10)
The Health Index converts many noisy signals into a single interpretable score: 10 = healthy, 1 = critically degraded.
How it’s built
Telemetry is aggregated (daily/weekly/monthly), enriched with rate-of-change and moving averages, and then statistically combined into a score that weights anomalies—especially recent ones.
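The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the production scoring logic: anomalies are flagged with a z-score against a baseline period, recent anomalies are emphasized with an exponential decay, and the weighted anomaly share is mapped linearly onto the 1–10 scale. The threshold, the decay factor, and all function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average; early points use whatever data exists so far."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def rate_of_change(values: Sequence[float]) -> List[float]:
    """Period-over-period difference; the first period has no predecessor."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def anomaly_flags(values: Sequence[float], baseline: Sequence[float],
                  z_threshold: float = 3.0) -> List[bool]:
    """Flag values more than z_threshold standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1e-9  # avoid dividing by zero on a flat baseline
    return [abs(v - mu) / sigma > z_threshold for v in values]


def health_index(flags: Sequence[bool], decay: float = 0.8) -> int:
    """Map a recency-weighted anomaly share to 10 (healthy) .. 1 (critical)."""
    if not flags:
        return 10
    n = len(flags)
    weights = [decay ** (n - 1 - i) for i in range(n)]  # newest period has weight 1
    penalty = sum(w for w, f in zip(weights, flags) if f) / sum(weights)
    return max(1, min(10, round(10 - 9 * penalty)))
```

With these assumed settings, a single anomaly in the most recent of three periods scores 6, while the same anomaly in the oldest period scores 8, which reflects the extra weight on recent behavior.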
Why this works when failures are rare
You can observe “unhealthy behavior” far more often than outright failure. Predicting health states turns the problem into supervised ordinal classification with far more training signal.
What operators do with it
Use thresholds (e.g., predicted to enter Class 3 or below next month) to trigger inspection, targeted maintenance, or deeper diagnostics—before an outage.
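A threshold rule of this kind could look like the sketch below. The asset identifiers, the default alert class of 3, and the function name are illustrative assumptions; real thresholds would be set by the operations team.

```python
from typing import Dict, List


def assets_to_inspect(predicted_next_month: Dict[str, int],
                      alert_class: int = 3) -> List[str]:
    """Return assets predicted at or below alert_class, most degraded first."""
    flagged = [
        (health_class, asset)
        for asset, health_class in predicted_next_month.items()
        if health_class <= alert_class
    ]
    # Sorting by class puts the lowest (worst) predicted health at the front.
    return [asset for _, asset in sorted(flagged)]


# Hypothetical predictions for three switchgear units.
queue = assets_to_inspect({"SWG-A": 8, "SWG-B": 2, "SWG-C": 3})
```

Here `queue` contains `SWG-B` followed by `SWG-C`, and `SWG-A` is left out because its predicted class is above the threshold.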
Ordinal classification + risk-aware scoring
Risk-aware scoring aligns optimization with operational risk: misclassifying a degraded asset as “healthy” is far worse than being off by a single class.
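One way to express that asymmetry is a cost function that grows with the distance between classes and applies an extra multiplier when the model is too optimistic. The sketch below is an assumption about how such a metric might be written; the multiplier of 3 and the function names are hypothetical, and the metric actually used in the project is not specified here.

```python
from typing import Sequence


def ordinal_risk_cost(actual: int, predicted: int,
                      optimistic_penalty: float = 3.0) -> float:
    """Cost of one prediction on the 1-10 health scale (10 = healthy)."""
    distance = abs(predicted - actual)
    if predicted > actual:
        # The model reports a healthier state than reality: the dangerous direction.
        return optimistic_penalty * distance
    return float(distance)


def mean_risk_cost(actuals: Sequence[int], predictions: Sequence[int],
                   optimistic_penalty: float = 3.0) -> float:
    """Average cost over a set of predictions; lower is better."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    total = sum(
        ordinal_risk_cost(a, p, optimistic_penalty)
        for a, p in zip(actuals, predictions)
    )
    return total / len(actuals)
```

Under these assumptions, predicting class 9 for an asset that is really class 3 costs 18, while predicting class 8 for an asset that is really class 9 costs only 1.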
Translate predicted health into Remaining Useful Life
RUL is computed as a function of baseline expected lifetime, current asset age, and a configurable “life loss” derived from health history:
RUL = (Baseline Lifetime) − (Current Age) − (Predicted Life Loss)
The translation is intentionally rule-based for transparency—engineers can trace a low RUL back to the health trajectory and the life-loss rules.
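A worked sketch of the formula follows. The life-loss rule table is purely hypothetical (months of life lost per month spent at or below a given health class); the actual rules are configurable and are not stated in this document. The floor at zero is also an added assumption so that the estimate never goes negative.

```python
from typing import List, Sequence, Tuple

# Hypothetical rules: (maximum health class, months of life lost per month).
# Rules are checked in order, so the most severe band comes first.
LIFE_LOSS_RULES: List[Tuple[int, float]] = [(3, 2.0), (6, 0.5)]


def predicted_life_loss_months(monthly_health: Sequence[int]) -> float:
    """Sum the life loss implied by each month's health class."""
    loss = 0.0
    for health_class in monthly_health:
        for max_class, months_lost in LIFE_LOSS_RULES:
            if health_class <= max_class:
                loss += months_lost
                break  # apply only the first (most severe) matching rule
    return loss


def remaining_useful_life_months(baseline_lifetime: float, current_age: float,
                                 monthly_health: Sequence[int]) -> float:
    """RUL = baseline lifetime - current age - predicted life loss, floored at 0."""
    rul = baseline_lifetime - current_age - predicted_life_loss_months(monthly_health)
    return max(0.0, rul)
```

For example, with a 240-month baseline, a current age of 60 months, and a health history of 10, 8, 6, 3, the life loss is 0 + 0 + 0.5 + 2.0 = 2.5 months, so the RUL is 240 − 60 − 2.5 = 177.5 months. Because every term comes from a named rule, an engineer can trace which months contributed to the loss.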
Cloud-native pipeline for monitoring + inference
Training & model management
- Train and serialize models (e.g., .pkl artifacts)
- Store versioned models in a cloud bucket for controlled deployment
- Temporal splits reduce time-series leakage and mimic real operations
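The training steps above can be sketched with the standard library alone. This is a minimal illustration, assuming each record carries a `date` field and that artifacts follow a `health_model_v<version>.pkl` naming scheme; both are hypothetical, and the upload to a cloud bucket is left out because the storage client is not specified.

```python
import pickle
from datetime import date
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def temporal_split(rows: List[Row], cutoff: date) -> Tuple[List[Row], List[Row]]:
    """Train on records strictly before the cutoff, evaluate on the rest.

    Splitting by time instead of at random keeps future observations out of
    the training set, which mirrors how the model is used in operation.
    """
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


def serialize_model(model: Any, version: int) -> Tuple[str, bytes]:
    """Return a versioned artifact name and the pickled bytes, ready for upload."""
    return f"health_model_v{version}.pkl", pickle.dumps(model)


# Hypothetical monthly records and a stand-in "model" (any picklable object).
rows = [{"date": date(2024, m, 1), "health": 10 - m % 3} for m in range(1, 7)]
train, test = temporal_split(rows, cutoff=date(2024, 5, 1))
name, payload = serialize_model({"weights": [0.2, 0.8]}, version=1)
```

Only load pickle artifacts from a trusted source such as your own versioned bucket, since unpickling can execute arbitrary code.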
Inference & visualization
- Feature engineering applied consistently at inference time
- Predicted Health Index + RUL stored in BigQuery
- Dashboards in Looker for daily/weekly/monthly views
Bring AI Twins to critical infrastructure
If you want an AI Twin for switchgear, UPS, generators, chillers, or other critical assets, Dataknobs can help: from data ingestion and health scoring to RUL forecasting and operator dashboards.
Contact: Dataknobs