Dataknobs • Case Study
AI Twin • Switchgear • Data Centers

AI Twin for Data Center Switchgear: Predict remaining useful life (RUL) before failures happen

Dataknobs built an AI Twin on operational switchgear telemetry to convert noisy power signals into an actionable Health Index and then estimate Remaining Useful Life. The approach is designed for high‑reliability assets where historical failures are rare—making classic “predict failure” models ineffective.

Health Index (1–10) • XGBoost / Decision Trees • Ranked-distance metric • BigQuery + Looker dashboards • Condition-based maintenance
The challenge

Predictive maintenance without run-to-failure data

Switchgear in modern data centers is engineered to be highly reliable. That’s great for uptime—but it creates a data science problem: there aren’t enough failures to train a standard supervised failure model. Teams still need a way to identify early degradation, prioritize interventions, and reduce unplanned outages.

Why it’s hard
Rare failures → sparse labels → traditional “predict failure” ML underperforms or overfits.
What operators need
A simple, explainable way to monitor condition, forecast deterioration, and plan maintenance windows.
The solution

Pivot from failure prediction to degradation prediction

Instead of predicting the rare event (failure), the AI Twin predicts a proxy for degradation: a composite Health Index that summarizes stability, anomalies, and efficiency. The system then converts Health Index trajectories into an RUL estimate using transparent rules.

1) Sense
Use switchgear telemetry (current, voltage, power metrics, power factor, uptime).
2) Understand
Engineer features that quantify anomalies by number, magnitude, frequency, and recency.
3) Act
Forecast Health Index states; translate into RUL to prioritize maintenance.
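
To make the "Understand" step concrete, here is a minimal sketch of how anomalies in one telemetry window might be summarized by number, magnitude, frequency, and recency. The z-score rule, the threshold of 3.0, and the names `AnomalyFeatures` and `anomaly_features` are illustrative assumptions, not the detection method used in the project.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Optional


@dataclass
class AnomalyFeatures:
    count: int                       # number of anomalous readings
    max_magnitude: float             # largest |z-score| among the anomalies
    frequency: float                 # anomalies per reading in the window
    days_since_last: Optional[int]   # recency; None if no anomaly was seen


def anomaly_features(values, z_threshold=3.0):
    """Summarize anomalies in a window of daily readings (oldest first).

    Assumption: a reading is anomalous when its z-score against the window
    mean exceeds z_threshold. A real system may use a different detector.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A perfectly flat signal has no anomalies under this rule.
        return AnomalyFeatures(0, 0.0, 0.0, None)
    z = [abs(v - mu) / sigma for v in values]
    idx = [i for i, score in enumerate(z) if score > z_threshold]
    if not idx:
        return AnomalyFeatures(0, 0.0, 0.0, None)
    return AnomalyFeatures(
        count=len(idx),
        max_magnitude=max(z[i] for i in idx),
        frequency=len(idx) / len(values),
        days_since_last=len(values) - 1 - idx[-1],
    )
```

Computing these four numbers per signal and per window gives a model a compact description of how often, how badly, and how recently a signal misbehaved.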
Signals & hypotheses

Domain-first: focus on signals that correlate with health

Key telemetry streams used for condition assessment include:

  • Current: average and per-phase currents
  • Power: active (kW), reactive (kVAR), and apparent (kVA)
  • Power factor: total and per-phase (phase imbalance)
  • Voltage: line-to-line and phase-to-neutral
  • Uptime: availability trends over time

Engineering hypotheses guide feature selection—e.g., low/declining power factor and sustained phase imbalance can be early indicators of inefficiency or emerging faults.
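
As a rough illustration of these hypotheses, the sketch below derives power factor from active and apparent power, measures phase imbalance as the largest deviation from the three-phase mean (one common convention), and checks whether a condition has persisted for several consecutive readings. The function names and any thresholds applied to the results are assumptions for illustration.

```python
def power_factor(active_kw, apparent_kva):
    """Power factor = active power (kW) / apparent power (kVA)."""
    if apparent_kva <= 0:
        raise ValueError("apparent power must be positive")
    return active_kw / apparent_kva


def phase_imbalance_pct(phase_a, phase_b, phase_c):
    """Largest deviation from the three-phase mean, as a percent of the mean."""
    phases = (phase_a, phase_b, phase_c)
    avg = sum(phases) / 3
    if avg == 0:
        return 0.0
    return 100.0 * max(abs(p - avg) for p in phases) / avg


def sustained(flags, min_run):
    """True if flags contains at least min_run consecutive True values."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False
```

For example, `phase_imbalance_pct(98, 100, 102)` evaluates to 2.0, and a daily series of such values compared against a limit can be passed to `sustained` to separate a brief excursion from a persistent imbalance.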

Health Index

A “virtual sensor” for degradation (1–10)

The Health Index converts many noisy signals into a single interpretable score: 10 = healthy, 1 = critically degraded.

How it’s built

Telemetry is aggregated (daily/weekly/monthly), enriched with rate-of-change and moving averages, and then statistically combined into a score that weights anomalies—especially recent ones.
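
The recency weighting described above could look something like the following sketch. It assumes each day has already been reduced to a single non-negative anomaly severity, and it uses an exponential half-life and a saturating map onto the 1–10 scale. `health_index`, `half_life_days`, and `scale` are hypothetical names and parameters, and the actual statistical combination used in the project may differ.

```python
import math


def health_index(daily_severity, half_life_days=30.0, scale=1.0):
    """Map daily anomaly severities (oldest first) to a 1-10 Health Index.

    Each day's severity (0 = no anomaly) is discounted by an exponential
    recency weight, so a recent anomaly costs more than an old one.
    """
    n = len(daily_severity)
    penalty = 0.0
    for i, severity in enumerate(daily_severity):
        age_days = n - 1 - i
        weight = 0.5 ** (age_days / half_life_days)
        penalty += weight * severity
    # Squash the unbounded penalty into [0, 1), then map it onto 10..1.
    degradation = 1.0 - math.exp(-scale * penalty)
    score = 10 - round(9 * degradation)
    return max(1, min(10, score))
```

With these defaults, an anomaly of severity 1.0 recorded today lowers the score much more than the same anomaly recorded 90 days ago, which is the behavior the weighting is meant to capture.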

Why this works when failures are rare

You can observe “unhealthy behavior” far more often than outright failure. Predicting health states turns the problem into supervised ordinal classification with far more training signal.

What operators do with it

Use thresholds (e.g., predicted to enter Class 3 or below next month) to trigger inspection, targeted maintenance, or deeper diagnostics—before an outage.

Predictive modeling

Ordinal classification + risk-aware scoring

Model framing
Treat Health Index as discrete classes (1–10) to create actionable states.
Algorithms
Decision Trees and XGBoost as top performers; tuned with grid search.
Evaluation
Ranked-distance metric penalizes “far off” predictions more heavily.

This scoring aligns optimization with operational risk: misclassifying a degraded asset as “healthy” is far worse than being off by a single class.
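
The exact form of the ranked-distance metric is not given here, so the following is only one plausible sketch of the idea: penalize each prediction by its class distance raised to a power, then normalize to a score between 0 and 1. The function name, the `power` parameter, and the normalization are assumptions.

```python
def ranked_distance_score(y_true, y_pred, n_classes=10, power=2):
    """Return a score in [0, 1]; 1.0 means every prediction hit its class.

    The penalty grows with |true - predicted| ** power, so with power > 1 a
    single miss by several classes costs more than the same total error
    spread across single-class misses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    worst = (n_classes - 1) ** power
    total = sum(abs(t - p) ** power for t, p in zip(y_true, y_pred))
    return 1.0 - total / (worst * len(y_true))
```

Under this sketch, four predictions that are each off by one class score higher than three exact predictions plus one that is off by four classes, even though both cases have the same total absolute error. Plain accuracy would rank them the other way around.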

RUL estimation

Translate predicted health into Remaining Useful Life

RUL is computed as a function of baseline expected lifetime, current asset age, and a configurable “life loss” derived from health history:

RUL = (Baseline Lifetime) − (Current Age) − (Predicted Life Loss)

The translation is intentionally rule-based for transparency—engineers can trace a low RUL back to the health trajectory and the life-loss rules.
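
A worked sketch of the formula, under stated assumptions: time is measured in months, and each month spent in a lower health class adds extra life loss according to a small rule table. The table values, the clamp at zero, and the names `LIFE_LOSS_RULES`, `life_loss_months`, and `remaining_useful_life` are hypothetical and stand in for whatever rules the engineers configure.

```python
# Hypothetical life-loss rules: (minimum Health Index, extra months of life
# lost per month spent at or above that class). Checked top to bottom.
LIFE_LOSS_RULES = [(8, 0.0), (5, 0.5), (1, 2.0)]


def life_loss_months(monthly_health, rules=LIFE_LOSS_RULES):
    """Sum the extra life loss implied by a monthly Health Index history."""
    total = 0.0
    for health in monthly_health:
        for min_class, loss in rules:
            if health >= min_class:
                total += loss
                break
        else:
            raise ValueError(f"health index out of range: {health}")
    return total


def remaining_useful_life(baseline_months, age_months, monthly_health):
    """RUL = baseline lifetime - current age - predicted life loss."""
    rul = baseline_months - age_months - life_loss_months(monthly_health)
    return max(0.0, rul)  # assumption: RUL is never reported below zero
```

For an asset with a 360-month baseline, 120 months of age, and a health history of `[10, 9, 7, 6, 3, 3]`, the rules above give 5 months of life loss and an RUL of 235 months. Because every term is a table lookup, a low RUL can be traced to the specific months that caused it.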

Deployment architecture

Cloud-native pipeline for monitoring + inference

Training & model management

  • Train and serialize models (e.g., .pkl artifacts)
  • Store versioned models in a cloud bucket for controlled deployment
  • Temporal splits reduce time-series leakage and mimic real operations
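
A temporal split can be sketched in a few lines: everything before a cutoff date goes to training, everything on or after it goes to evaluation, so the model is never scored on data older than what it was trained on. The row layout `(day, features, label)` and the function name `temporal_split` are assumptions for illustration.

```python
from datetime import date


def temporal_split(rows, cutoff):
    """Split (day, features, label) rows into train and test by date.

    Rows dated before cutoff go to train; rows on or after cutoff go to test.
    """
    ordered = sorted(rows, key=lambda row: row[0])
    train = [row for row in ordered if row[0] < cutoff]
    test = [row for row in ordered if row[0] >= cutoff]
    return train, test


rows = [
    (date(2024, 3, 1), {"pf": 0.93}, 8),
    (date(2024, 1, 1), {"pf": 0.97}, 10),
    (date(2024, 2, 1), {"pf": 0.95}, 9),
    (date(2024, 4, 1), {"pf": 0.90}, 7),
]
train, test = temporal_split(rows, cutoff=date(2024, 3, 1))
```

Unlike a random shuffle, this keeps future readings out of the training set, which is closer to how the model is used once deployed.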

Inference & visualization

  • Feature engineering applied consistently at inference time
  • Predicted Health Index + RUL stored in BigQuery
  • Dashboards in Looker for daily/weekly/monthly views
Slide gallery

Visual walkthrough (from the deck)

Slides 1–17: AI Twin for Switchgear in Data Center
Next steps

Bring AI Twins to critical infrastructure

If you want an AI Twin for switchgear, UPS, generators, chillers, or other critical assets, Dataknobs can help: from data ingestion and health scoring to RUL forecasting and operator dashboards.

Discovery
Identify signals, data quality gaps, and engineering hypotheses.
Prototype
Health Index + baseline model in weeks; validate with SMEs.
Production
MLOps, monitoring, and dashboards for ongoing operations.

Contact: Dataknobs