AI Twin • Switchgear • Data Centers

AI Twin for Data Center Switchgear: Predict remaining useful life (RUL) before failures happen

Dataknobs built an AI Twin on operational switchgear telemetry that converts noisy power signals into an actionable Health Index and then estimates Remaining Useful Life (RUL). The approach is designed for high‑reliability assets where historical failures are too rare to train a classic “predict failure” model.


Health Index (1–10) • XGBoost / Decision Trees • Ranked-distance metric • BigQuery + Looker dashboards • Condition-based maintenance

Snapshot

Asset Fleet

390+ units

ABB switchgear across data centers

Core Output

Health Index

Single score to summarize degradation signals

Outcome

RUL estimate

Rules translate health history into remaining life

What “AI Twin” means here

A virtual representation of each switchgear unit that continuously learns from telemetry, predicts near-future health states, and converts those predictions into maintenance-ready insights.

The challenge

Predictive maintenance without run-to-failure data

Switchgear in modern data centers is engineered to be highly reliable. That’s great for uptime—but it creates a data science problem: there aren’t enough failures to train a standard supervised failure model. Teams still need a way to identify early degradation, prioritize interventions, and reduce unplanned outages.

Why it’s hard

Rare failures → sparse labels → traditional “predict failure” ML underperforms or overfits.

What operators need

A simple, explainable way to monitor condition, forecast deterioration, and plan maintenance windows.

The solution

Pivot from failure prediction to degradation prediction

Instead of predicting the rare event (failure), the AI Twin predicts a proxy for degradation: a composite Health Index that summarizes stability, anomalies, and efficiency. The system then converts Health Index trajectories into an RUL estimate using transparent rules.

1) Sense

Use switchgear telemetry (current, voltage, power metrics, power factor, uptime).

2) Understand

Engineer features that quantify anomalies by number, magnitude, frequency, and recency.

3) Act

Forecast Health Index states; translate into RUL to prioritize maintenance.

Signals & hypotheses

Domain-first: focus on signals that correlate with health

Key telemetry streams used for condition assessment include:

Current: average and per-phase currents
Power: active (kW), reactive (kVAR), and apparent (kVA)
Power factor: total and per-phase (phase imbalance)
Voltage: line-to-line and phase-to-neutral
Uptime: availability trends over time

Engineering hypotheses guide feature selection—e.g., low/declining power factor and sustained phase imbalance can be early indicators of inefficiency or emerging faults.
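As a rough illustration of the two hypotheses above, the sketch below derives total power factor and a phase-imbalance percentage from one telemetry reading. The field names (`kw`, `kva`, `phase_amps`), the sample values, and the choice of "maximum deviation from the phase average" as the imbalance measure are assumptions made for this example, not the production feature definitions.

```python
from statistics import mean


def power_factor(active_kw, apparent_kva):
    """Total power factor = active power / apparent power (0.0 when unloaded)."""
    if apparent_kva <= 0:
        return 0.0
    return active_kw / apparent_kva


def phase_imbalance_pct(phase_values):
    """Largest deviation from the phase average, as a percentage of that average."""
    avg = mean(phase_values)
    if avg == 0:
        return 0.0
    return 100.0 * max(abs(v - avg) for v in phase_values) / avg


# Hypothetical reading for a single switchgear unit (values are illustrative).
reading = {"kw": 412.0, "kva": 500.0, "phase_amps": [100.0, 104.0, 96.0]}

pf = power_factor(reading["kw"], reading["kva"])
imbalance = phase_imbalance_pct(reading["phase_amps"])
print(f"power factor={pf:.3f}, phase imbalance={imbalance:.1f}%")
```

Tracked over time, a declining `pf` or a persistently high `imbalance` would be the kind of signal the hypotheses point to.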

Health Index

A “virtual sensor” for degradation (1–10)

The Health Index converts many noisy signals into a single interpretable score: 10 = healthy, 1 = critically degraded.

How it’s built

Telemetry is aggregated (daily/weekly/monthly), enriched with rate-of-change and moving averages, and then statistically combined into a score that weights anomalies—especially recent ones.
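The exact scoring formula is not given here, so the following is only a minimal sketch of the idea: enrich an aggregated series with a moving average and rate of change, then turn a list of detected anomalies into a 1–10 score in which recent anomalies count more. The exponential half-life decay, the `half_life_days` and `scale` parameters, and the `(days_ago, magnitude)` anomaly format are all assumptions for illustration.

```python
from statistics import mean


def moving_average(values, window):
    """Trailing moving average; early points use however many values exist."""
    return [mean(values[max(0, i - window + 1): i + 1]) for i in range(len(values))]


def rate_of_change(values):
    """Difference from the previous period (0.0 for the first period)."""
    return [0.0] + [b - a for a, b in zip(values, values[1:])]


def health_index(anomalies, half_life_days=30.0, scale=1.0):
    """Map anomalies to a 1-10 score, where 10 is healthy.

    anomalies: iterable of (days_ago, magnitude) pairs. Each anomaly's
    penalty halves every `half_life_days`, so recent events dominate.
    """
    penalty = sum(mag * 0.5 ** (days_ago / half_life_days)
                  for days_ago, mag in anomalies)
    return max(1, min(10, round(10 - scale * penalty)))


# Hypothetical anomaly histories for three units.
quiet_unit = health_index([])
recent_issues = health_index([(0, 2.0), (30, 2.0)])
old_issues = health_index([(300, 2.0), (330, 2.0)])
print(quiet_unit, recent_issues, old_issues)
```

The same two anomalies cost three points when they are recent and effectively nothing when they are nearly a year old, which is the recency weighting described above.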

Why this works when failures are rare

You can observe “unhealthy behavior” far more often than outright failure. Predicting health states turns the problem into supervised ordinal classification with far more training signal.

What operators do with it

Use thresholds (e.g., predicted to enter Class 3 or below next month) to trigger inspection, targeted maintenance, or deeper diagnostics—before an outage.

Predictive modeling

Ordinal classification + risk-aware scoring

Model framing

Treat Health Index as discrete classes (1–10) to create actionable states.

Algorithms

Decision Trees and XGBoost were the top performers, tuned with grid search.

Evaluation

Ranked-distance metric penalizes “far off” predictions more heavily.

This scoring aligns optimization with operational risk: misclassifying a degraded asset as “healthy” is far worse than being off by a single class.
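The precise definition of the ranked-distance metric is not spelled out here. One plausible form, sketched below as an assumption, penalizes each prediction by the squared distance between predicted and actual class and normalizes the result to the range 0–1. The function name, the `power` parameter, and the normalization are illustrative choices.

```python
def ranked_distance_score(y_true, y_pred, n_classes=10, power=2):
    """Return 1.0 for perfect predictions, 0.0 when every prediction is
    as far from the true class as the scale allows.

    Raising the class distance to `power` makes far-off predictions
    cost disproportionately more than near misses.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    max_penalty = (n_classes - 1) ** power
    total = sum(abs(t - p) ** power for t, p in zip(y_true, y_pred))
    return 1.0 - total / (max_penalty * len(y_true))


# Two units that are truly in Class 3 (degraded).
near_misses = ranked_distance_score([3, 3], [4, 4])   # both off by one class
one_far_miss = ranked_distance_score([3, 3], [3, 5])  # one exact, one off by two
print(near_misses, one_far_miss)
```

Plain accuracy would prefer the second set of predictions (one exact hit), while this score prefers the first because no prediction is far from the truth. A function of this shape could also be wrapped as a custom scorer for grid search, for example with scikit-learn's `make_scorer`.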

RUL estimation

Translate predicted health into Remaining Useful Life

RUL is computed as a function of baseline expected lifetime, current asset age, and a configurable “life loss” derived from health history:

RUL = (Baseline Lifetime) − (Current Age) − (Predicted Life Loss)

The translation is intentionally rule-based for transparency—engineers can trace a low RUL back to the health trajectory and the life-loss rules.
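To make the formula concrete, here is a small worked sketch. The health bands, the life-loss values per band, the 360-month baseline, and the clamp at zero are hypothetical placeholders; in practice each of these would be set by the engineering team.

```python
# Hypothetical rule table: extra months of life lost for each month a unit
# spends in a given health band. Values are illustrative only.
LIFE_LOSS_PER_MONTH = {"healthy": 0, "watch": 1, "degraded": 3}


def health_band(health_index):
    """Map a 1-10 Health Index to a band (the cut-offs are assumptions)."""
    if health_index >= 8:
        return "healthy"
    if health_index >= 5:
        return "watch"
    return "degraded"


def predicted_life_loss(monthly_health):
    """Sum the life-loss rule over a monthly Health Index history."""
    return sum(LIFE_LOSS_PER_MONTH[health_band(h)] for h in monthly_health)


def rul_months(baseline_lifetime, current_age, monthly_health):
    """RUL = baseline lifetime - current age - predicted life loss, floored at 0."""
    loss = predicted_life_loss(monthly_health)
    return max(0, baseline_lifetime - current_age - loss)


# Example: 30-year baseline, 10-year-old unit, six months of declining health.
history = [9, 9, 7, 6, 4, 3]
loss = predicted_life_loss(history)   # 0 + 0 + 1 + 1 + 3 + 3
rul = rul_months(360, 120, history)   # 360 - 120 - 8
print(f"life loss={loss} months, RUL={rul} months")
```

Because every step is a lookup or a subtraction, an engineer can see which months of the health history contributed to a shortened RUL.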

Deployment architecture

Cloud-native pipeline for monitoring + inference

Training & model management

Train and serialize models (e.g., .pkl artifacts)
Store versioned models in a cloud bucket for controlled deployment
Temporal splits reduce time-series leakage and mimic real operations
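A temporal split can be sketched as follows: every training row predates a cutoff, and evaluation uses only later rows, so the model never sees the future. The row layout (`unit_id`, `date`, `health_index`) and the cutoff date are assumptions for this example.

```python
from datetime import date


def temporal_split(rows, cutoff):
    """Train on rows strictly before `cutoff`; test on rows on or after it."""
    train = [r for r in rows if r["date"] < cutoff]
    test = [r for r in rows if r["date"] >= cutoff]
    return train, test


# Hypothetical monthly Health Index records for two units.
rows = [
    {"unit_id": "SWG-001", "date": date(2023, 1, 1), "health_index": 9},
    {"unit_id": "SWG-001", "date": date(2023, 2, 1), "health_index": 8},
    {"unit_id": "SWG-002", "date": date(2023, 2, 1), "health_index": 6},
    {"unit_id": "SWG-001", "date": date(2023, 3, 1), "health_index": 7},
    {"unit_id": "SWG-002", "date": date(2023, 3, 1), "health_index": 5},
]

train, test = temporal_split(rows, cutoff=date(2023, 3, 1))
print(len(train), "training rows;", len(test), "test rows")
```

A random shuffle would instead mix March records into training, letting the model learn from data that would not yet exist at prediction time.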

Inference & visualization

Feature engineering applied consistently at inference time
Predicted Health Index + RUL stored in BigQuery
Dashboards in Looker for daily/weekly/monthly views

Slide gallery

Visual walkthrough (from the deck)

[Slides 1–17: AI Twin for Switchgear in Data Center]

Next steps

Bring AI Twins to critical infrastructure

If you want an AI Twin for switchgear, UPS, generators, chillers, or other critical assets, Dataknobs can help: from data ingestion and health scoring to RUL forecasting and operator dashboards.

Discovery

Identify signals, data quality gaps, and engineering hypotheses.

Prototype

Health Index + baseline model in weeks; validate with SMEs.

Production

MLOps, monitoring, and dashboards for ongoing operations.

Contact: Dataknobs