The Core Problem
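The most common trap in drawing causal conclusions from observational data is **confounding**. A hidden "third variable" influences two other variables and makes them move together, which creates a spurious correlation even when neither variable causes the other. For example, hot weather increases both ice cream sales and drownings.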
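The simulation below is a minimal sketch of that trap. It assumes a made-up data-generating process in which a hidden variable `z` drives both `x` and `y`, while `x` has no effect on `y` at all. All names and coefficients are illustrative. The raw correlation between `x` and `y` is strong, but the coefficient on `x` drops to roughly zero once `z` is included in an ordinary least-squares regression.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: a hidden variable z drives both x and y,
# while x has NO causal effect on y.
z = rng.normal(size=n)
x = 2.0 * z + rng.normal(size=n)
y = 3.0 * z + rng.normal(size=n)

# The raw correlation between x and y is strong but entirely spurious.
raw_corr = np.corrcoef(x, y)[0, 1]

# Regress y on an intercept, x and z ("controlling for" z).
design = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
x_coef_adjusted = coef[1]  # close to 0 once the confounder is included

print(f"raw corr(x, y): {raw_corr:.2f}")
print(f"coefficient on x, adjusting for z: {x_coef_adjusted:.3f}")
```

This adjustment only works because `z` was observed. When the confounder is truly hidden, it cannot be added to the regression, and that is the situation the designs below are meant to handle.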
The Gold Standard: RCTs
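Randomization is the most powerful tool. Because chance alone decides who is treated, it creates a treatment group and a control group that are, on average, alike in every respect, including characteristics nobody measured. This breaks the link between treatment and any potential confounder, so any systematic difference in outcomes between the groups can be attributed to the treatment, up to chance variation.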
[Figure: a population randomly split into a Treatment Group and a Control Group]
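The sketch below simulates why randomization works. It assumes a hypothetical confounder `z` (think of baseline health) that raises the outcome, and a true treatment effect of 2. When units self-select into treatment based on `z`, the naive difference in means is badly biased. When a coin flip assigns treatment, the same difference in means lands close to the true effect. The data-generating process and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
true_effect = 2.0

# Hypothetical confounder that also raises the outcome.
z = rng.normal(size=n)

def outcome(treated: np.ndarray) -> np.ndarray:
    """Outcome = true treatment effect + confounder effect + noise."""
    return true_effect * treated + 3.0 * z + rng.normal(size=n)

# Observational setting: units with high z are more likely to take the treatment.
self_selected = (z + rng.normal(size=n) > 0).astype(float)
y_obs = outcome(self_selected)
naive_estimate = y_obs[self_selected == 1].mean() - y_obs[self_selected == 0].mean()

# RCT setting: a fair coin flip assigns treatment, independently of z.
randomized = rng.integers(0, 2, size=n).astype(float)
y_rct = outcome(randomized)
rct_estimate = y_rct[randomized == 1].mean() - y_rct[randomized == 0].mean()

print(f"self-selected difference in means: {naive_estimate:.2f}")
print(f"randomized difference in means:    {rct_estimate:.2f}")
```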
Difference-in-Differences
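Compares the change in outcomes over time in a treated group with the change over the same period in an untreated group. It relies on the "parallel trends" assumption: without the treatment, both groups' outcomes would have changed by the same amount.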
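A minimal 2x2 sketch of the estimator follows. It assumes hypothetical data in which the treated group starts 4 units higher, both groups share a common time trend of +2 (so parallel trends hold by construction), and the true effect is 1.5. The difference-in-differences removes both the level gap and the shared trend. A simple before/after comparison, or a treated-versus-control comparison after treatment, absorbs one of them instead.

```python
import numpy as np

rng = np.random.default_rng(2)
n_per_cell = 50_000
true_effect = 1.5

def cell_mean(treated_group: bool, post: bool) -> float:
    """Average outcome in one group/period cell of a hypothetical 2x2 design.

    Assumed data-generating process:
      * the treated group starts 4 units higher than the control group,
      * both groups share the same time trend of +2 (parallel trends hold),
      * the treatment adds `true_effect` for the treated group after treatment.
    """
    level = 4.0 if treated_group else 0.0
    trend = 2.0 if post else 0.0
    effect = true_effect if (treated_group and post) else 0.0
    return float(np.mean(level + trend + effect + rng.normal(size=n_per_cell)))

treated_pre, treated_post = cell_mean(True, False), cell_mean(True, True)
control_pre, control_post = cell_mean(False, False), cell_mean(False, True)

# Difference-in-differences: change in the treated group minus change in the control group.
did_estimate = (treated_post - treated_pre) - (control_post - control_pre)

# Two naive alternatives for comparison:
before_after = treated_post - treated_pre    # absorbs the common trend (about 3.5)
post_only_gap = treated_post - control_post  # absorbs the level gap (about 5.5)

print(f"DiD: {did_estimate:.2f}, before/after: {before_after:.2f}, post-only gap: {post_only_gap:.2f}")
```

If the groups' trends would have differed even without treatment, that difference ends up inside `did_estimate`, which is why the parallel-trends assumption has to be argued rather than assumed.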
Regression Discontinuity
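Used when treatment is assigned by a sharp cutoff on a continuous score (for example, a scholarship given to everyone who scores 80 or above). It compares units just above and just below the cutoff, assuming they are otherwise comparable and cannot precisely control which side they land on, so any jump in outcomes at the cutoff is attributed to the treatment.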
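The following is a minimal local-linear sketch. It assumes a hypothetical score centred on a cutoff of 0, a smooth (slightly curved) relationship between score and outcome, and a true jump of 2 at the cutoff. A straight line is fitted on each side using only observations within a chosen `bandwidth`, and the estimate is the gap between the two fitted values at the cutoff. The bandwidth of 0.2 is an arbitrary illustrative choice, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
cutoff = 0.0
true_jump = 2.0

# Hypothetical running variable (e.g. a test score, centred on the cutoff).
score = rng.uniform(-1.0, 1.0, size=n)
treated = score >= cutoff
# Smooth relationship with the score plus a jump of `true_jump` at the cutoff.
outcome = (1.0 + 0.8 * score + 0.5 * score**2
           + true_jump * treated + rng.normal(scale=0.5, size=n))

def intercept_at_cutoff(x: np.ndarray, y: np.ndarray) -> float:
    """Fit y = a + b * (x - cutoff) by least squares and return a,
    i.e. the fitted value exactly at the cutoff."""
    design = np.column_stack([np.ones_like(x), x - cutoff])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[0])

def rd_estimate(bandwidth: float) -> float:
    """Local linear RD: fit a line on each side of the cutoff using only
    observations within `bandwidth`, then take the gap between the fits."""
    left = (score < cutoff) & (score >= cutoff - bandwidth)
    right = (score >= cutoff) & (score <= cutoff + bandwidth)
    return (intercept_at_cutoff(score[right], outcome[right])
            - intercept_at_cutoff(score[left], outcome[left]))

estimate = rd_estimate(bandwidth=0.2)
# Comparing everyone above vs. everyone below mixes in the score's own effect.
naive_gap = outcome[treated].mean() - outcome[~treated].mean()

print(f"RD estimate: {estimate:.2f}, naive above-vs-below gap: {naive_gap:.2f}")
```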
More Tools
Instrumental Variables (IV)
Uses a third variable (the instrument) that shifts who receives the treatment, affects the outcome only through the treatment, and is unrelated to unmeasured confounders. This isolates a sliver of "as-if random" variation in treatment.
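A minimal sketch with one binary instrument follows. It assumes a hypothetical unobserved confounder `u` that pushes up both treatment `d` and outcome `y`, and an instrument `z` (for example, a randomly sent reminder) that changes `d` but has no direct path to `y`. The simple regression slope is biased by `u`. The Wald/IV ratio `cov(z, y) / cov(z, d)`, which is what two-stage least squares reduces to with a single instrument and no covariates, recovers the assumed true effect of 1.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200_000
true_effect = 1.0

u = rng.normal(size=n)                        # unobserved confounder
z = rng.integers(0, 2, size=n).astype(float)  # hypothetical instrument (e.g. a random reminder)
# Treatment depends on both the instrument and the confounder.
d = 0.8 * z + 1.0 * u + rng.normal(size=n)
# Outcome depends on treatment and the confounder, but NOT directly on z
# (the exclusion restriction).
y = true_effect * d + 2.0 * u + rng.normal(size=n)

# Naive OLS slope of y on d: biased upward because u drives both.
ols_slope = np.cov(d, y)[0, 1] / np.var(d, ddof=1)

# IV (Wald) estimator: only the part of d that moves with z is used.
iv_estimate = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

print(f"OLS slope: {ols_slope:.2f}, IV estimate: {iv_estimate:.2f}")
```

If `z` also affected `y` directly, or were correlated with `u`, the ratio would be biased too. Those conditions cannot be verified from these data alone.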
Propensity Score Matching (PSM)
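Builds a comparable control group by matching each treated individual to untreated individuals who had a similar estimated probability of being treated (the propensity score), given their observed characteristics. It can only adjust for confounders that were actually measured.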
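One minimal way to sketch this is shown below. It assumes two hypothetical observed covariates that drive both treatment uptake and the outcome, a true effect of 1, and no unmeasured confounders. The propensity score is estimated with a hand-rolled logistic regression (Newton's method). Each treated unit is then paired with the single control whose score is closest (1-to-1, with replacement), and the estimate is the mean outcome difference over the pairs, which targets the effect on the treated. Real analyses usually add calipers and balance checks, which are omitted here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000
true_effect = 1.0

# Hypothetical observed covariates (e.g. standardized age and income).
X = rng.normal(size=(n, 2))
# Units with higher covariate values are more likely to be treated.
logit = 0.8 * X[:, 0] + 0.5 * X[:, 1]
treated = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))
# The outcome depends on the covariates and on treatment.
y = true_effect * treated + 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=n)

def fit_propensity(covariates: np.ndarray, t: np.ndarray, iters: int = 25) -> np.ndarray:
    """Logistic regression of treatment on covariates via Newton's method;
    returns each unit's estimated propensity score P(T=1 | X)."""
    A = np.column_stack([np.ones(len(covariates)), covariates])
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        grad = A.T @ (t - p)
        hess = A.T @ (A * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return 1.0 / (1.0 + np.exp(-A @ beta))

ps = fit_propensity(X, treated.astype(float))

# Sort controls by propensity score so nearest neighbours can be found with searchsorted.
control_idx = np.flatnonzero(~treated)
order = np.argsort(ps[control_idx])
sorted_ps = ps[control_idx][order]
sorted_idx = control_idx[order]

# For each treated unit, compare the closest control on either side and keep the nearer one.
treated_idx = np.flatnonzero(treated)
pos = np.searchsorted(sorted_ps, ps[treated_idx])
left = np.clip(pos - 1, 0, len(sorted_ps) - 1)
right = np.clip(pos, 0, len(sorted_ps) - 1)
use_right = np.abs(sorted_ps[right] - ps[treated_idx]) < np.abs(sorted_ps[left] - ps[treated_idx])
matches = np.where(use_right, sorted_idx[right], sorted_idx[left])

naive_gap = y[treated].mean() - y[~treated].mean()
att_estimate = np.mean(y[treated_idx] - y[matches])

print(f"naive gap: {naive_gap:.2f}, matched estimate: {att_estimate:.2f}")
```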
The Unspoken Rules
All causal claims from non-experimental data rely on strong assumptions, several of which cannot be tested with the data alone. These must be justified with domain knowledge.
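SUTVA (Stable Unit Treatment Value Assumption)
No interference between units (one unit's treatment does not affect another unit's outcome) and no hidden versions of the treatment.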
Unconfoundedness
All variables that affect both treatment and outcome have been measured and controlled for.
Positivity
For every combination of observed characteristics, there is a nonzero chance of being in the treatment group and a nonzero chance of being in the control group.
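In the standard potential-outcomes notation, these three assumptions are often written as shown below. This notation is introduced here for illustration and is not defined elsewhere in this section. $Y_i(1)$ and $Y_i(0)$ are unit $i$'s outcomes with and without treatment, $T_i \in \{0, 1\}$ is its treatment, and $X_i$ its observed characteristics.

```latex
\begin{aligned}
\text{SUTVA:}\quad & Y_i = Y_i(T_i), \text{ where } Y_i(1), Y_i(0) \text{ do not depend on other units' treatments}\\
\text{Unconfoundedness:}\quad & \bigl(Y_i(0),\, Y_i(1)\bigr) \perp\!\!\!\perp T_i \mid X_i\\
\text{Positivity:}\quad & 0 < \Pr(T_i = 1 \mid X_i = x) < 1 \ \text{ for every } x
\end{aligned}
```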