The Quest for "Why?"

A visual guide to the foundations of causal inference: moving from observing a pattern to establishing a cause.

1. The Fundamental Problem

For any person, we can observe only one reality: the outcome under the treatment they actually received. The alternative, the **counterfactual**, is never observed.

🧑‍🔬

Person A

Gets Treatment 💊

Outcome is Observed

No Treatment 🚫

Outcome is Unobserved
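
The diagram above can be sketched in a few lines of Python. This is only an illustration: the `Unit` class and the outcome values 7.0 and 5.0 are made up for the example, and in real data the counterfactual value would simply be missing.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    """One person with both potential outcomes (hypothetical values)."""
    name: str
    y1: float      # outcome if treated
    y0: float      # outcome if not treated
    treated: bool  # which reality actually happened

    def observed(self) -> float:
        # The only outcome we ever get to see.
        return self.y1 if self.treated else self.y0

    def counterfactual(self) -> float:
        # Known here only because this is a simulation.
        return self.y0 if self.treated else self.y1


person_a = Unit("Person A", y1=7.0, y0=5.0, treated=True)

# The individual effect needs BOTH potential outcomes,
# which is exactly what real data never provides.
individual_effect = person_a.y1 - person_a.y0

print(person_a.observed())        # 7.0
print(person_a.counterfactual())  # 5.0
print(individual_effect)          # 2.0
```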

2. The Goal: ATE

Since we can't see individual effects, we estimate the **Average Treatment Effect (ATE)** across a population.

ATE = E[Y(1)] - E[Y(0)]

Average Outcome if Everyone Were Treated - Average Outcome if No One Were Treated
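
As a small worked example of the formula, the sketch below computes the ATE from a toy table in which both potential outcomes are known for every unit. The four units and their values are hypothetical; a table like this exists only in a simulation.

```python
from statistics import fmean

# Hypothetical potential outcomes for four units: (Y(1), Y(0)).
potential_outcomes = [(7.0, 5.0), (6.0, 6.0), (9.0, 6.0), (4.0, 3.0)]

mean_y1 = fmean([y1 for y1, _ in potential_outcomes])  # E[Y(1)]
mean_y0 = fmean([y0 for _, y0 in potential_outcomes])  # E[Y(0)]
ate = mean_y1 - mean_y0

# Equivalent view: the ATE is the mean of the individual effects.
individual_effects = [y1 - y0 for y1, y0 in potential_outcomes]

print(ate)                        # 1.5
print(fmean(individual_effects))  # 1.5
```

Individual effects vary here (2, 0, 3, and 1), yet the average is still well defined, which is why the ATE is a workable target even though no single effect can be observed.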

3. The Gold Standard: Randomization

Random Sampling

🌍

Select a representative group from the population.
(External Validity)

Random Assignment

⚖️

Split the sample into two identical-on-average groups.
(Internal Validity)

This ensures that, on average, the only systematic difference between the groups is the treatment itself.
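
A minimal simulation of random assignment, under made-up assumptions: a constant treatment effect of 2.0, a binary "motivated" trait that raises the outcome by 3.0, and standard normal noise. None of these numbers come from real data. The point is that a coin flip balances the trait across groups, so the simple difference in means lands close to the true effect.

```python
import random
from statistics import fmean

rng = random.Random(0)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0       # hypothetical constant treatment effect

treated_y, control_y = [], []
treated_trait, control_trait = [], []

for _ in range(20_000):
    motivated = 1.0 if rng.random() < 0.5 else 0.0  # pre-existing trait
    y0 = 3.0 * motivated + rng.gauss(0, 1)          # outcome without treatment
    y1 = y0 + TRUE_EFFECT                           # outcome with treatment
    if rng.random() < 0.5:   # coin-flip assignment ignores the trait
        treated_y.append(y1)
        treated_trait.append(motivated)
    else:
        control_y.append(y0)
        control_trait.append(motivated)

# Share of motivated units should be nearly equal in both groups.
balance_gap = fmean(treated_trait) - fmean(control_trait)

# Difference in observed group means as an estimate of the ATE.
estimated_ate = fmean(treated_y) - fmean(control_y)

print(round(balance_gap, 3), round(estimated_ate, 3))
```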

4. The Threat: Selection Bias

Without randomization, groups often form based on pre-existing traits that also affect the outcome, which biases the comparison.

Population

🔵🔵⚪️⚪️🔵⚪️⚪️🔵

🔵 = Motivated, ⚪️ = Not Motivated

Treatment Group

🔵🔵🔵🔵

Motivated people self-select into treatment.

Control Group

⚪️⚪️⚪️⚪️

Groups are not comparable.
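
The same idea can be sketched as a simulation. The assumptions are hypothetical: motivated people choose treatment with probability 0.9 and others with probability 0.1, motivation raises the outcome by 3.0, and the true effect is 2.0. The naive comparison then mixes the treatment effect with the effect of motivation.

```python
import random
from statistics import fmean

rng = random.Random(1)  # fixed seed so the sketch is reproducible
TRUE_EFFECT = 2.0       # hypothetical constant treatment effect

rows = []  # each row: (motivated, treated, observed outcome)
for _ in range(20_000):
    motivated = rng.random() < 0.5
    y0 = (3.0 if motivated else 0.0) + rng.gauss(0, 1)
    p_treat = 0.9 if motivated else 0.1   # self-selection on the trait
    treated = rng.random() < p_treat
    rows.append((motivated, treated, y0 + TRUE_EFFECT if treated else y0))


def diff_in_means(data):
    """Mean observed outcome of treated rows minus that of control rows."""
    treated = [y for _, t, y in data if t]
    control = [y for _, t, y in data if not t]
    return fmean(treated) - fmean(control)


naive = diff_in_means(rows)  # biased: groups differ in motivation

# Comparing like with like removes the bias in this toy setup,
# because motivation is the only confounder and it is observed.
within_motivated = diff_in_means([r for r in rows if r[0]])
within_unmotivated = diff_in_means([r for r in rows if not r[0]])

print(round(naive, 2), round(within_motivated, 2), round(within_unmotivated, 2))
```

In this setup the naive estimate should come out well above 2.0, while each within-group comparison should sit near it. With real data the relevant traits are rarely all observed, so this kind of adjustment is only as good as the assumption behind it.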

5. The Rules: Core Assumptions for Valid Inference

SUTVA

No interference between units (one person's treatment does not change another person's outcome), and there is only one version of the treatment.

Ignorability

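Treatment assignment is independent of potential outcomes (true by design in a randomized controlled trial; in observational data it must be assumed to hold after conditioning on observed covariates).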

Positivity

For every subgroup defined by the covariates, the probability of receiving the treatment is strictly between 0 and 1.

Excludability

(For instrumental variables) The instrument affects the outcome ONLY through its effect on the treatment.
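
Of these assumptions, positivity is the one that can be partly checked against data. The sketch below is a rough illustration only: the records and the helper `treatment_share_by_stratum` are hypothetical. It computes the share of treated units in each subgroup and flags subgroups where that share is exactly 0 or 1.

```python
from collections import defaultdict

# Hypothetical records: (stratum, treated).
records = [
    ("motivated", True), ("motivated", True),
    ("motivated", True), ("motivated", False),
    ("not motivated", False), ("not motivated", False),
    ("not motivated", False),
]


def treatment_share_by_stratum(rows):
    """Return {stratum: fraction of units in that stratum that were treated}."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [treated count, total]
    for stratum, treated in rows:
        counts[stratum][0] += int(treated)
        counts[stratum][1] += 1
    return {s: t / n for s, (t, n) in counts.items()}


shares = treatment_share_by_stratum(records)

# A share of 0 or 1 means the sample has no treated/control comparison
# inside that stratum, so no effect can be estimated there.
violations = [s for s, p in shares.items() if p in (0.0, 1.0)]

print(shares)      # {'motivated': 0.75, 'not motivated': 0.0}
print(violations)  # ['not motivated']
```

An empirical share of 0 or 1 in a small sample does not prove that the true probability is 0 or 1, but it does mean the data at hand cannot support a comparison within that subgroup.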