Why is "Why?" a Hard Question?

This experience translates the core theories of causal inference into interactive visualizations. You'll move beyond observing that two things are related and learn about the rigorous frameworks used to determine if one *causes* the other.


The Fundamental Problem of Causal Inference

We can never observe what would have happened to the same person under a different choice at the same time. The unobserved outcome is the **counterfactual**. Hover over the card to see the problem in action.

🧑‍⚕️

Observed Reality

Alex took the new medication. We observe their outcome: Blood Pressure = 120.

Hover to see the counterfactual

🤔

The Counterfactual

What if Alex had *not* taken the medication? We can never know for sure. This potential outcome is unobservable.

Potential Outcome = ?

Since we can't measure the individual causal effect (the observed outcome minus the counterfactual outcome), we instead estimate the **Average Treatment Effect (ATE)** across a group.
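To make the distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: the population size, the blood-pressure distribution, the true effect of -10, and the coin-flip treatment assignment. Because it is a simulation, both potential outcomes exist for every person, which is exactly what real data never gives us.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical population: each person has two potential outcomes.
# y0 = blood pressure without the medication, y1 = blood pressure with it.
TRUE_EFFECT = -10.0  # assumed for illustration only
people = []
for _ in range(10_000):
    y0 = random.gauss(130, 8)
    y1 = y0 + TRUE_EFFECT + random.gauss(0, 3)
    people.append((y0, y1))

# Only a simulation lets us see both outcomes and compute the true ATE.
true_ate = mean(y1 - y0 for y0, y1 in people)

# In reality each person reveals one outcome; the other is the counterfactual.
treated_flags = [random.random() < 0.5 for _ in people]
observed = [(t, y1 if t else y0) for t, (y0, y1) in zip(treated_flags, people)]

# With coin-flip assignment, the difference in group means estimates the ATE.
treated_mean = mean(y for t, y in observed if t)
control_mean = mean(y for t, y in observed if not t)
estimated_ate = treated_mean - control_mean

print(f"true ATE:      {true_ate:.2f}")
print(f"estimated ATE: {estimated_ate:.2f}")
```

No individual effect is ever computed from the `observed` list, yet the group-level estimate lands close to the true average.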


The Solution: Randomization

**Random assignment** is the gold standard for unbiased inference. It creates two groups that are, on average, comparable on every characteristic, measured or unmeasured, *except* for the treatment. This allows us to use the control group's average outcome as a valid counterfactual for the treatment group's average outcome. Let's see how it works, and contrast it with what goes wrong without it: selection bias.

Experiment Simulation

Our population has two key characteristics: motivation (🧠 High / 😴 Low) and baseline health (💚 Good / 💛 Fair). Our goal is to measure the effect of a 'Wellness Program'.

[Interactive simulation panels: Population Area, Treatment Group, Control Group]
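The simulation can be approximated offline with a short sketch. The outcome model here is hypothetical: a wellness score that rises with motivation and baseline health, a true program effect of 5 points, and enrollment probabilities of 0.8 and 0.2 under self-selection. None of these numbers come from the interactive version.

```python
import random
from statistics import mean

random.seed(1)

PROGRAM_EFFECT = 5.0  # assumed true effect on a hypothetical wellness score

def make_person():
    motivated = random.random() < 0.5    # High vs. Low motivation
    good_health = random.random() < 0.5  # Good vs. Fair baseline health
    # Assumed model: both traits raise the score a person has without the program.
    baseline = 50 + 10 * motivated + 8 * good_health + random.gauss(0, 5)
    return {"motivated": motivated, "good_health": good_health, "baseline": baseline}

population = [make_person() for _ in range(20_000)]

def difference_in_means(assign):
    """Assign treatment via assign(person) -> bool, then compare group means."""
    treated, control = [], []
    for person in population:
        if assign(person):
            treated.append(person["baseline"] + PROGRAM_EFFECT)
        else:
            control.append(person["baseline"])
    return mean(treated) - mean(control)

# Random assignment: a coin flip that ignores every characteristic.
randomized = difference_in_means(lambda p: random.random() < 0.5)

# Self-selection: motivated people are far more likely to enroll.
self_selected = difference_in_means(
    lambda p: random.random() < (0.8 if p["motivated"] else 0.2)
)

print(f"randomized estimate:    {randomized:.2f}")
print(f"self-selected estimate: {self_selected:.2f}")
```

Under these assumptions the randomized estimate sits near the true effect, while the self-selected estimate is inflated because the treated group contains more motivated people, whose scores would have been higher anyway.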

The Rules of the Game: Core Assumptions

For any causal claim to be valid, several assumptions must hold. These are not just statistical checks; they are fundamental statements about how the world works in the context of your study.

The Stable Unit Treatment Value Assumption (SUTVA) is a compound assumption with two key parts:

  1. No Interference: The outcome for one person is not affected by who else gets the treatment.
    Example Violation: In a vaccine trial, if my friend getting vaccinated reduces my chance of getting sick, their treatment has interfered with my outcome.
  2. Consistency: The treatment is the same for everyone who receives it. There are no hidden variations.
    Example Violation: A "tutoring" program where some students get 1 hour of help and others get 10 hours. The treatment isn't consistent.

Ignorability (also called unconfoundedness or conditional exchangeability) assumes that treatment assignment is independent of the potential outcomes once we condition on the observed covariates (X). In an RCT, this is true by design. In an observational study, it's the critical, untestable assumption that you have measured all important confounding variables. It means your treatment and control groups are "exchangeable" once you've accounted for X.
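One way to see what "accounting for X" buys you is stratification: compare treated and control units within each level of X, then average the within-stratum differences. The sketch below assumes a single binary covariate that drives both treatment and outcome, and a true effect of 5. All of these are illustrative choices, and the adjustment only works here because X is, by construction, the only confounder.

```python
import random
from statistics import mean

random.seed(2)
TRUE_EFFECT = 5.0  # assumed for illustration only

rows = []
for _ in range(40_000):
    x = random.random() < 0.5                  # observed covariate X (e.g. high motivation)
    t = random.random() < (0.8 if x else 0.2)  # treatment probability depends only on X
    y = 50 + 10 * x + TRUE_EFFECT * t + random.gauss(0, 5)
    rows.append((x, t, y))

def diff_in_means(subset):
    """Mean outcome of treated rows minus mean outcome of control rows."""
    treated = [y for _, t, y in subset if t]
    control = [y for _, t, y in subset if not t]
    return mean(treated) - mean(control)

# Naive comparison ignores X and mixes the effect of X with the treatment effect.
naive = diff_in_means(rows)

# Stratified comparison: estimate within each level of X,
# then weight each stratum by its share of the population.
adjusted = 0.0
for level in (False, True):
    stratum = [r for r in rows if r[0] == level]
    adjusted += diff_in_means(stratum) * len(stratum) / len(rows)

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

If an unmeasured variable also influenced both treatment and outcome, the adjusted estimate would remain biased, and nothing in the data would reveal it.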

Positivity (also called overlap) requires that, for any given set of characteristics, there is a nonzero probability of being in either the treatment or the control group.
Example Violation: If the wellness program is only offered to employees over 40, you can't estimate its effect for employees under 40, because there are no treated people in that age group to compare with the untreated ones.
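Unlike ignorability, this assumption can be checked against the data, at least for the covariates you have measured. A minimal sketch, using made-up records and a hypothetical helper named `positivity_violations`, counts treated and untreated people in each stratum and flags any stratum where one arm is empty:

```python
from collections import Counter

# Hypothetical records: (age_group, enrolled_in_program). The data are made up.
records = (
    [("over_40", True)] * 30
    + [("over_40", False)] * 45
    + [("under_40", False)] * 60  # the program was never offered to this group
)

def positivity_violations(rows):
    """Return the strata in which the treated or the control arm is empty."""
    counts = Counter(rows)  # keys are (stratum, treated) pairs; missing keys count as 0
    strata = sorted({stratum for stratum, _ in rows})
    return [s for s in strata if counts[(s, True)] == 0 or counts[(s, False)] == 0]

print(positivity_violations(records))  # ['under_40']
```

With continuous or high-dimensional covariates, exact strata become too sparse for this kind of count, so in practice overlap is usually inspected on a coarsened or summarized version of X.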