Why is "Why?" a Hard Question?

This interactive guide visualizes causal inference, moving past mere correlation to explore frameworks for identifying *causal* relationships.


The Fundamental Problem of Causal Inference

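For any individual we can observe only one outcome: the one that follows the choice actually made. The outcome that would have followed the other choice is the **counterfactual**, and it can never be observed alongside the real one. Hover over the card to see the problem.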

🧑‍⚕️

Observed Reality

Alex took the new medication. We observe their outcome: Blood Pressure = 120.

Hover to see the counterfactual

🤔

The Counterfactual

What would Alex's blood pressure have been without the medication? That outcome can never be observed.

Potential Outcome = ?

Because we cannot observe the causal effect for any single individual, we instead estimate the **average treatment effect (ATE)** across a group.
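In potential-outcomes notation, the problem and the workaround can be written compactly. The symbols here are a conventional choice rather than something this guide defines: $Y_i(1)$ and $Y_i(0)$ are person $i$'s outcomes with and without treatment, and $T_i \in \{0, 1\}$ records which one actually happened.

```latex
\begin{align*}
\tau_i &= Y_i(1) - Y_i(0) && \text{(individual effect: never observed)} \\
Y_i &= T_i\,Y_i(1) + (1 - T_i)\,Y_i(0) && \text{(the one outcome we do observe)} \\
\text{ATE} &= \mathbb{E}\big[\,Y(1) - Y(0)\,\big] = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]
\end{align*}
```

For Alex, $T = 1$ and $Y(1) = 120$ is observed, while $Y(0)$ is the "?" on the card. The ATE sidesteps this by comparing averages, each of which can be estimated from people who actually received that condition.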


The Solution: Randomization

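**Random assignment** is the gold standard for unbiased causal estimates. Because chance alone decides who is treated, the treatment and control groups are comparable on average and differ systematically only in the intervention. The control group's average outcome then serves as a valid stand-in for what the treated group would have experienced without treatment. The simulation below explores this, along with the problem that randomization solves: selection bias.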

Experiment Simulation

Each person in the population has two traits: motivation (High 🧠 / Low 😴) and health status (Good 💚 / Fair 💛). We want to measure the effect of a 'Wellness Program'.

Treatment Group

Control Group

Population Area
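The contrast between self-selection and random assignment can also be sketched in a few lines of Python. This is a minimal illustration, not the simulation behind the widget: the true effect of 5 points, the 10-point head start for high-drive (motivated) people, and the sign-up probabilities are all assumed values chosen for the example.

```python
import random
from statistics import mean

TRUE_EFFECT = 5.0  # assumed effect of the program on a health score


def simulate(assign, n=20_000, seed=1):
    """Return the naive difference in mean outcomes under an assignment rule."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        motivated = rng.random() < 0.5  # High vs. Low drive
        # Motivated people are healthier to begin with (assumed +10 points).
        baseline = 60.0 + (10.0 if motivated else 0.0) + rng.gauss(0, 5)
        if assign(motivated, rng):
            treated.append(baseline + TRUE_EFFECT)
        else:
            control.append(baseline)
    return mean(treated) - mean(control)


def self_selection(motivated, rng):
    # Motivated people sign up far more often: this creates selection bias.
    return rng.random() < (0.8 if motivated else 0.2)


def coin_flip(motivated, rng):
    # Randomization ignores motivation entirely.
    return rng.random() < 0.5


biased = simulate(self_selection)
randomized = simulate(coin_flip)
print(f"self-selected groups: estimated effect = {biased:.2f}")
print(f"randomized groups:    estimated effect = {randomized:.2f}")
```

Under these assumptions the self-selected comparison should land well above the true effect (around 11 rather than 5), because the treated group is mostly motivated people who were healthier to begin with. The coin-flip comparison should land close to 5.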

The Rules of the Game: Core Assumptions

Valid causal claims rest on several key assumptions. These are not merely statistical conditions; they are substantive claims about how the system you are studying works.

SUTVA, the Stable Unit Treatment Value Assumption, is a compound assumption with two key parts:

  1. No Interference: One individual's outcome is unaffected by the treatment that others receive.
    Example Violation: If vaccinating my friend lowers my own risk of illness, their treatment has affected my outcome.
  2. Consistency: The treatment is the same for everyone who receives it; there are no hidden versions of the treatment.
    Example Violation: A tutoring program offers inconsistent support, with some students receiving 1 hour of assistance and others 10 hours.

Ignorability is the assumption of no unobserved confounding: treatment assignment is independent of the potential outcomes, conditional on the observed covariates (X). In a randomized controlled trial this holds by design. In an observational study it requires measuring and adjusting for every relevant confounder, so that the treatment and control groups are 'exchangeable' within each level of X.
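One simple way to "adjust for X" is stratification: compare treated and control units within each level of X, then average those differences weighted by how common each level is. The sketch below uses simulated data with a single binary confounder. The data-generating numbers and function names are illustrative assumptions, and the method only works here because X is the sole confounder and is fully observed.

```python
import random
from collections import defaultdict
from statistics import mean

TRUE_EFFECT = 5.0  # assumed treatment effect in the simulated data


def make_observational_data(n=40_000, seed=2):
    """Simulate (x, t, y) rows where X affects both treatment and outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5                  # observed confounder X
        t = rng.random() < (0.8 if x else 0.2)  # X drives treatment uptake
        y = 60.0 + 10.0 * x + TRUE_EFFECT * t + rng.gauss(0, 5)
        rows.append((x, t, y))
    return rows


def naive_estimate(rows):
    """Difference in means, ignoring X."""
    treated = [y for x, t, y in rows if t]
    control = [y for x, t, y in rows if not t]
    return mean(treated) - mean(control)


def stratified_estimate(rows):
    """Difference in means within each level of X, weighted by P(X)."""
    strata = defaultdict(lambda: {True: [], False: []})
    for x, t, y in rows:
        strata[x][t].append(y)
    total = len(rows)
    estimate = 0.0
    for groups in strata.values():
        weight = (len(groups[True]) + len(groups[False])) / total
        estimate += weight * (mean(groups[True]) - mean(groups[False]))
    return estimate


rows = make_observational_data()
naive = naive_estimate(rows)
adjusted = stratified_estimate(rows)
print(f"naive: {naive:.2f}, adjusted for X: {adjusted:.2f}")
```

The naive comparison mixes the treatment effect with the effect of X, while the stratified estimate should sit close to the assumed true effect of 5. If a confounder were left out of the data, no amount of stratifying on the recorded covariates would remove its bias.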

Positivity (also called overlap): Every individual, whatever their characteristics, has a nonzero chance of ending up in either the treatment group or the control group.
Example Violation: If the program is open only to employees aged 40 and over, its effect on younger employees (<40) cannot be estimated, because no one under 40 ever receives the treatment and so there is no treated group to compare against.
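Positivity can be checked directly in the data: for each covariate profile, confirm that both treated and untreated units appear. The helper below is a hypothetical sketch, and the tiny employee table is made up to mirror the age-restriction example.

```python
from collections import defaultdict


def positivity_violations(rows):
    """Return covariate profiles that appear in only one arm.

    `rows` is an iterable of (profile, treated) pairs, where `profile` is any
    hashable description of a unit's covariates and `treated` is a bool.
    """
    counts = defaultdict(lambda: [0, 0])  # profile -> [n_control, n_treated]
    for profile, treated in rows:
        counts[profile][int(treated)] += 1
    return sorted(
        profile
        for profile, (n_control, n_treated) in counts.items()
        if n_control == 0 or n_treated == 0
    )


# Hypothetical employees: the program only admits people aged 40 or older.
employees = [
    ("under_40", False), ("under_40", False), ("under_40", False),
    ("40_plus", True), ("40_plus", False), ("40_plus", True),
]
violations = positivity_violations(employees)
print(violations)  # ['under_40']
```

With many or continuous covariates, exact profiles become too sparse for this kind of count. A common alternative is to inspect estimated treatment probabilities for values near 0 or 1.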