I. The Foundations
Causal inference is the science of moving beyond "what is related?" to "what causes what?". This section explores the core concepts that separate correlation from causation and the fundamental challenges we must overcome to make causal claims.
Correlation is Not Causation
The most common trap in data analysis is mistaking a spurious correlation for a causal link. Often, a hidden "third variable," or **confounder**, is the true cause of both of the variables we are observing.
For example, data consistently shows a strong correlation between ice cream sales and crime rates. Does ice cream cause crime? Click below to reveal the confounder.
Of course not. Hot weather (the confounder) causes both an increase in ice cream sales and more people to be outside, leading to more crime.
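This pattern is easy to reproduce in a few lines. The sketch below (a toy NumPy simulation with invented coefficients, not data from the page) generates both variables from a shared "temperature" confounder and shows that the strong raw correlation disappears once temperature is held fixed:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Confounder: daily temperature
temperature = rng.normal(70, 15, n)

# Both outcomes depend on temperature, but NOT on each other
ice_cream_sales = 2.0 * temperature + rng.normal(0, 10, n)
crime_rate = 0.5 * temperature + rng.normal(0, 5, n)

# Raw correlation looks strong...
raw_corr = np.corrcoef(ice_cream_sales, crime_rate)[0, 1]

# ...but vanishes once we partial out temperature
# (correlate the residuals after regressing each variable on it)
resid_sales = ice_cream_sales - np.polyval(
    np.polyfit(temperature, ice_cream_sales, 1), temperature)
resid_crime = crime_rate - np.polyval(
    np.polyfit(temperature, crime_rate, 1), temperature)
partial_corr = np.corrcoef(resid_sales, resid_crime)[0, 1]
```

The raw correlation comes out near 0.8 while the partial correlation is near zero, even though crime never enters the equation for ice cream sales or vice versa.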
The Fundamental Problem of Causal Inference
The core challenge is that we can never observe what would have happened to the same person under both treatment and control at the same time. The alternative outcome, the **counterfactual**, is forever hidden. Hover over the individual below to see their path.
Path 1: Receives Treatment
Outcome A
Path 2: No Treatment
Outcome B
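A simulation is the one place where both paths are visible at once, which makes the fundamental problem easy to demonstrate. In this sketch (hypothetical numbers, assuming a true effect of about 5) we write down both potential outcomes for every individual, then hide one of them the way reality does:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10000

# In a simulation (unlike reality) we can write down BOTH potential outcomes
y0 = rng.normal(50, 10, n)          # Path 2: outcome if untreated
y1 = y0 + 5 + rng.normal(0, 2, n)   # Path 1: outcome if treated

true_ate = np.mean(y1 - y0)         # knowable only because this is simulated

# Reality shows exactly one path per person; the other is the counterfactual
treated = rng.integers(0, 2, n).astype(bool)
y_observed = np.where(treated, y1, y0)

# Individual effects (y1 - y0) are forever hidden, but random assignment
# lets a simple difference in group means recover the AVERAGE effect
estimated_ate = y_observed[treated].mean() - y_observed[~treated].mean()
```

No individual's causal effect is ever observed, yet the group-level comparison lands close to the true average effect of 5.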
II. Causal Frameworks
To reason about causality, we need a formal language. Structural Causal Models (SCMs) use graphs to map our assumptions about the world. Build your own Directed Acyclic Graph (DAG) to understand how they work.
Interactive DAG Builder
Click on the canvas to add variables (nodes). Click and drag from one node to another to create a causal path (edge). The tool will identify key paths and suggest which variables to control for to estimate the causal effect of 'T' (Treatment) on 'Y' (Outcome), if possible.
Instructions
- **Add Node:** Click empty space
- **Add Edge:** Drag from node to node
- **Name Node:** Double-click a node
- **Reset:** Use button below
Analysis Results
Add nodes named 'T' and 'Y' to begin.
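The logic the DAG builder applies can be sketched numerically. In the three-node graph X → T, X → Y, T → Y, the path T ← X → Y is an open backdoor path, so X must be controlled for. This toy simulation (invented coefficients; plain least squares standing in for the tool's analysis) shows the bias from ignoring X and its removal after adjustment:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000

# DAG: X -> T, X -> Y, T -> Y  (X confounds T and Y via the backdoor path)
x = rng.normal(0, 1, n)
t = 1.5 * x + rng.normal(0, 1, n)
y = 2.0 * t + 3.0 * x + rng.normal(0, 1, n)   # true effect of T on Y is 2.0

# Naive regression of Y on T alone is biased: the backdoor path is open
naive_slope = np.polyfit(t, y, 1)[0]

# Controlling for X (multiple regression on [T, X, 1]) closes the path
design = np.column_stack([t, x, np.ones(n)])
coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
adjusted_slope = coefs[0]
```

The naive slope overshoots the true effect of 2.0 (it absorbs X's influence), while the adjusted slope recovers it.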
III. Experimental Methods
Randomized Controlled Trials (RCTs) are the gold standard because randomization creates comparable groups. Explore how these designs work to isolate causal effects.
Randomization Simulator
Below is a population of 20 individuals with a baseline covariate (e.g., age). Click "Randomize" to assign them to Treatment or Control and see how well the groups are balanced on the covariate. While any single randomization might be imbalanced by chance, randomization works on average.
Treatment Group (0)
Avg. Age: N/A
Control Group (0)
Avg. Age: N/A
Standardized Difference: N/A
(A value < 0.1 is considered well-balanced)
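The simulator's two claims, that any single draw can be imbalanced but randomization balances on average, can be checked directly. This sketch (20 invented ages; the standardized difference computed with a pooled standard deviation, one common convention) performs one randomization and then averages over many:

```python
import numpy as np

rng = np.random.default_rng(7)

# 20 individuals with a baseline covariate (age)
ages = rng.integers(20, 70, 20)

# One randomization: shuffle and split 10 / 10
order = rng.permutation(20)
treat, control = ages[order[:10]], ages[order[10:]]

# Standardized difference in means; < 0.1 is considered well balanced
pooled_sd = np.sqrt((treat.var(ddof=1) + control.var(ddof=1)) / 2)
std_diff = abs(treat.mean() - control.mean()) / pooled_sd

# A single draw may miss that threshold by chance, but the mean
# imbalance across many randomizations is essentially zero
diffs = []
for _ in range(2000):
    order = rng.permutation(20)
    diffs.append(ages[order[:10]].mean() - ages[order[10:]].mean())
avg_diff = np.mean(diffs)
```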
IV. Quasi-Experimental Methods
When RCTs aren't possible, we use quasi-experimental methods to analyze observational data. Each method relies on a clever design and a key assumption to mimic an experiment.
Regression Discontinuity (RDD) Simulator
RDD is used when a treatment is assigned by a cutoff score. We assume people just above and just below the cutoff are comparable, so any jump in the outcome at the cutoff can be attributed to the treatment. Adjust the bandwidth to see how using more or less data around the cutoff changes the estimated effect.
Estimated Treatment Effect
0
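The bandwidth mechanic from the simulator can be reproduced with a local linear fit on each side of the cutoff. This sketch assumes a cutoff at 0 and a true jump of 2.0 (both invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 4000

# Running variable (e.g., a test score); treatment assigned at cutoff 0
score = rng.uniform(-1, 1, n)
treated = score >= 0.0

# Outcome: smooth in the score, plus a jump of 2.0 at the cutoff
y = 1.0 + 0.8 * score + 2.0 * treated + rng.normal(0, 0.5, n)

def rdd_estimate(score, y, treated, bandwidth):
    """Fit a line on each side within the bandwidth; the gap
    between the two fits evaluated AT the cutoff is the effect."""
    left = (~treated) & (score > -bandwidth)
    right = treated & (score < bandwidth)
    fit_left = np.polyfit(score[left], y[left], 1)
    fit_right = np.polyfit(score[right], y[right], 1)
    return np.polyval(fit_right, 0.0) - np.polyval(fit_left, 0.0)

effect_wide = rdd_estimate(score, y, treated, bandwidth=1.0)
effect_narrow = rdd_estimate(score, y, treated, bandwidth=0.2)
```

Because the underlying trend here is linear, both bandwidths center on the true effect of 2.0; the narrow bandwidth simply uses fewer points, so its estimate is noisier. With a curved trend, a wide bandwidth would also introduce bias, which is the trade-off the slider illustrates.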
Difference-in-Differences (DiD) Simulator
DiD compares the change in an outcome over time between a treated group and an untreated group. Its validity rests on the **Parallel Trends Assumption**: that the groups would have followed the same trend without the treatment. Toggle the assumption to see its impact.
DiD Estimated Effect
0
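The DiD arithmetic itself is one subtraction of subtractions. This sketch builds both groups with a shared time trend (parallel trends holding by construction) and a true treatment effect of 1.5, all numbers invented:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000  # units per group

# Both groups share a common time trend of +3 (parallel trends);
# treatment adds 1.5 to the treated group in the post period only
treated_pre = 10 + rng.normal(0, 1, n)
control_pre = 8 + rng.normal(0, 1, n)
treated_post = treated_pre + 3 + 1.5 + rng.normal(0, 1, n)
control_post = control_pre + 3 + rng.normal(0, 1, n)

# DiD: (change in treated group) minus (change in control group);
# the shared +3 trend cancels, leaving the treatment effect
did = ((treated_post.mean() - treated_pre.mean())
       - (control_post.mean() - control_pre.mean()))
```

The pre-period gap between the groups (10 vs. 8) never biases the estimate, because DiD differences it away; only a violation of parallel trends, a different slope in one group, would.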