Beyond the Average

Understanding Heterogeneous Treatment Effects in Experimental Research

From One Answer to Many

Experiments typically answer a single question, "Does it work?", with one number: the Average Treatment Effect (ATE). That single average can obscure crucial detail. Analyzing Heterogeneous Treatment Effects (HTE) provides a richer view, asking instead: "For whom, and why, does it work?" This segment illustrates the move from one average result to several conditional effects.

Average Treatment Effect (ATE)

A single average reflecting population-wide impact: powerful, but potentially misleading.

+5% Average Effect

Conditional ATE (CATE)

Analyzes subgroup effects, identifying those who gain the most or potentially experience negative outcomes.

Group A +15%
Group B +2%
Group C -3%
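The contrast between ATE and CATE can be made concrete with a small simulation. The sketch below uses entirely made-up data: three subgroups with assumed true effects of +15, +2, and -3 points, so the single ATE hides the fact that group C is harmed. All names and numbers here are illustrative, not from the MTO study.

```python
import numpy as np

# Hypothetical experiment: 200 units in each of three groups, randomly treated.
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 200)
treated = rng.integers(0, 2, size=600).astype(bool)

# Assumed true effects per group (illustrative only): A +15, B +2, C -3.
effect = {"A": 15.0, "B": 2.0, "C": -3.0}
outcome = np.array([100 + (effect[g] if t else 0.0) for g, t in zip(groups, treated)])
outcome += rng.normal(0, 1, size=600)  # measurement noise

# ATE: one difference in means for the whole sample.
ate = outcome[treated].mean() - outcome[~treated].mean()

# CATE: the same difference in means, computed within each subgroup.
cate = {
    g: outcome[treated & (groups == g)].mean() - outcome[~treated & (groups == g)].mean()
    for g in ["A", "B", "C"]
}
print(f"ATE: {ate:+.1f}")
for g, c in cate.items():
    print(f"CATE group {g}: {c:+.1f}")
```

With equal-sized groups, the ATE lands near the simple average of the three subgroup effects (about +4.7 here), while the CATEs recover each group's own effect.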

Case Study: The Moving to Opportunity Experiment

The MTO study highlights HTE's significance. This program provided housing vouchers to low-income families. While early results showed no overall impact on adult earnings, a subsequent Heterogeneous Treatment Effects (HTE) analysis, detailed below, uncovered crucial differences based on the age of children at the time of relocation.

Long-Term Income Gains: A Tale of Two Childhoods

The data reveal a stark contrast: children who moved before age 13 experienced a significant 31% boost in adult income, whereas those who moved as teenagers saw a slight decline. This result, invisible in the overall average, fundamentally reshaped the policy conclusions and highlighted the program's unexpected benefits.

The HTE Methodological Toolkit

How we discover HTE has changed. Early approaches were theory-driven, confirming hypotheses specified in advance; today, machine learning methods search the data for patterns no one thought to test. Compare these two influential approaches below.

Goal: Test a Theory

This method begins with a defined hypothesis about a subgroup's unique impact. It's the preferred method for theory testing, often employing interaction terms in regression. To be valid, this hypothesis must be established *before* data analysis.

Y = β₀ + β₁T + β₂S + β₃(T × S) + ε

Here, a significant β₃ coefficient provides evidence of HTE for subgroup S.
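The interaction model above can be estimated with ordinary least squares. The sketch below is a minimal example on simulated data, assuming a data-generating process in which the true β₃ is 5 (all values are illustrative); it builds the design matrix by hand and solves with plain NumPy rather than a regression library.

```python
import numpy as np

# Simulated trial: T is the treatment indicator, S marks the subgroup.
rng = np.random.default_rng(1)
n = 1000
T = rng.integers(0, 2, n)
S = rng.integers(0, 2, n)

# Assumed process: Y = 10 + 2·T + 1·S + 5·(T×S) + noise, so true β₃ = 5.
Y = 10 + 2 * T + 1 * S + 5 * T * S + rng.normal(0, 1, n)

# Design matrix with columns: intercept, T, S, and the T×S interaction.
X = np.column_stack([np.ones(n), T, S, T * S])
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
b0, b1, b2, b3 = beta
print(f"β₃ (interaction) ≈ {b3:.2f}")  # close to the assumed true value of 5
```

In practice one would use a statistics package that also reports the standard error and p-value of β₃, since it is the significance of that coefficient, not just its size, that constitutes evidence of HTE.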

Perils and Best Practices

Powerful analytical tools demand careful execution. Exploratory analyses, in particular, need rigorous safeguards to prevent errors, false findings, and damage to research integrity.

⚠️The Peril of "P-Hacking"

Repeated subgroup testing vastly inflates the odds of a spurious, "significant" finding. This, known as p-hacking, severely undermines scientific reliability.

With 10 tests at 5% significance, the chance of a false positive is roughly 40%.
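That figure follows directly from the complement rule: with k independent tests, the chance of at least one false positive is 1 − (1 − α)^k. A one-line check:

```python
# Probability of at least one false positive across k independent tests at level α.
alpha, k = 0.05, 10
p_any_false_positive = 1 - (1 - alpha) ** k
print(f"{p_any_false_positive:.1%}")  # ≈ 40.1%
```

Independence is an assumption; with correlated subgroup tests the inflation is smaller but still substantial.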

Best Practice: Pre-Analysis Plans

A robust Pre-Analysis Plan (PAP) is key to thwarting p-hacking. This publicly registered document, drafted prior to analysis, distinguishes planned, confirmatory tests from later, exploratory discoveries.

  • Hypotheses: Clearly state which subgroups are being tested and why.
  • Model: Specify the exact statistical model and control variables.
  • Multiple Testing: Define the correction procedure (e.g., FDR) to be used.
  • Power: Report power calculations for detecting a meaningful effect difference.
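To make the multiple-testing bullet concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure, a standard FDR correction a PAP might pre-specify. The p-values in the example are invented for illustration.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Benjamini-Hochberg step-up procedure: return a boolean list marking
    which hypotheses are rejected while controlling the FDR at level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank / m * q:
            k_max = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected

# Hypothetical p-values from six subgroup tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals))  # [True, True, False, False, False, False]
```

Note that 0.039 and 0.041 would pass a naive 0.05 threshold but do not survive the correction, which is precisely the discipline a pre-analysis plan is meant to enforce.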