The Path of Statistical Inference

From a Small Sample to a Big Conclusion

🎯

The Goal

Our goal is to understand an entire **group** (the population), even though we can examine only a small **subset** (the sample). Statistical inference uses the sample's data to estimate properties of the population and draw conclusions about it.

👥

Population

(Everyone)

🧑‍🤝‍🧑

Sample

(A Small Group)

🌉

The Bridge: Central Limit Theorem

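If we draw many random samples, compute the mean of each one, and plot those means, we get a predictable bell-shaped curve: the **sampling distribution** of the mean. When each sample is reasonably large, this curve is approximately normal even if the population itself is not, which lets us make inferences using the properties of the normal distribution.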
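A small simulation can make this concrete. The sketch below is illustrative only: the skewed population, the sample size of 50, and the helper name `sample_means` are assumptions chosen for the demo, not part of the original material.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw num_samples random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(num_samples)
    ]

# Hypothetical, strongly right-skewed population (exponential, mean about 10).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(100_000)]

sample_size = 50
means = sample_means(population, sample_size=sample_size, num_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means cluster around the population mean, with a spread
# close to sigma / sqrt(n), as the Central Limit Theorem predicts.
print(f"population mean:         {pop_mean:.2f}")
print(f"mean of sample means:    {statistics.fmean(means):.2f}")
print(f"SD of sample means:      {statistics.stdev(means):.2f}")
print(f"theory, sigma / sqrt(n): {pop_sd / sample_size ** 0.5:.2f}")
```

Plotting a histogram of `means` would show the bell shape, even though the population it came from is far from symmetric.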

📜

The Framework: Hypothesis Testing

This is the formal process for testing a claim.

1️⃣

State Hypotheses

State the null (H₀, no change) and alternative (Hₐ, a change) hypotheses.

2️⃣

Set the Standard

Choose a significance level (α), usually 5% (0.05).

3️⃣

Analyze Data

Calculate a test statistic from your sample data.

4️⃣

Make a Decision

Compare your result (p-value) to your standard (α).
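The four steps can be sketched as a two-sided one-sample z-test. This is a minimal illustration under stated assumptions: the null mean of 100, the known population standard deviation of 15, and the sample size of 24 are hypothetical values, and `one_sample_z_test` is a helper written for this example. Only the sample mean of 105 comes from the confidence interval example later in this document.

```python
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, sigma, n, alpha=0.05):
    """Two-sided z-test for a mean, assuming the population SD (sigma) is known."""
    # Step 3: test statistic = how many standard errors the sample mean is from the null mean.
    standard_error = sigma / n ** 0.5
    z = (sample_mean - null_mean) / standard_error
    # p-value: probability of a result at least this extreme if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: compare the p-value to the significance level.
    reject_null = p_value <= alpha
    return z, p_value, reject_null

# Step 1: H0: mean = 100, Ha: mean != 100 (hypothetical claim).
# Step 2: alpha = 0.05.
z, p_value, reject_null = one_sample_z_test(
    sample_mean=105, null_mean=100, sigma=15, n=24, alpha=0.05
)

print(f"z = {z:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if reject_null else "Fail to reject H0")
```

With these assumed inputs the p-value is larger than 0.05, so the sketch ends in "Fail to reject H0". When sigma is unknown and estimated from the sample, a t-test is the usual substitute.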

⚖️

The Verdict

The **p-value** is the probability of observing data at least as extreme as yours, assuming the null hypothesis is true. Comparing it with alpha (α) gives the statistical conclusion.

IF p-value ≤ α

💥

Reject the Null Hypothesis

(The result is statistically significant)

IF p-value > α

🤷

Fail to Reject the Null

(The result is not statistically significant)
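One way to see what the threshold α controls is to repeat an imaginary study many times. The sketch below is a simulation under assumed values (normally distributed data, a known standard deviation of 15, samples of 25, and hypothetical helper names). When H₀ is actually true, the rule "reject if p-value ≤ α" should reject in roughly α of the studies. When H₀ is false, it should reject far more often.

```python
import random
from statistics import NormalDist, fmean

def p_value_two_sided(sample, null_mean, sigma):
    """Two-sided z-test p-value, assuming the population SD (sigma) is known."""
    z = (fmean(sample) - null_mean) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mean, null_mean=100.0, sigma=15.0, n=25,
                   alpha=0.05, trials=4_000, seed=1):
    """Fraction of simulated studies in which the decision rule rejects H0."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        if p_value_two_sided(sample, null_mean, sigma) <= alpha:
            rejections += 1
    return rejections / trials

# H0 (mean = 100) is true: every rejection here is a false alarm.
rate_when_null_true = rejection_rate(true_mean=100.0)
# H0 is false (true mean = 110): every rejection here is a correct detection.
rate_when_null_false = rejection_rate(true_mean=110.0)

print(f"rejection rate when H0 is true:  {rate_when_null_true:.3f}")
print(f"rejection rate when H0 is false: {rate_when_null_false:.3f}")
```

The first rate should land near 0.05, which is the false-alarm rate that α sets. The second depends on how far the truth is from the null value and on the sample size.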

⚠️

The Risks: Errors

Because the decision rests on probability, it can be wrong in two ways.

| Our Decision | Actual Reality: H₀ is True | Actual Reality: H₀ is False |
| --- | --- | --- |
| Reject H₀ | Type I Error (False Positive) | Correct! (True Positive) |
| Fail to Reject H₀ | Correct! (True Negative) | Type II Error (False Negative) |

📏

The Uncertainty

A **confidence interval** gives a range of plausible values for a population parameter, reflecting the uncertainty around the sample estimate.

Sample Mean: 105

95% Confidence Interval: [99, 111]

We are 95% confident that the population mean lies between 99 and 111. In other words, intervals built this way capture the true mean in about 95% of repeated samples.
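The interval above can be reproduced with a z-based formula, sample mean ± z* × σ / √n. The source does not say how the interval was computed, so this is only a sketch: the population standard deviation of 15 and the sample size of 24 are hypothetical values chosen because they give a margin of about 6, and `z_confidence_interval` is a helper written for this example.

```python
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Confidence interval for a mean, assuming the population SD (sigma) is known."""
    # Critical value: about 1.96 for 95% confidence.
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sigma / n ** 0.5
    return sample_mean - margin, sample_mean + margin

# Sample mean 105 is from the example above; sigma and n are assumed.
low, high = z_confidence_interval(sample_mean=105, sigma=15, n=24)
print(f"95% CI: [{low:.1f}, {high:.1f}]")  # prints "95% CI: [99.0, 111.0]"
```

A larger sample or a lower confidence level narrows the interval. When sigma is estimated from the sample, the critical value comes from the t distribution instead.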