The Goal
Our goal is to learn about an entire **group** (the population) while examining only a small **subset** of it (the sample). Statistical inference uses the sample's data to estimate properties of the population.
Population
(Everyone)
Sample
(A Small Group)
The Bridge: Central Limit Theorem
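If we draw many random samples, take the mean of each, and graph those means, we get a predictable, approximately bell-shaped curve: the **sampling distribution** of the mean. With a large enough sample size this holds even when the population itself is not normal, which lets us make inferences using the properties of the normal distribution.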
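The idea can be checked with a small simulation. The sketch below is only an illustration: the skewed population, the sample size of 50, and the helper name `sample_means` are all assumptions chosen for the example, not part of the original material.

```python
import random
import statistics

def sample_means(population, sample_size, n_samples, seed=0):
    """Draw n_samples random samples (with replacement) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [
        statistics.fmean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, deliberately skewed population (exponential, mean about 10).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(100_000)]

sample_size = 50
means = sample_means(population, sample_size=sample_size, n_samples=2_000)

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

# The sample means cluster around the population mean, and their spread
# is close to the theoretical standard error sd / sqrt(n).
print(f"population mean:      {pop_mean:.2f}")
print(f"mean of sample means: {statistics.fmean(means):.2f}")
print(f"SD of sample means:   {statistics.stdev(means):.2f}")
print(f"sd / sqrt(n):         {pop_sd / sample_size ** 0.5:.2f}")
```

Plotting a histogram of `means` should show the bell shape, even though the population it came from is strongly skewed.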
The Framework: Hypothesis Testing
This is the formal process for testing a claim.
1️⃣
State Hypotheses
State the null (H₀, no change) and alternative (Hₐ, a change) hypotheses.
2️⃣
Set the Standard
Choose a significance level (α), usually 5% (0.05).
3️⃣
Analyze Data
Calculate a test statistic from your sample data.
4️⃣
Make a Decision
Compare your result (p-value) to your standard (α).
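The four steps can be sketched in code. This is a minimal example that assumes a two-sided one-sample z-test with a known population standard deviation; the function name `one_sample_z_test`, the data, and the values `mu0=100` and `sigma=15` are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean

def one_sample_z_test(sample, mu0, sigma, alpha=0.05):
    """Two-sided z-test of H0: population mean == mu0 (sigma assumed known)."""
    n = len(sample)
    # Step 3: the test statistic counts how many standard errors
    # the sample mean lies from the hypothesized mean.
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    # Two-sided p-value: chance of a |Z| at least this large if H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Step 4: compare the p-value with the significance level.
    return z, p_value, p_value <= alpha

# Step 1: H0: mean = 100, Ha: mean != 100 (hypothetical claim).
# Step 2: alpha = 0.05.
sample = [108, 112, 97, 105, 110, 101, 99, 115, 104, 109]  # made-up data
z, p_value, reject = one_sample_z_test(sample, mu0=100, sigma=15, alpha=0.05)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print("Reject H0" if reject else "Fail to reject H0")
```

With real data where the population standard deviation is unknown, a t-test would usually be the more appropriate choice; the z-test is used here only because it needs nothing beyond the standard library.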
The Verdict
The **p-value** is the probability of observing data at least as extreme as yours, assuming the null hypothesis is true. It is compared against alpha (α) to reach a statistical conclusion.
IF p-value ≤ α
💥
Reject the Null Hypothesis
(The result is statistically significant)
IF p-value > α
🤷
Fail to Reject the Null
(The result is not statistically significant)
The Risks: Errors
Because inference rests on probability, our decision can be wrong in two ways.
| Our Decision | Actual Reality: H₀ is True | Actual Reality: H₀ is False |
|---|---|---|
| Reject H₀ | Type I Error (False Positive) | Correct! (True Positive) |
| Fail to Reject H₀ | Correct! (True Negative) | Type II Error (False Negative) |
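Both error rates in the table can be estimated by simulation. The sketch below assumes a two-sided z-test with known standard deviation and normally distributed data; the helper names and every parameter value (`mu0=100`, `sigma=15`, `n=30`, a true mean of 105 for the "H₀ is false" case) are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def rejects_h0(sample, mu0, sigma, alpha):
    """Return True when a two-sided z-test rejects H0: mean == mu0."""
    z = (fmean(sample) - mu0) / (sigma / sqrt(len(sample)))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value <= alpha

def rejection_rate(true_mean, mu0=100, sigma=15, n=30, alpha=0.05,
                   trials=5_000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    rejections = sum(
        rejects_h0([rng.gauss(true_mean, sigma) for _ in range(n)],
                   mu0, sigma, alpha)
        for _ in range(trials)
    )
    return rejections / trials

# H0 is true (true mean equals mu0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=100)
# H0 is false (true mean is 105): every non-rejection is a Type II error.
type_2_rate = 1 - rejection_rate(true_mean=105)

print(f"Type I error rate:  {type_1_rate:.3f}")
print(f"Type II error rate: {type_2_rate:.3f}")
```

The Type I rate should land near α, since α is by construction the chance of rejecting a true null. The Type II rate depends on the sample size and on how far the true mean is from the hypothesized one.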
The Uncertainty
A **confidence interval** gives a range of plausible values for a population parameter, reflecting the uncertainty around our sample estimate.
Sample Mean: 105
95% Confidence Interval: [99, 111]
We are 95% confident that the population mean lies between 99 and 111.
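An interval like this can be computed as the sample mean plus or minus a margin of error. The sketch below assumes a normal-approximation interval; the source gives only the mean and the interval, so the sample standard deviation (15.3) and sample size (25) are hypothetical values picked so that the result matches the example above.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    # Critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Margin of error = critical value * standard error of the mean.
    margin = z * sample_sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# sample_sd and n are assumed for illustration; only the mean is from the example.
low, high = mean_confidence_interval(sample_mean=105, sample_sd=15.3, n=25)
print(f"95% CI: [{low:.0f}, {high:.0f}]")  # prints "95% CI: [99, 111]"
```

For a sample this small, a t-based critical value would normally give a slightly wider interval; the normal approximation is used here to keep the sketch within the standard library.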