Statistics
Statistics: Measuring the Spread#
In Grade 10, you measured where the data centres (mean, median, mode). In Grade 11, you measure how spread out the data is. Two data sets can have the same mean but look completely different — the spread tells the full story.
The Big Idea: Consistency#
Imagine two cricket batsmen with the same average of 50 runs:
- Batsman A: Scores 48, 52, 49, 51 → very consistent
- Batsman B: Scores 0, 100, 10, 90 → wildly inconsistent
Both average 50, but you’d rather have Batsman A on your team. The standard deviation ($\sigma$) measures this consistency:
- Low $\sigma$ → data is clustered tightly around the mean
- High $\sigma$ → data is spread far from the mean
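To see the difference in numbers, here is a small sketch using Python's built-in `statistics` module, where `pstdev` gives the population standard deviation. The variable names are just for illustration.

```python
from statistics import mean, pstdev

batsman_a = [48, 52, 49, 51]
batsman_b = [0, 100, 10, 90]

for name, scores in (("A", batsman_a), ("B", batsman_b)):
    # Same mean, very different spread
    print(f"Batsman {name}: mean = {mean(scores)}, sigma = {pstdev(scores):.2f}")

# Batsman A: mean = 50, sigma = 1.58
# Batsman B: mean = 50, sigma = 45.28
```

Batsman A's scores sit within a run or two of the mean, while Batsman B's are typically about 45 runs away from it.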
Standard Deviation & Variance#
The Formula#
$$\sigma = \sqrt{\frac{\sum(x_i - \bar{x})^2}{n}}$$

| Symbol | Meaning |
|---|---|
| $x_i$ | Each data value |
| $\bar{x}$ | The mean of all values |
| $n$ | How many values |
| $\sigma$ | Standard deviation |
| $\sigma^2$ | Variance (the square of $\sigma$) |
The Step-by-Step Method#
- Calculate the mean ($\bar{x}$)
- Find each deviation: $x_i - \bar{x}$
- Square each deviation: $(x_i - \bar{x})^2$
- Find the mean of the squared deviations (= variance)
- Square root the variance (= standard deviation)
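The five steps above can be sketched in Python as follows. The function name `population_std_dev` and the data set are made up for illustration; the comments map each line to a step in the list.

```python
from math import sqrt

def population_std_dev(values):
    """Return (standard deviation, variance), dividing by n."""
    n = len(values)
    mean = sum(values) / n                    # Step 1: the mean
    deviations = [x - mean for x in values]   # Step 2: each deviation
    squared = [d ** 2 for d in deviations]    # Step 3: square each deviation
    variance = sum(squared) / n               # Step 4: mean of the squares
    return sqrt(variance), variance           # Step 5: square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data, mean = 5
sigma, variance = population_std_dev(data)
print(variance, sigma)  # 4.0 2.0
```

Working this data set by hand gives squared deviations of 9, 1, 1, 1, 0, 0, 4 and 16, which add up to 32, so the variance is $32 \div 8 = 4$ and $\sigma = 2$.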
💡 Calculator shortcut: In STAT mode, enter all data → 1-VAR stats → read $\bar{x}$ and $\sigma_x$ directly. Use $\sigma_x$ (population), NOT $s_x$ (sample).
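If you want to see why the choice of button matters, this short sketch compares the two results using Python's `statistics` module: `pstdev` divides by $n$ (like $\sigma_x$) and `stdev` divides by $n - 1$ (like $s_x$). The data set is made up for illustration.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data

sigma_x = pstdev(data)  # population: divides by n
s_x = stdev(data)       # sample: divides by n - 1

print(f"sigma_x = {sigma_x:.3f}, s_x = {s_x:.3f}")
# sigma_x = 2.000, s_x = 2.138
```

The sample value is always slightly larger, so reading the wrong one gives an answer that does not match the formula above.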
Grouped Data: Histograms & Ogives#
Histograms#
- Bars represent frequency of each class interval
- Bars touch (no gaps) because the data is continuous
- The modal class is the class interval with the tallest bar (the highest frequency)
Frequency Polygons#
- Connect the midpoints of the tops of the histogram bars
- Close the polygon by joining it to the x-axis at the midpoint of the empty interval just before the first class and just after the last class
Ogives (Cumulative Frequency Curves)#
- Plot cumulative frequency against the upper boundary of each class
- The S-shaped curve lets you read off the median and quartiles
- Median = value at $\frac{n}{2}$ on the cumulative frequency axis; $Q_1$ and $Q_3$ are read off at $\frac{n}{4}$ and $\frac{3n}{4}$ in the same way
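As a rough sketch of how reading off an ogive works, the code below builds the cumulative frequencies for a made-up grouped data set and estimates the median and quartiles. It assumes the plotted points are joined with straight lines, so a hand-drawn smooth curve may give slightly different readings. The helper name `read_off` is hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 16), (30, 40, 8), (40, 50, 2)]

upper_boundaries = [upper for _, upper, _ in classes]
cumulative = list(accumulate(freq for _, _, freq in classes))
n = cumulative[-1]  # total frequency

# Ogive points are (upper boundary, cumulative frequency),
# starting at (lowest boundary, 0).
ogive_points = [(classes[0][0], 0)] + list(zip(upper_boundaries, cumulative))

def read_off(target):
    """Estimate the x-value where the ogive reaches `target`,
    assuming straight lines between plotted points."""
    for (x0, c0), (x1, c1) in zip(ogive_points, ogive_points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")

median = read_off(n / 2)
q1 = read_off(n / 4)
q3 = read_off(3 * n / 4)
print(cumulative)      # [4, 14, 30, 38, 40]
print(q1, median, q3)  # 16.0 23.75 30.0
```

Here $n = 40$, so the median is read off at a cumulative frequency of 20, which falls in the 20 to 30 class.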
Symmetric vs Skewed Data#
| Distribution | Shape | Mean vs Median |
|---|---|---|
| Symmetric | Bell-shaped, balanced | Mean $\approx$ Median |
| Positively skewed | Tail extends RIGHT | Mean $>$ Median |
| Negatively skewed | Tail extends LEFT | Mean $<$ Median |
💡 Memory trick: The mean is “pulled” towards the tail. If the tail is on the right (positive direction), the mean is greater than the median.
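The mean-versus-median comparison in the table can be tried out on small made-up data sets. The helper `skew_hint` and its `tolerance` value are assumptions for illustration only: it is a rough rule of thumb, not a formal test for skewness.

```python
from statistics import mean, median

def skew_hint(data, tolerance=0.05):
    """Compare mean and median; treat a gap below 5% of the range as 'equal'."""
    m, med = mean(data), median(data)
    spread = max(data) - min(data)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "roughly symmetric"
    return "positively skewed" if m > med else "negatively skewed"

symmetric = [1, 2, 3, 4, 5]                       # mean 3, median 3
tail_right = [1, 2, 2, 3, 3, 3, 4, 20]            # mean 4.75, median 3
tail_left = [1, 17, 18, 18, 18, 19, 19, 20]       # mean 16.25, median 18

for data in (symmetric, tail_right, tail_left):
    print(mean(data), median(data), skew_hint(data))
```

In `tail_right` the single large value 20 pulls the mean above the median, while in `tail_left` the single small value 1 pulls it below.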
Outliers#
An outlier is a data value that is unusually far from the rest. The standard rule:
$$\text{Outlier if } x < Q_1 - 1.5 \times IQR \text{ or } x > Q_3 + 1.5 \times IQR$$

Here $IQR = Q_3 - Q_1$ is the interquartile range. Outliers pull the mean towards them and inflate the standard deviation. The median and IQR are resistant to outliers.
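The outlier rule can be sketched as below. The quartiles here are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when $n$ is odd), which is an assumption: calculators and software sometimes use slightly different quartile methods. The function names and the data are made up for illustration.

```python
from statistics import median

def quartiles(data):
    """Q1, median, Q3 using the median-of-halves method (an assumed convention)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(s), median(upper)

def find_outliers(data):
    q1, _, q3 = quartiles(data)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

data = [12, 14, 15, 15, 16, 17, 18, 19, 45]  # illustrative data
print(quartiles(data))      # (14.5, 16, 18.5)
print(find_outliers(data))  # [45]
```

For this data $IQR = 18.5 - 14.5 = 4$, so the fences are at $14.5 - 6 = 8.5$ and $18.5 + 6 = 24.5$, and only 45 lies outside them.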
Deep Dives#
- Standard Deviation, Variance & Data Analysis — full worked example by hand, skewness interpretation, outlier detection, and calculator tips
🚨 Common Mistakes#
- Using $s_x$ instead of $\sigma_x$: On your calculator, use $\sigma_x$ (population) for school maths, not $s_x$ (sample).
- Forgetting frequency: If data is in a frequency table, multiply each $(x - \bar{x})^2$ by its frequency before summing, then divide by the total frequency $n = \sum f$.
- Ogive plotting: Plot against the upper boundary of each class, NOT the midpoint.
- Histogram gaps: Histograms for continuous data have no gaps between bars. Bar graphs (for discrete/categorical data) have gaps.
- Confusing symmetric and skewed: Look at where the tail is, not where the peak is.
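For the frequency-table mistake in the list above, here is a minimal sketch of the weighted calculation. The table is made up for illustration, and the variable names are hypothetical.

```python
from math import sqrt

# Hypothetical frequency table: value -> frequency
table = {1: 1, 2: 2, 3: 4, 4: 2, 5: 1}

n = sum(table.values())                          # total frequency
mean = sum(x * f for x, f in table.items()) / n  # weight each value by f
variance = sum(f * (x - mean) ** 2 for x, f in table.items()) / n
sigma = sqrt(variance)

print(n, mean, round(variance, 2), round(sigma, 2))  # 10 3.0 1.2 1.1
```

Leaving out the frequencies would treat the table as five single values instead of ten, which gives a different (and wrong) standard deviation.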
🔗 Related Grade 11 topics:
- Probability: Combined Events — contingency tables bridge statistics and probability
📌 Grade 10 foundation: Five-Number Summary
📌 Grade 12 extension: Statistics & Regression — scatter plots, regression lines, and correlation
