The Five-Number Summary & Box-and-Whisker Plots#
The five-number summary is the backbone of Grade 10 statistics. It condenses an entire data set into 5 key values, which you then use to draw a box-and-whisker plot — the most important graph in this section.
Step 1: Sort the Data#
Always sort the data from smallest to largest before doing anything else. Skipping this step is the most common mistake in this section: unsorted data gives the wrong median and the wrong quartiles.
Step 2: Find the Five Numbers#
| Value | What it is | How to find it |
|---|---|---|
| Minimum | Smallest value | First number after sorting |
| $Q_1$ (Lower Quartile) | 25th percentile | Median of the bottom half |
| $Q_2$ (Median) | 50th percentile | Middle value of the full data set |
| $Q_3$ (Upper Quartile) | 75th percentile | Median of the top half |
| Maximum | Largest value | Last number after sorting |
Finding the Median ($Q_2$)#
- If $n$ is odd: median = middle value at position $\frac{n+1}{2}$
- If $n$ is even: median = average of the two middle values, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$
Finding $Q_1$ and $Q_3$#
Split the data into two halves at the median. If $n$ is odd, exclude the median from both halves. Then find the median of each half.
Worked Example#
Data (already sorted): $3;\; 5;\; 7;\; 8;\; 10;\; 12;\; 14;\; 16;\; 18$
$n = 9$ (odd)
Median ($Q_2$): position $\frac{9+1}{2} = 5$th value = 10
Bottom half (exclude median): $3;\; 5;\; 7;\; 8$, so $Q_1 = \frac{5 + 7}{2} = 6$
Top half (exclude median): $12;\; 14;\; 16;\; 18$, so $Q_3 = \frac{14 + 16}{2} = 15$
| Min | $Q_1$ | $Q_2$ | $Q_3$ | Max |
|---|---|---|---|---|
| 3 | 6 | 10 | 15 | 18 |
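
The method above can be checked with a short Python sketch. It assumes the "exclude the median when $n$ is odd" rule described in this section and a data set with at least two values. The function names `median` and `five_number_summary` are illustrative, not part of any library.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]                              # odd: the middle value
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2   # even: average of the two middle values


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the halves method from this section."""
    values = sorted(data)                # Step 1: always sort first
    n = len(values)
    half = n // 2
    lower = values[:half]                # bottom half
    # When n is odd, skip the median itself; when n is even, the halves split cleanly.
    upper = values[half + 1:] if n % 2 == 1 else values[half:]
    return (values[0], median(lower), median(values), median(upper), values[-1])


# The worked example data, deliberately given unsorted.
summary = five_number_summary([12, 3, 18, 7, 10, 5, 16, 8, 14])
print(summary)  # (3, 6.0, 10, 15.0, 18)
```

Other quartile conventions exist (calculators and software may use a different one), so results can differ slightly from this method on other data sets.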
Measures of Spread#
| Measure | Formula | What it tells you |
|---|---|---|
| Range | Max $-$ Min = $18 - 3 = 15$ | Total spread |
| IQR | $Q_3 - Q_1 = 15 - 6 = 9$ | Spread of the middle 50% |
💡 The IQR is more reliable than the range because it ignores extreme values (outliers). Exam questions often ask “which is the better measure of spread?” — the answer is usually IQR.
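
A small Python sketch shows why the IQR is less affected by outliers. The first summary is the worked example. The second is a hypothetical variation where the largest value, 18, is replaced by 80, which leaves the quartiles unchanged. The helper name `spread` is illustrative.

```python
def spread(minimum, q1, q3, maximum):
    """Range and IQR computed from the five-number summary."""
    return {"range": maximum - minimum, "iqr": q3 - q1}


original = spread(3, 6, 15, 18)
with_outlier = spread(3, 6, 15, 80)   # hypothetical: maximum changed from 18 to 80

print(original)       # {'range': 15, 'iqr': 9}
print(with_outlier)   # {'range': 77, 'iqr': 9}
```

The range jumps from 15 to 77 because of one value, while the IQR stays at 9.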
Drawing a Box-and-Whisker Plot#
- Draw a number line to scale covering the full range
- Mark the 5 values on the number line
- Draw a box from $Q_1$ to $Q_3$
- Draw a vertical line inside the box at the median ($Q_2$)
- Draw whiskers (horizontal lines) from the box to the minimum and maximum
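
To see what "to scale" means in the steps above, here is a rough text-only sketch in Python. It maps each of the five values to a character position in proportion to its distance from the minimum. The function name `ascii_box_plot`, the symbols and the default width are illustrative choices, and it assumes the maximum is larger than the minimum.

```python
def ascii_box_plot(minimum, q1, q2, q3, maximum, width=61):
    """Return a one-line text box plot drawn to scale."""
    span = maximum - minimum

    def column(value):
        # Proportional position of a value along the number line.
        return round((value - minimum) / span * (width - 1))

    line = [" "] * width
    for i in range(column(minimum), column(q1)):
        line[i] = "-"                          # left whisker
    for i in range(column(q1), column(q3) + 1):
        line[i] = "="                          # box from Q1 to Q3
    for i in range(column(q3) + 1, column(maximum) + 1):
        line[i] = "-"                          # right whisker
    line[column(minimum)] = "|"                # minimum
    line[column(maximum)] = "|"                # maximum
    line[column(q1)] = "["                     # Q1
    line[column(q3)] = "]"                     # Q3
    line[column(q2)] = "M"                     # median
    return "".join(line)


plot = ascii_box_plot(3, 6, 10, 15, 18)
print(plot)
```

With a width of 61 characters and a range of 15, each unit on the number line takes 4 characters, so equal gaps in the data always look equal on the plot.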
Reading a Box Plot#
| Feature | Interpretation |
|---|---|
| Median centred in box | Data is symmetric |
| Median closer to $Q_1$ | Positively skewed (tail to the right) |
| Median closer to $Q_3$ | Negatively skewed (tail to the left) |
| Short box, long whiskers | Data has extreme values but the middle 50% is consistent |
| Long box | The middle 50% of the data is very spread out |
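
The first three rows of the table can be written as a simple rule. This Python sketch compares the distance from $Q_1$ to the median with the distance from the median to $Q_3$. The function name `describe_skew` is illustrative, and the exact-equality test for "symmetric" is a simplifying assumption. In an exam, a small difference would usually still be described as roughly symmetric.

```python
def describe_skew(q1, q2, q3):
    """Classify skewness from the position of the median inside the box."""
    lower_part = q2 - q1   # distance from Q1 to the median
    upper_part = q3 - q2   # distance from the median to Q3
    if lower_part == upper_part:
        return "symmetric"
    if lower_part < upper_part:
        return "positively skewed"   # median closer to Q1, tail to the right
    return "negatively skewed"       # median closer to Q3, tail to the left


print(describe_skew(6, 10, 15))  # positively skewed
```

For the worked example the two distances are 4 and 5, so the skew is only slight.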
Comparing Two Box Plots#
When asked to compare two data sets using box plots:
- Compare the medians — which group performed better overall?
- Compare the IQRs — which group was more consistent?
- Compare the ranges — which group had more extreme variation?
- Comment on skewness — are the distributions similar or different?
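
As an illustration of the comparison steps, the following Python sketch uses two hypothetical five-number summaries (made-up marks for Class A and Class B, not data from this section) and calculates the measures you would comment on.

```python
def box_plot_measures(summary):
    """Median, IQR and range from a (min, Q1, Q2, Q3, max) tuple."""
    minimum, q1, q2, q3, maximum = summary
    return {"median": q2, "iqr": q3 - q1, "range": maximum - minimum}


# Hypothetical summaries for two classes.
class_a = box_plot_measures((30, 45, 55, 62, 80))
class_b = box_plot_measures((40, 50, 58, 70, 75))

higher_median = "A" if class_a["median"] > class_b["median"] else "B"
more_consistent = "A" if class_a["iqr"] < class_b["iqr"] else "B"
wider_range = "A" if class_a["range"] > class_b["range"] else "B"

print(class_a)  # {'median': 55, 'iqr': 17, 'range': 50}
print(class_b)  # {'median': 58, 'iqr': 20, 'range': 35}
print(higher_median, more_consistent, wider_range)  # B A A
```

Here Class B has the higher median, while Class A has the smaller IQR (a more consistent middle 50%) but the larger overall range.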
Grouped Data#
When data is given in class intervals (e.g., 40–50, 50–60, …):
- You cannot find the exact five-number summary
- Use the midpoint of each class to estimate the mean: midpoint $= \frac{\text{lower} + \text{upper}}{2}$
- Use an ogive (cumulative frequency curve) to estimate $Q_1$, $Q_2$, and $Q_3$
Estimated Mean from a Frequency Table#
$$\bar{x} = \frac{\sum (f \times x_{\text{mid}})}{\sum f}$$
where $f$ = frequency and $x_{\text{mid}}$ = midpoint of each class.
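
As a sketch of the calculation, the Python below applies the formula to the frequency table of 50 test scores used in the ogive worked example in this section. The variable names are illustrative.

```python
# (lower boundary, upper boundary, frequency) for each class
classes = [
    (20, 30, 3),
    (30, 40, 7),
    (40, 50, 12),
    (50, 60, 15),
    (60, 70, 9),
    (70, 80, 4),
]

total_frequency = sum(f for _, _, f in classes)
# Multiply each class midpoint by its frequency, then add the products.
sum_f_times_mid = sum(f * (lower + upper) / 2 for lower, upper, f in classes)
estimated_mean = sum_f_times_mid / total_frequency

print(total_frequency)   # 50
print(sum_f_times_mid)   # 2570.0
print(estimated_mean)    # 51.4
```

The result is an estimate, because every score in a class is treated as if it were equal to the class midpoint.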
Drawing and Reading an Ogive (Cumulative Frequency Curve)#
An ogive plots cumulative frequency against the upper boundary of each class. It lets you estimate the median and quartiles for grouped data.
Worked Example: the test scores of 50 students.
| Class | Frequency | Cumulative Frequency | Upper Boundary |
|---|---|---|---|
| $20 \leq x < 30$ | $3$ | $3$ | $30$ |
| $30 \leq x < 40$ | $7$ | $10$ | $40$ |
| $40 \leq x < 50$ | $12$ | $22$ | $50$ |
| $50 \leq x < 60$ | $15$ | $37$ | $60$ |
| $60 \leq x < 70$ | $9$ | $46$ | $70$ |
| $70 \leq x < 80$ | $4$ | $50$ | $80$ |
How to draw: Plot each (upper boundary, cumulative frequency) point: $(30;\, 3)$, $(40;\, 10)$, $(50;\, 22)$, $(60;\, 37)$, $(70;\, 46)$, $(80;\, 50)$. Start the curve at $(20;\, 0)$. Connect with a smooth S-shaped curve.
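
The cumulative frequency column and the points to plot can be generated from the frequencies. This Python sketch uses `itertools.accumulate` from the standard library for the running total. The variable names are illustrative.

```python
from itertools import accumulate

# (lower boundary, upper boundary, frequency) for each class
classes = [
    (20, 30, 3),
    (30, 40, 7),
    (40, 50, 12),
    (50, 60, 15),
    (60, 70, 9),
    (70, 80, 4),
]

# Running total of the frequencies.
cumulative = list(accumulate(f for _, _, f in classes))

# Start at (lower boundary of the first class, 0), then pair each
# upper boundary with its cumulative frequency.
points = [(classes[0][0], 0)] + [
    (upper, cf) for (_, upper, _), cf in zip(classes, cumulative)
]

print(cumulative)  # [3, 10, 22, 37, 46, 50]
print(points)      # [(20, 0), (30, 3), (40, 10), (50, 22), (60, 37), (70, 46), (80, 50)]
```

A useful check is that the last cumulative frequency equals the total number of data values, here 50.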
How to read quartiles:
- Median ($Q_2$): $\frac{50}{2} = 25$th value → go across from $25$ on the $y$-axis to the curve, then down to the $x$-axis → ≈ 52
- $Q_1$: $\frac{50}{4} = 12.5$th value → read across from $12.5$ → ≈ 42
- $Q_3$: $\frac{3 \times 50}{4} = 37.5$th value → read across from $37.5$ → ≈ 61
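
The readings above can be approximated numerically. The Python sketch below joins the plotted points with straight lines instead of a smooth curve, which is a simplifying assumption, so its answers may differ slightly from values read off a hand-drawn ogive. The function name `read_ogive` is illustrative.

```python
# Ogive points from the worked example: (upper boundary, cumulative frequency)
points = [(20, 0), (30, 3), (40, 10), (50, 22), (60, 37), (70, 46), (80, 50)]


def read_ogive(points, target):
    """Estimate the x-value where the cumulative frequency reaches `target`,
    using straight lines between neighbouring points."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            # Move along this segment in proportion to how far target is from c0.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the cumulative frequency range")


n = points[-1][1]                    # total frequency, 50
q1 = read_ogive(points, n / 4)       # 12.5th value
q2 = read_ogive(points, n / 2)       # 25th value
q3 = read_ogive(points, 3 * n / 4)   # 37.5th value

print(round(q1, 1), round(q2, 1), round(q3, 1))  # 42.1 52.0 60.6
```

These agree with the graph readings of about 42, 52 and 61.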
⚠️ Common ogive errors: plotting against the midpoint instead of the upper boundary, and forgetting to start the curve at the lower boundary of the first class with cumulative frequency = 0.
🚨 Common Mistakes#
- Not sorting data first: You MUST sort before finding the median and quartiles.
- Including the median in both halves: When $n$ is odd, the median itself is excluded from both the bottom and top halves when finding $Q_1$ and $Q_3$.
- Box plot not to scale: The number line must be drawn to scale — spacing must be proportional.
- Confusing range and IQR: Range = Max $-$ Min. IQR = $Q_3 - Q_1$. They measure different things.
- Grouped data: Don’t try to find exact quartiles from grouped data — use midpoints for the mean and an ogive for quartiles.
💡 Pro Tip#
If a question asks “which measure of central tendency best represents the data?”:
- Symmetric data → mean and median are similar, either works
- Skewed data or outliers → the median is better (it’s not pulled by extreme values)
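
A quick Python check illustrates the second point. It uses the worked example data and a hypothetical copy in which the largest value, 18, is replaced by an outlier of 180. `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median

data = [3, 5, 7, 8, 10, 12, 14, 16, 18]
with_outlier = [3, 5, 7, 8, 10, 12, 14, 16, 180]   # hypothetical: 18 replaced by 180

print(round(mean(data), 2), median(data))                   # 10.33 10
print(round(mean(with_outlier), 2), median(with_outlier))   # 28.33 10
```

One extreme value pulls the mean from about 10.33 up to about 28.33, while the median stays at 10.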
🔗 Related Grade 10 topics:
- Probability — data analysis connects to probability
📌 Where this leads in Grade 11: Statistics: Standard Deviation & Variance — measuring spread numerically with $\sigma$
