Statistics and Probability - Measures of dispersion: range, interquartile range, and box-and-whisker plots
Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Range: The simplest measure of dispersion, calculated as the difference between the maximum and minimum values in a dataset. Visually, it represents the total horizontal distance covered by the data points on a number line, from the lowest dot to the highest.
Quartiles: These are values that divide an ordered dataset into four equal parts. The First Quartile ($Q_1$) marks the 25th percentile, the Median ($Q_2$) marks the 50th percentile, and the Third Quartile ($Q_3$) marks the 75th percentile.
Interquartile Range (IQR): The difference between the third and first quartiles ($\text{IQR} = Q_3 - Q_1$). It represents the spread of the middle 50% of the data. Because it ignores the top and bottom 25%, it is a more 'robust' measure than the range, meaning it is not heavily influenced by extreme outliers.
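The robustness of the IQR can be illustrated with a minimal Python sketch. The helper names `quartiles` and `spread` and both datasets are hypothetical, invented for illustration, and the quartiles are assumed to be found by taking the median of each half of the ordered data (other conventions exist and can give slightly different values).

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) by taking the median of each half of the sorted data."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + n % 2:]    # values after the median position
    return median(lower), median(upper)

def spread(values):
    """Return (range, IQR) for a list of numbers."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1

# Hypothetical data, invented for illustration only
typical = [3, 4, 5, 6, 7, 8, 9, 10, 12]
with_extreme = [3, 4, 5, 6, 7, 8, 9, 10, 100]  # largest value replaced

print(spread(typical))       # (9, 5.0)
print(spread(with_extreme))  # (97, 5.0)
```

Replacing the largest value changes the range from 9 to 97, while the IQR stays at 5.0 in this example.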
Five-Number Summary: A descriptive set of five values used to summarize the distribution of data: Minimum, $Q_1$, Median, $Q_3$, and Maximum. This summary provides a snapshot of both the central tendency and the spread of the data.
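As a rough sketch, the five-number summary could be computed in Python as follows. The function name `five_number_summary` and the marks are hypothetical, and the quartiles are assumed to be the medians of the lower and upper halves of the ordered data, with the middle value left out of both halves when the count is odd.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    data = sorted(values)          # the data must be ordered first
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + n % 2:]    # values after the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

# Hypothetical marks, invented for illustration only
marks = [7, 3, 9, 5, 12, 8, 4, 10, 6]
minimum, q1, q2, q3, maximum = five_number_summary(marks)
print("Summary:", (minimum, q1, q2, q3, maximum))  # (3, 4.5, 7, 9.5, 12)
print("Range:", maximum - minimum)                 # 9
print("IQR:", q3 - q1)                             # 5.0
```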
Box-and-Whisker Plot (Visual Representation): A diagram constructed using the five-number summary. It features a central 'box' extending from $Q_1$ to $Q_3$, with a vertical line drawn at the Median ($Q_2$). Two horizontal lines, called 'whiskers', extend from the box to the Minimum and Maximum values. The length of the box represents the IQR, and the total length of the plot represents the Range.
Outliers and Boundaries: An outlier is an extreme value that lies significantly far from the rest of the data. Mathematically, any value smaller than $Q_1 - 1.5 \times \text{IQR}$ or larger than $Q_3 + 1.5 \times \text{IQR}$ is considered an outlier. On a box plot, these are often marked as individual points (dots or crosses) beyond the whiskers.
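The boundary check can be sketched in Python as below. The function names `outlier_fences` and `is_outlier` and the quartile values are hypothetical, chosen only to illustrate the standard 1.5 × IQR rule.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier boundaries for the given quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(value, q1, q3):
    """Return True if value lies strictly outside the 1.5 * IQR boundaries."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper

# Hypothetical quartiles, invented for illustration only
q1, q3 = 20, 30
print(outlier_fences(q1, q3))   # (5.0, 45.0)
print(is_outlier(50, q1, q3))   # True
print(is_outlier(45, q1, q3))   # False: a value on the boundary is not beyond it
```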
Data Distribution and Skewness: The shape of the box-and-whisker plot indicates how data is distributed. If the median line is closer to the left side ($Q_1$) of the box, the data may be positively skewed. If the whiskers are of significantly different lengths, it indicates that the spread of the extreme values is higher on one side than the other.
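A rough way to express the median-position check in Python is shown below. The function name `skew_hint` and the sample values are hypothetical, and the result is only a hint based on the box, not a formal skewness measure. The negative-skew branch is the mirror image of the rule stated above.

```python
def skew_hint(q1, median, q3):
    """Suggest a skew direction from where the median sits inside the box."""
    left_part = median - q1    # distance from Q1 to the median
    right_part = q3 - median   # distance from the median to Q3
    if left_part < right_part:
        return "possibly positively skewed"   # median closer to Q1
    if left_part > right_part:
        return "possibly negatively skewed"   # median closer to Q3
    return "roughly symmetric within the box"

# Hypothetical quartiles, invented for illustration only
print(skew_hint(10, 12, 20))  # possibly positively skewed
```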
📐Formulae
Range: $\text{Range} = \text{Maximum} - \text{Minimum}$
Interquartile Range: $\text{IQR} = Q_3 - Q_1$
Lower Boundary for outliers: $Q_1 - 1.5 \times \text{IQR}$
Upper Boundary for outliers: $Q_3 + 1.5 \times \text{IQR}$
💡Examples
Problem 1:
The marks obtained by 9 students in a quiz are: . Calculate the Range and the Interquartile Range (IQR).
Solution:
- Order the data from least to greatest:
- Find the Range:
- Find the Median ($Q_2$): The middle value of 9 numbers is at the 5th position:
- Find $Q_1$: The median of the lower half () is
- Find $Q_3$: The median of the upper half () is
- Calculate the IQR ($Q_3 - Q_1$):
Explanation:
To find measures of dispersion, the data must first be ordered. The range gives the total spread (), while the IQR () focuses on the spread of the middle half of the marks, which is less affected by the highest and lowest scores.
Problem 2:
A dataset has , , and . Determine if a value of would be considered an outlier in this dataset.
Solution:
- Calculate the IQR:
- Calculate the multiplier for outliers:
- Calculate the Upper Bound:
- Compare the value to the bound:
Explanation:
Since the value is greater than the upper boundary of , it is statistically classified as an outlier. In a box-and-whisker plot, the whisker would stop at the last data point within the limit, and the outlier would be plotted as a separate dot.