Statistics and Probability - Data presentation (histograms, cumulative frequency, box plots)
Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Histograms with Unequal Class Widths: Unlike in a bar chart, it is the area of each bar in a histogram, not its height, that represents the frequency. For continuous data with unequal class widths, we plot Frequency Density on the vertical y-axis, where $\text{Frequency Density} = \dfrac{\text{Frequency}}{\text{Class Width}}$. The bars are drawn touching each other to show the continuous nature of the data. The height of each bar is its frequency density, so height × width gives back the frequency.
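To make the height-versus-area idea concrete, here is a minimal Python sketch. The class boundaries and frequencies are hypothetical, chosen only for illustration, and `ClassInterval` is a helper name introduced here, not a standard library type. It computes each bar's frequency density and checks that height × width recovers the frequency.

```python
from dataclasses import dataclass


@dataclass
class ClassInterval:
    lower: float      # lower class boundary
    upper: float      # upper class boundary
    frequency: int

    @property
    def width(self) -> float:
        return self.upper - self.lower

    @property
    def frequency_density(self) -> float:
        # Height of the histogram bar: frequency divided by class width
        return self.frequency / self.width


# Hypothetical grouped data with unequal class widths (illustration only)
classes = [
    ClassInterval(0, 10, 20),
    ClassInterval(10, 15, 15),
    ClassInterval(15, 30, 30),
]

for c in classes:
    # Area of the bar = height x width, which should equal the frequency
    area = c.frequency_density * c.width
    print(f"{c.lower}-{c.upper}: width {c.width}, density {c.frequency_density}, area {area}")
```

Note that the middle class has the smallest frequency but the tallest bar, because it is the narrowest class. Plotting raw frequencies instead would visually understate the wide classes.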
Cumulative Frequency (CF): A cumulative frequency table shows a running total of frequencies. To construct a CF curve (ogive), plot each point using the upper class boundary as the x-coordinate and the cumulative frequency as the y-coordinate. The curve starts at the lower boundary of the first class interval with a cumulative frequency of 0, and joining the points smoothly usually gives an 'S-shaped' curve.
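The plotting rule above can be sketched in Python as follows. The grouped table is hypothetical, and `ogive_points` is an illustrative helper name. It uses `itertools.accumulate` for the running totals and adds the starting point (lower boundary of the first class, 0).

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
table = [(0, 10, 4), (10, 20, 11), (20, 30, 17), (30, 40, 6), (40, 50, 2)]


def ogive_points(rows):
    """Return the (x, y) points to plot for a cumulative frequency curve."""
    running_totals = accumulate(freq for _, _, freq in rows)
    # The curve starts at the lower boundary of the first class with CF = 0
    points = [(rows[0][0], 0)]
    # Each later point pairs an upper class boundary with the running total
    points += [(upper, cf) for (_, upper, _), cf in zip(rows, running_totals)]
    return points


print(ogive_points(table))
# [(0, 0), (10, 4), (20, 15), (30, 32), (40, 38), (50, 40)]
```

The final y-value always equals the total frequency $n$, which is a quick check that the running totals are correct.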
Quartiles and the Median: The median ($Q_2$) is the middle value, found at the $\frac{n}{2}$ mark of the total frequency $n$. The Lower Quartile ($Q_1$) is at the $\frac{n}{4}$ mark, and the Upper Quartile ($Q_3$) is at the $\frac{3n}{4}$ mark. On a cumulative frequency graph, these are found by drawing a horizontal line from the calculated y-value (e.g., $\frac{n}{2}$) to the curve, and then a vertical line down to the x-axis.
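A rough numerical version of "read across, then read down" is sketched below. It assumes the ogive points are joined by straight line segments (linear interpolation) rather than a hand-drawn smooth curve, so its estimates can differ slightly from readings taken off a graph. The table and the helper name `read_off` are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
table = [(0, 10, 4), (10, 20, 11), (20, 30, 17), (30, 40, 6), (40, 50, 2)]


def read_off(rows, target):
    """Estimate the x-value where the cumulative frequency reaches `target`,
    joining the ogive points with straight lines (linear interpolation)."""
    xs = [rows[0][0]] + [upper for _, upper, _ in rows]
    ys = [0] + list(accumulate(freq for _, _, freq in rows))
    i = bisect_left(ys, target)  # first point whose CF is >= target
    if i == 0:
        return xs[0]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    # Move along the segment in proportion to how far target is above y0
    return x0 + (target - y0) / (y1 - y0) * (x1 - x0)


n = sum(freq for _, _, freq in table)
# Read across at n/4, n/2 and 3n/4, then down to the x-axis
q1, q2, q3 = (read_off(table, p * n) for p in (0.25, 0.5, 0.75))
print(round(q1, 2), round(q2, 2), round(q3, 2))
```

Here $n = 40$, so the readings are taken at cumulative frequencies 10, 20 and 30.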
The Interquartile Range (IQR): The IQR is a measure of statistical dispersion representing the range of the middle 50% of the data. It is calculated as the difference between the upper and lower quartiles: $\text{IQR} = Q_3 - Q_1$. Unlike the range, it is not affected by a few extreme values (outliers), which makes it a more robust measure of spread.
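The robustness claim can be checked with a small Python sketch. It assumes the median-of-halves method for quartiles (the median is left out of both halves when $n$ is odd), which is one common convention; other methods, such as the ones offered by `statistics.quantiles`, can give slightly different quartiles. The data values are hypothetical.

```python
from statistics import median


def iqr(values):
    """Q3 - Q1, with quartiles taken as the medians of the lower and upper
    halves (median excluded when n is odd). Assumes n >= 2."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return q3 - q1


def data_range(values):
    return max(values) - min(values)


scores = [12, 15, 17, 18, 20, 22, 25, 27]   # hypothetical data
with_extreme = scores[:-1] + [95]           # replace the maximum with an extreme value

print(data_range(scores), iqr(scores))              # range 15, IQR 7.5
print(data_range(with_extreme), iqr(with_extreme))  # range 83, IQR 7.5
```

Replacing the largest value changes the range from 15 to 83, while the IQR stays at 7.5 because the middle half of the data is untouched.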
Box-and-Whisker Plots: This visual tool summarizes the 'five-number summary': Minimum, $Q_1$, Median, $Q_3$, and Maximum. It consists of a central box spanning from $Q_1$ to $Q_3$, a vertical line through the box at the Median, and horizontal 'whiskers' extending from the box to the minimum and maximum values (or to the most extreme non-outlier values when outliers are plotted separately). It shows clearly whether the distribution is symmetric or skewed.
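The five values needed to draw a box plot from raw data can be computed with a short Python sketch like the one below. It assumes the median-of-halves convention for $Q_1$ and $Q_3$, as used by many graphing calculators; other conventions may give slightly different quartiles. The helper name `five_number_summary` and the data are hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for raw data (n >= 2).

    Quartiles use the median-of-halves method: when n is odd the median
    is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    return data[0], q1, median(data), q3, data[-1]


times = [31, 24, 28, 35, 22, 40, 29, 26, 33]  # hypothetical data, n = 9
print(five_number_summary(times))
# (22, 25.0, 29, 34.0, 40)
```

These five numbers give the left whisker end, the box edges, the median line and the right whisker end, in that order.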
Identifying Outliers: Outliers are extreme values that fall significantly outside the rest of the data. In IB Mathematics, an outlier is formally defined as any value smaller than $Q_1 - 1.5 \times \text{IQR}$ or larger than $Q_3 + 1.5 \times \text{IQR}$. On a box plot, these are often plotted as individual points or crosses beyond the whiskers.
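The $1.5 \times \text{IQR}$ rule translates directly into code. This minimal Python sketch again assumes median-of-halves quartiles, and treats a value exactly equal to a fence as not an outlier, matching "smaller than" and "larger than" in the definition. The data and the helper names `outlier_fences` and `outliers` are hypothetical.

```python
from statistics import median


def outlier_fences(values):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR), using median-of-halves
    quartiles (an assumed convention; others exist). Assumes n >= 2."""
    data = sorted(values)
    half = len(data) // 2
    q1 = median(data[:half])
    q3 = median(data[len(data) - half:])
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def outliers(values):
    low, high = outlier_fences(values)
    # Strictly beyond a fence counts as an outlier; equal to a fence does not
    return [x for x in values if x < low or x > high]


masses = [3, 41, 44, 45, 47, 48, 50, 52, 53, 97]  # hypothetical data
print(outlier_fences(masses), outliers(masses))
# (32.0, 64.0) [3, 97]
```

Here $Q_1 = 44$ and $Q_3 = 52$, so $\text{IQR} = 8$ and the fences are 32 and 64. On a box plot, 3 and 97 would be drawn as separate points, and the whiskers would stop at 41 and 53, the most extreme non-outlier values.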
Interpreting Distribution Shape: A distribution is 'Positively Skewed' if the tail (the right whisker) extends further to the right (higher values) and the median lies closer to the left end of the box ($Q_1$). It is 'Negatively Skewed' if the tail extends further to the left (lower values) and the median lies closer to the right end of the box ($Q_3$).
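As a rough check of these descriptions, the Python sketch below reads the shape from a five-number summary. It is a heuristic, not a formal test of skewness: it assumes 'positive' needs both the median nearer $Q_1$ and a right whisker at least as long as the left, with 'negative' as the mirror image. The helper name `describe_skew` and the example summaries are hypothetical.

```python
def describe_skew(minimum, q1, med, q3, maximum):
    """Heuristic reading of a box plot's shape from its five-number summary."""
    right_box, left_box = q3 - med, med - q1
    right_whisker, left_whisker = maximum - q3, q1 - minimum
    # Median nearer Q1 and a longer (or equal) right tail
    if right_box > left_box and right_whisker >= left_whisker:
        return "positively skewed"
    # Median nearer Q3 and a longer (or equal) left tail
    if left_box > right_box and left_whisker >= right_whisker:
        return "negatively skewed"
    if right_box == left_box and right_whisker == left_whisker:
        return "symmetric"
    return "no clear skew"


print(describe_skew(22, 25, 29, 34, 40))  # positively skewed
print(describe_skew(0, 6, 9, 10, 11))     # negatively skewed
```

When the box and whiskers point in different directions, the sketch deliberately returns "no clear skew" rather than forcing a label.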
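📐Formulae
- $\text{Frequency Density} = \dfrac{\text{Frequency}}{\text{Class Width}}$
- Positions of $Q_1$, $Q_2$ (median) and $Q_3$ on a cumulative frequency graph: $\frac{n}{4}$, $\frac{n}{2}$, $\frac{3n}{4}$
- $\text{IQR} = Q_3 - Q_1$
- Outlier if $x < Q_1 - 1.5 \times \text{IQR}$ or $x > Q_3 + 1.5 \times \text{IQR}$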
💡Examples
Problem 1:
A frequency table shows the heights of 80 plants. The class interval has a frequency of 15, and the class interval has a frequency of 18. Calculate the frequency density for both classes.
Solution:
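- For the first class (frequency 15):
  - Class Width = upper class boundary − lower class boundary
  - Frequency Density = $\dfrac{15}{\text{Class Width}}$
- For the second class (frequency 18):
  - Class Width = upper class boundary − lower class boundary
  - Frequency Density = $\dfrac{18}{\text{Class Width}}$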
Explanation:
To compare classes of different widths on a histogram, we divide each frequency by the width of its interval. This ensures that the area of each bar (height × width) equals the frequency.
Problem 2:
Given a data set where , , the minimum value is 10, and the maximum value is 100. Determine if the minimum and maximum values are outliers.
Solution:
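- Calculate the IQR: $\text{IQR} = Q_3 - Q_1$.
- Determine the lower boundary for outliers, $Q_1 - 1.5 \times \text{IQR}$. Since the minimum value 10 is less than this boundary, the value 10 is an outlier.
- Determine the upper boundary for outliers, $Q_3 + 1.5 \times \text{IQR}$. Since the maximum value 100 is greater than this boundary, the value 100 is also an outlier.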
Explanation:
We use the $1.5 \times \text{IQR}$ rule to define the boundaries of 'normal' data. Any point outside the interval from $Q_1 - 1.5 \times \text{IQR}$ to $Q_3 + 1.5 \times \text{IQR}$ is classified as an outlier.