Statistics and Probability - Data presentation (histograms, cumulative frequency, box plots)

Grade 10 IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Histograms with Unequal Class Widths: Unlike a bar chart, where the height of each bar shows the frequency, a histogram represents frequency by the area of each bar. For continuous data with unequal class widths, we plot Frequency Density on the vertical axis, so the height of each bar is the frequency density and its width is the class width. The bars are drawn touching each other to show the continuous nature of the data.

Cumulative Frequency (CF): A cumulative frequency table shows a running total of frequencies. To construct a CF curve (ogive), plot each point using the upper class boundary as the x-coordinate and the cumulative frequency as the y-coordinate, then join the points with a smooth curve. The result is typically an 'S-shaped' curve that starts at the lower boundary of the first class interval with a cumulative frequency of 0.
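As a rough illustration of the running total, the sketch below builds the points of an ogive from a grouped frequency table. The class boundaries and frequencies are made-up values, and `cumulative_points` is a hypothetical helper name, not part of any syllabus or library.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 9), (20, 30, 12), (30, 40, 5)]

def cumulative_points(classes):
    """Return the (x, y) points of the ogive, starting at CF = 0."""
    uppers = [upper for _, upper, _ in classes]
    # Running total of the frequencies, one entry per class
    running = list(accumulate(freq for _, _, freq in classes))
    # The curve starts at the lower boundary of the first class with CF = 0
    start = (classes[0][0], 0)
    return [start] + list(zip(uppers, running))

points = cumulative_points(classes)
print(points)
```

Each x-coordinate is an upper class boundary, and the last y-coordinate equals the total frequency.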

Quartiles and the Median: The median ($Q_2$) is the middle value, found at the $50\%$ mark of the total frequency. The Lower Quartile ($Q_1$) is the $25\%$ mark, and the Upper Quartile ($Q_3$) is the $75\%$ mark. On a cumulative frequency graph, these are found by drawing a horizontal line from the calculated y-value (e.g., $0.25n$) to the curve, and then a vertical line down to the x-axis.
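Reading a value off the curve can be approximated numerically. The sketch below assumes straight-line segments between the plotted points, which is only an approximation of a hand-drawn smooth curve; the points are made-up values and `estimate_from_ogive` is a hypothetical helper name.

```python
def estimate_from_ogive(points, fraction):
    """Estimate the x-value where the cumulative frequency reaches fraction * n."""
    n = points[-1][1]          # total frequency is the last CF value
    target = fraction * n      # e.g. 0.25n for the lower quartile
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            # Linear interpolation inside the class that contains the target
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("fraction must be between 0 and 1")

# Hypothetical ogive points: (upper class boundary, cumulative frequency)
points = [(0, 0), (10, 4), (20, 13), (30, 25), (40, 30)]

q1 = estimate_from_ogive(points, 0.25)
median = estimate_from_ogive(points, 0.50)
q3 = estimate_from_ogive(points, 0.75)
print(round(q1, 1), round(median, 1), round(q3, 1))
```

Values read from a graph by eye will usually differ slightly from these interpolated estimates.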

The Interquartile Range (IQR): The $IQR$ is a measure of statistical dispersion, representing the range of the middle $50\%$ of the data. It is calculated as the difference between the upper and lower quartiles. Unlike the range, it is not affected by extreme outliers, making it a more robust measure of spread.

Box-and-Whisker Plots: This visual tool summarizes the 'five-number summary': Minimum, $Q_1$, Median, $Q_3$, and Maximum. It consists of a central box spanning from $Q_1$ to $Q_3$, a vertical line through the box at the Median, and horizontal 'whiskers' extending to the minimum and maximum values. It clearly shows the symmetry or skewness of the distribution.
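For raw (ungrouped) data, the five-number summary can be sketched as follows. This assumes one common convention, in which the quartiles are the medians of the lower and upper halves and the middle value is excluded from both halves when the count is odd; other conventions give slightly different quartiles. The data set and the name `five_number_summary` are illustrative only.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]
    # Skip the middle value when the number of data points is odd
    upper = data[half + 1:] if len(data) % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set of nine values
summary = five_number_summary([7, 3, 12, 9, 5, 15, 10, 8, 4])
print(summary)
```

The five values returned are exactly the positions marked on a box-and-whisker plot, from the left whisker to the right whisker.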

Identifying Outliers: Outliers are extreme values that fall significantly outside the rest of the data. In IB Mathematics, an outlier is formally defined as any value smaller than $Q_1 - 1.5 \times IQR$ or larger than $Q_3 + 1.5 \times IQR$. On a box plot, these are often plotted as individual points or crosses beyond the whiskers.

Interpreting Distribution Shape: A distribution is 'Positively Skewed' if the 'tail' or whiskers extend further to the right (higher values), and the median is closer to the left of the box. It is 'Negatively Skewed' if the tail extends further to the left (lower values) and the median is closer to the right of the box.

📐Formulae

$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$

$IQR = Q_3 - Q_1$

$\text{Lower Outlier Boundary} = Q_1 - 1.5 \times IQR$

$\text{Upper Outlier Boundary} = Q_3 + 1.5 \times IQR$

$\text{Position of } Q_1 = \frac{1}{4}n$

$\text{Position of Median } (Q_2) = \frac{1}{2}n$

$\text{Position of } Q_3 = \frac{3}{4}n$

💡Examples

Problem 1:

A frequency table shows the heights of 80 plants. The class interval $10 < h \le 20$ has a frequency of 15, and the class interval $20 < h \le 40$ has a frequency of 18. Calculate the frequency density for both classes.

Solution:

  1. For the first class ($10 < h \le 20$):

    • Class Width $= 20 - 10 = 10$
    • Frequency Density $= \frac{15}{10} = 1.5$
  2. For the second class ($20 < h \le 40$):

    • Class Width $= 40 - 20 = 20$
    • Frequency Density $= \frac{18}{20} = 0.9$

Explanation:

To compare classes of different widths on a histogram, we must normalize the frequency by the width of the interval. This ensures that the area of the bar (height $\times$ width) equals the frequency.
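The calculation in Problem 1 can be checked with a short sketch. The function name `frequency_density` is an illustrative choice; the class boundaries and frequencies are the ones given in the problem.

```python
def frequency_density(lower, upper, frequency):
    """Frequency density = frequency / class width."""
    return frequency / (upper - lower)

# Classes from Problem 1: (lower boundary, upper boundary, frequency)
classes = [(10, 20, 15), (20, 40, 18)]

densities = [frequency_density(*c) for c in classes]
print(densities)

# Bar area (height x width) should recover the original frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
print(areas)
```

Although the second class has the larger frequency, its bar is shorter because the 18 plants are spread over twice the width.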

Problem 2:

Given a data set where $Q_1 = 45$, $Q_3 = 65$, the minimum value is 10, and the maximum value is 100. Determine if the minimum and maximum values are outliers.

Solution:

  1. Calculate the $IQR$: $IQR = Q_3 - Q_1 = 65 - 45 = 20$

  2. Determine the Lower Boundary for outliers: $\text{Lower} = Q_1 - 1.5 \times IQR = 45 - 1.5(20) = 45 - 30 = 15$. Since the minimum value $10 < 15$, the value 10 is an outlier.

  3. Determine the Upper Boundary for outliers: $\text{Upper} = Q_3 + 1.5 \times IQR = 65 + 1.5(20) = 65 + 30 = 95$. Since the maximum value $100 > 95$, the value 100 is also an outlier.

Explanation:

We use the $1.5 \times IQR$ rule to define the boundaries of 'normal' data. Any point outside the range $[15, 95]$ is classified as an outlier.
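The same check can be sketched in a few lines. The helper name `outlier_fences` is hypothetical; the quartiles and the two test values come from Problem 2.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper outlier boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

lower, upper = outlier_fences(45, 65)
print(lower, upper)

# A value is an outlier if it lies strictly outside the two boundaries
results = {value: (value < lower or value > upper) for value in (10, 100)}
print(results)
```

A value exactly equal to 15 or 95 would not be flagged, since the rule uses strict inequalities.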