
Statistics and Probability - Data Presentation (Histograms, Box Plots, Cumulative Frequency)

Grade 11 IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Histograms: These are used for continuous data where the area of each bar represents the frequency. If class widths are unequal, we must plot Frequency Density on the y-axis instead of frequency. Visually, a histogram with unequal class widths will have bars of different widths, and the height is adjusted so that $\text{Area} = \text{Frequency}$. The calculation used is $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$.
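The frequency density calculation can be sketched in Python. The class intervals and frequencies below are hypothetical values chosen for illustration, and `frequency_density` is an assumed helper name, not part of any library.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
# Note the unequal class widths (10, 10, 20, 10).
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 30), (40, 50, 8)]


def frequency_density(lower, upper, frequency):
    """Return the bar height: frequency divided by class width."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

for (lo, hi, f), density in zip(classes, densities):
    # Bar area = class width * frequency density, which recovers the frequency.
    area = (hi - lo) * density
    print(f"{lo} <= x < {hi}: density = {density}, area = {area}")
```

The widest class (20 to 40) has the largest frequency, but its bar is not proportionally taller because its width is doubled; the area still matches the frequency.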

Cumulative Frequency Curves (Ogives): This graph represents the running total of frequencies. We plot the cumulative frequency on the y-axis against the upper class boundary of each interval on the x-axis. Visually, this results in a smooth S-shaped curve. It is used to estimate the median ($50\%$ of the data), quartiles ($25\%$ and $75\%$), and percentiles by reading across from the y-axis to the curve and then down to the x-axis.
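A minimal Python sketch of this process follows. It assumes hypothetical class boundaries and frequencies, and it joins the plotted points with straight lines instead of a smooth curve, so the estimates may differ slightly from values read off a hand-drawn ogive. The name `estimate_value` is an assumption made for this example.

```python
from itertools import accumulate

# Hypothetical grouped data: four classes 0-10, 10-20, 20-30, 30-40.
boundaries = [0, 10, 20, 30, 40]
frequencies = [4, 10, 20, 6]

# Running total of the frequencies, one value per upper class boundary.
cumulative = list(accumulate(frequencies))  # [4, 14, 34, 40]

# Points to plot: (upper class boundary, cumulative frequency),
# starting from zero at the lowest boundary.
points = [(boundaries[0], 0)] + list(zip(boundaries[1:], cumulative))


def estimate_value(points, target):
    """Read across from cumulative frequency `target`, then down to the x-axis.

    Uses straight-line interpolation between neighbouring plotted points.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target is outside the range of cumulative frequencies")


total = cumulative[-1]
q1 = estimate_value(points, 0.25 * total)
median = estimate_value(points, 0.50 * total)
q3 = estimate_value(points, 0.75 * total)
print(q1, median, q3)
```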

Box-and-Whisker Plots: A visual summary of the five-number summary: Minimum, Lower Quartile ($Q_{1}$), Median ($Q_{2}$), Upper Quartile ($Q_{3}$), and Maximum. The 'box' spans from $Q_{1}$ to $Q_{3}$, representing the Interquartile Range ($IQR$), with a vertical line inside marking the median. 'Whiskers' extend to the minimum and maximum values that are not outliers.
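The five-number summary can be computed from raw data as in the sketch below. Quartile conventions vary between textbooks and calculators; this example assumes the median-of-halves convention (the middle value is excluded from both halves when the number of data points is odd), and both the data set and the function name `five_number_summary` are hypothetical.

```python
from statistics import median


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumes the median-of-halves convention: Q1 and Q3 are the medians of
    the lower and upper halves of the sorted data, excluding the middle
    value when the number of data points is odd. Needs at least 2 values.
    """
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]


# Hypothetical data set with seven values.
summary = five_number_summary([7, 3, 9, 12, 5, 15, 10])
print(summary)
```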

Outliers and the $1.5 \times IQR$ Rule: Outliers are extreme values that fall significantly outside the main body of data. Mathematically, a value is an outlier if it is less than $Q_{1} - 1.5 \times IQR$ or greater than $Q_{3} + 1.5 \times IQR$. Visually, outliers are represented on a box plot as individual points (dots or crosses) beyond the whiskers.
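The rule translates directly into code. In this sketch the quartile values are hypothetical, and `outlier_fences` and `is_outlier` are names assumed for illustration.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) boundaries given by the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def is_outlier(value, q1, q3):
    """Return True if value lies strictly outside the two boundaries."""
    lower, upper = outlier_fences(q1, q3)
    return value < lower or value > upper


# Hypothetical quartiles: Q1 = 30, Q3 = 50, so IQR = 20.
print(outlier_fences(30, 50))
print(is_outlier(85, 30, 50))
print(is_outlier(80, 30, 50))
```

A value exactly on a boundary is not flagged, since the rule uses "less than" and "greater than".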

Data Skewness: Skewness describes how asymmetric the data distribution is. In a box plot, if the median line is closer to $Q_{1}$, the data is 'positively skewed' (tail to the right). If the median is closer to $Q_{3}$, it is 'negatively skewed' (tail to the left). Visually, on a histogram, positive skew shows a 'hump' on the left and a long tail stretching to the right.
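The box-plot rule of thumb can be written as a short comparison. This is only a sketch of the rule stated above: it looks at the position of the median inside the box and ignores whisker lengths, and `skew_direction` is a hypothetical helper name.

```python
def skew_direction(q1, median, q3):
    """Classify skew from the position of the median inside the box."""
    lower_gap = median - q1   # distance from Q1 to the median
    upper_gap = q3 - median   # distance from the median to Q3
    if lower_gap < upper_gap:
        return "positive"     # median closer to Q1, tail to the right
    if lower_gap > upper_gap:
        return "negative"     # median closer to Q3, tail to the left
    return "symmetric"


# Hypothetical quartiles.
print(skew_direction(10, 12, 20))
print(skew_direction(10, 18, 20))
```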

Frequency Polygons: A frequency polygon is a line graph used to represent the shape of a frequency distribution. It is created by joining the midpoints of the tops of the bars in a histogram with straight lines. Visually, it provides a simplified view of the data's peak and spread, allowing for easy comparison between multiple datasets on the same axes.

📐Formulae

$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$

$\text{Interquartile Range (IQR)} = Q_{3} - Q_{1}$

$\text{Lower Boundary for Outliers} = Q_{1} - 1.5 \times (Q_{3} - Q_{1})$

$\text{Upper Boundary for Outliers} = Q_{3} + 1.5 \times (Q_{3} - Q_{1})$

$\text{Estimate of Mean from Grouped Data} = \frac{\sum f \times x}{\sum f} \text{ where } x \text{ is the midpoint}$

$\text{Midpoint} = \frac{\text{Lower Bound} + \text{Upper Bound}}{2}$
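The midpoint and grouped-mean formulae can be combined in a few lines of Python. The grouped data here is hypothetical, and `midpoint` and `estimated_mean` are names assumed for this sketch.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 20), (30, 40, 6)]


def midpoint(lower, upper):
    """Return the midpoint x of a class interval."""
    return (lower + upper) / 2


def estimated_mean(classes):
    """Return sum(f * x) / sum(f), where x is each class midpoint."""
    total_fx = sum(f * midpoint(lo, hi) for lo, hi, f in classes)
    total_f = sum(f for _, _, f in classes)
    return total_fx / total_f


# sum(f * x) = 20 + 150 + 500 + 210 = 880 and sum(f) = 40.
print(estimated_mean(classes))
```

The result is only an estimate because every value in a class is treated as if it sat at the class midpoint.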

💡Examples

Problem 1:

A dataset has a Lower Quartile ($Q_{1}$) of $12$ and an Upper Quartile ($Q_{3}$) of $20$. Determine if a value of $35$ is considered an outlier.

Solution:

  1. Calculate the Interquartile Range ($IQR$): $IQR = Q_{3} - Q_{1} = 20 - 12 = 8$
  2. Determine the Upper Boundary for outliers: $\text{Upper Boundary} = Q_{3} + 1.5 \times IQR = 20 + 1.5 \times 8 = 20 + 12 = 32$
  3. Compare the value to the boundary: Since $35 > 32$, the value $35$ is an outlier.

Explanation:

To identify outliers, we first find the spread of the middle $50\%$ of the data ($IQR$) and then see if the specific value lies more than $1.5$ times that spread above the third quartile.
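For completeness, the same quartiles also give a lower boundary, using the Lower Boundary formula from the Formulae section:

```latex
\text{Lower Boundary} = Q_{1} - 1.5 \times IQR = 12 - 1.5 \times 8 = 12 - 12 = 0
```

So for this dataset any value below $0$ would also count as an outlier, while $35$ is flagged only by the upper boundary.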

Problem 2:

The following table shows the frequency of test scores. Calculate the frequency density for the class interval $40 \leq x < 60$ if the frequency is $30$.

Solution:

  1. Identify the class width for the interval $40 \leq x < 60$: $\text{Class Width} = 60 - 40 = 20$
  2. Use the frequency density formula: $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}} = \frac{30}{20} = 1.5$

Explanation:

In a histogram with unequal class widths, the height of the bar (Frequency Density) is found by dividing the frequency by the interval's width. This ensures the area of the bar accurately represents the frequency.
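As a quick check, multiplying the bar's width by its height should return the original frequency:

```latex
\text{Area} = \text{Class Width} \times \text{Frequency Density} = 20 \times 1.5 = 30
```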