Statistics and Probability - Representation of data: frequency tables, histograms, and cumulative frequency graphs

Grade 9IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Frequency Tables for Grouped Data: These tables organize continuous data into intervals or 'classes' (e.g., $10 \le x < 20$). Visually, the table consists of a column for the class intervals and a column for the frequency ($f$), which represents how many data points fall within each range.

Histograms: Unlike bar charts, histograms are used for continuous data and have no gaps between bars. The horizontal $x$-axis represents the continuous scale (e.g., time, mass), and the area of each bar is proportional to the frequency. For bars of equal width, the height of the bar represents the frequency.

Frequency Density: When class widths are unequal in a histogram, the height of the bar is the frequency density rather than the frequency. Visually, this ensures that the area ($\text{width} \times \text{height}$) correctly represents the frequency, preventing wider classes from looking artificially 'larger' than narrow ones.
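
As a rough illustration, the frequency density calculation can be sketched in Python. The class boundaries and frequencies below are made up for this example, and the function name `frequency_densities` is hypothetical.

```python
def frequency_densities(classes):
    """Return frequency / class width for each (lower, upper, frequency) class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]


# Hypothetical grouped data with unequal class widths (not from a real data set).
classes = [(0, 10, 5), (10, 30, 12), (30, 35, 4)]
densities = frequency_densities(classes)

for (lower, upper, freq), density in zip(classes, densities):
    width = upper - lower
    # Bar area (width x height) recovers the class frequency.
    print(f"{lower}-{upper}: width {width}, density {density:.2f}, area {width * density:.1f}")
```

In this made-up data the class from 10 to 30 has the largest frequency, yet the narrow class from 30 to 35 has the tallest bar, because dividing by the width removes the advantage of a wide interval.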

Cumulative Frequency: This is a 'running total' of frequencies. To calculate it, you add each class frequency to the sum of all previous frequencies. In a table, this results in a non-decreasing sequence of values ending at the total number of data points ($n$).
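
A running total is easy to check with a short Python sketch. The frequencies used here are the ones from the table in Problem 1 of this page; the variable names are only illustrative.

```python
from itertools import accumulate

# Class frequencies from the table in Problem 1.
frequencies = [3, 8, 9]

# Each entry is the current frequency plus all earlier ones.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [3, 11, 20]

# The last cumulative value equals the total number of data points n.
n = sum(frequencies)
print(cumulative[-1] == n)  # True
```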

Cumulative Frequency Graphs (Ogives): This graph is created by plotting the cumulative frequency on the $y$-axis against the upper class boundary of each interval on the $x$-axis. The points are connected with a smooth S-shaped curve or straight lines, starting from the lower boundary of the first class at zero frequency.
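
The list of points to plot can be built as in the following Python sketch. It uses the grouped table from Problem 1 of this page, and the function name `ogive_points` is a hypothetical helper, not part of any library.

```python
from itertools import accumulate


def ogive_points(classes):
    """Return (x, cumulative frequency) points for (lower, upper, frequency) classes."""
    first_lower = classes[0][0]
    uppers = [upper for _, upper, _ in classes]
    cumulative = accumulate(freq for _, _, freq in classes)
    # Start at the lower boundary of the first class with zero frequency,
    # then pair each upper class boundary with its cumulative frequency.
    return [(first_lower, 0)] + list(zip(uppers, cumulative))


# Grouped table from Problem 1: (lower boundary, upper boundary, frequency).
classes = [(0, 10, 3), (10, 20, 8), (20, 30, 9)]
points = ogive_points(classes)
print(points)  # [(0, 0), (10, 3), (20, 11), (30, 20)]
```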

Median and Quartiles from Graphs: To find the median ($Q_2$) visually, locate the $\frac{n}{2}$ position on the $y$-axis, move horizontally to the curve, and then vertically down to the $x$-axis. Similarly, the Lower Quartile ($Q_1$) is at $\frac{n}{4}$ and the Upper Quartile ($Q_3$) is at $\frac{3n}{4}$.

Interquartile Range (IQR) and Box Plots: The IQR is the difference between $Q_3$ and $Q_1$, representing the spread of the middle $50\%$ of the data. This data can be visually summarized in a Box Plot, where a central box spans from $Q_1$ to $Q_3$ with a line at the median, and 'whiskers' extend to the minimum and maximum values.
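
Reading quartiles off a graph can be imitated numerically. The Python sketch below assumes the cumulative frequency points are joined by straight lines (a hand-drawn smooth curve would give slightly different readings) and uses the points that follow from the table in Problem 1. The function name `value_at_cumulative_frequency` is hypothetical.

```python
def value_at_cumulative_frequency(points, target):
    """Read the x-value where a straight-line ogive reaches the target frequency."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Linear interpolation inside the segment that contains the target.
            return x0 + (target - c0) * (x1 - x0) / (c1 - c0)
    raise ValueError("target lies outside the range of the graph")


# (upper boundary, cumulative frequency) points from the Problem 1 table,
# starting at the lower boundary of the first class with zero frequency.
points = [(0, 0), (10, 3), (20, 11), (30, 20)]
n = points[-1][1]

q1 = value_at_cumulative_frequency(points, n / 4)
median = value_at_cumulative_frequency(points, n / 2)
q3 = value_at_cumulative_frequency(points, 3 * n / 4)
iqr = q3 - q1

print(q1, median)  # 12.5 18.75
print(round(q3, 2), round(iqr, 2))
```

These are estimates only, since the grouped table does not record where the values sit inside each class.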

Estimated Mean from Grouped Data: Because exact values are unknown in grouped tables, the mean is estimated using the midpoint ($x$) of each class. Visually, we assume all data points in a bar are concentrated at the center of that bar's interval.

📐Formulae

$$\text{Estimated Mean } (\bar{x}) = \frac{\sum (f \cdot x)}{\sum f}$$

$$\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$$

$$\text{Class Width} = \text{Upper Boundary} - \text{Lower Boundary}$$

$$\text{Interquartile Range (IQR)} = Q_3 - Q_1$$

$$\text{Lower Quartile Position } (Q_1) \approx \frac{1}{4}n$$

$$\text{Upper Quartile Position } (Q_3) \approx \frac{3}{4}n$$

💡Examples

Problem 1:

Calculate the estimated mean for the following frequency table of test scores:

  • $0 \le s < 10$: Frequency $3$
  • $10 \le s < 20$: Frequency $8$
  • $20 \le s < 30$: Frequency $9$

Solution:

  1. Find midpoints ($x$) for each class:

    • Class 1: $\frac{0+10}{2} = 5$
    • Class 2: $\frac{10+20}{2} = 15$
    • Class 3: $\frac{20+30}{2} = 25$
  2. Calculate $f \cdot x$ for each class:

    • $3 \times 5 = 15$
    • $8 \times 15 = 120$
    • $9 \times 25 = 225$
  3. Sum the frequencies ($\sum f$): $3 + 8 + 9 = 20$

  4. Sum the $f \cdot x$ values ($\sum (f \cdot x)$): $15 + 120 + 225 = 360$

  5. Calculate Mean: $\bar{x} = \frac{360}{20} = 18$

Explanation:

To estimate the mean from grouped data, we assume every value in an interval is equal to the midpoint of that interval. We then multiply these midpoints by their respective frequencies and divide by the total number of observations.
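
The working for Problem 1 can be checked with a few lines of Python. This is only a sketch: the function name `estimated_mean` and the tuple layout are assumptions made for the example.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data given (lower, upper, frequency) classes."""
    # Treat every value in a class as if it sat at the class midpoint.
    total_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    total_f = sum(freq for _, _, freq in classes)
    return total_fx / total_f


# The test score table from Problem 1.
scores = [(0, 10, 3), (10, 20, 8), (20, 30, 9)]
print(estimated_mean(scores))  # 18.0
```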

Problem 2:

A cumulative frequency graph for $80$ students' heights starts at $(140, 0)$ and ends at $(190, 80)$. If the curve passes through the point $(170, 60)$, what percentage of students are taller than $170$ cm?

Solution:

  1. Identify total number of students ($n$): $n = 80$.
  2. Identify students with height $\le 170$ cm: The $y$-value at $x = 170$ is $60$. This means $60$ students are $170$ cm or shorter.
  3. Calculate students taller than $170$ cm: $80 - 60 = 20$.
  4. Calculate as a percentage: $\frac{20}{80} \times 100\% = 25\%$.

Explanation:

Cumulative frequency graphs show the number of data points 'less than or equal to' a given value. To find the number of values 'greater than' that value, subtract the $y$-value read from the graph from the total frequency.
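
The same subtraction can be written as a short Python sketch. The function name `percent_above` is hypothetical, and the inputs are the values read from the graph in Problem 2.

```python
def percent_above(cumulative_at_value, total):
    """Percentage of data points greater than the value read from the graph."""
    return (total - cumulative_at_value) / total * 100


# Problem 2: 60 of the 80 students are 170 cm or shorter.
print(percent_above(60, 80))  # 25.0
```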