Statistics and Probability - Representation of data: frequency tables, histograms, and cumulative frequency graphs
Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Frequency Tables for Grouped Data: These tables organize continuous data into intervals or 'classes' (e.g., $a \le x < b$). The table has one column for the class intervals and one column for the frequency ($f$), which is the number of data points that fall within each interval.
Histograms: Unlike bar charts, histograms are used for continuous data and have no gaps between bars. The horizontal $x$-axis represents the continuous scale (e.g., time, mass), and the area of each bar is proportional to the frequency. For bars of equal width, the height of the bar represents the frequency.
Frequency Density: When class widths are unequal in a histogram, the height of the bar is the frequency density rather than the frequency. Visually, this ensures that the area ($\text{frequency density} \times \text{class width}$) correctly represents the frequency, preventing wider classes from looking artificially 'larger' than narrow ones.
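As a minimal sketch of the frequency density idea, the snippet below divides each frequency by its class width. The function name `frequency_densities` and the class data are hypothetical, chosen only for illustration.

```python
def frequency_densities(classes):
    """Return frequency / class width for each (lower, upper, frequency) class."""
    densities = []
    for lower, upper, freq in classes:
        width = upper - lower
        if width <= 0:
            raise ValueError("each class needs upper > lower")
        densities.append(freq / width)
    return densities


# Hypothetical data with unequal class widths of 10, 10, 20 and 40.
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]
densities = frequency_densities(classes)
print(densities)  # [0.5, 1.2, 0.8, 0.2]

# Bar area (density * width) gives back the frequency of each class.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
```

Note that the widest class (40 to 80) has the smallest bar height even though its frequency is not the smallest, which is exactly the distortion that frequency density corrects.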
Cumulative Frequency: This is a 'running total' of frequencies. To calculate it, add each class frequency to the sum of all previous frequencies. In a table, this results in a non-decreasing sequence of values ending at the total number of data points ($n$).
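The running total can be sketched in a couple of lines using `itertools.accumulate` from the Python standard library. The frequencies below are hypothetical.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed in class order.
frequencies = [5, 12, 16, 8]

# Each entry is the sum of that class frequency and all previous ones.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [5, 17, 33, 41]
```

The last value, 41, equals the total number of data points, as expected.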
Cumulative Frequency Graphs (Ogives): This graph is created by plotting the cumulative frequency on the $y$-axis against the upper class boundary of each interval on the $x$-axis. The points are connected with a smooth S-shaped curve or straight lines, starting from the lower boundary of the first class at zero frequency.
Median and Quartiles from Graphs: To find the median ($Q_2$) visually, locate the $\frac{n}{2}$ position on the $y$-axis, move horizontally to the curve, and then vertically down to the $x$-axis. Similarly, the Lower Quartile ($Q_1$) is at $\frac{n}{4}$ and the Upper Quartile ($Q_3$) is at $\frac{3n}{4}$.
Interquartile Range (IQR) and Box Plots: The IQR is the difference between $Q_3$ and $Q_1$, representing the spread of the middle $50\%$ of the data. This data can be visually summarized in a Box Plot, where a central box spans from $Q_1$ to $Q_3$ with a line at the median, and 'whiskers' extend to the minimum and maximum values.
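Reading the median and quartiles off a cumulative frequency graph can be imitated numerically. The sketch below assumes the points are joined by straight lines (linear interpolation), so a smooth hand-drawn curve may give slightly different readings. The helper names `ogive_points` and `value_at` and the class data are hypothetical.

```python
from itertools import accumulate


def ogive_points(classes):
    """Return (x, cumulative frequency) points for (lower, upper, frequency) classes."""
    lowers, uppers, freqs = zip(*classes)
    # Start at the lower boundary of the first class with zero frequency,
    # then pair each running total with its upper class boundary.
    return [(lowers[0], 0)] + list(zip(uppers, accumulate(freqs)))


def value_at(points, target):
    """Return the x-value where the straight-line graph reaches `target`."""
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target <= c1 and c1 > c0:
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target must lie between 0 and the total frequency")


# Hypothetical grouped data with n = 40.
classes = [(0, 10, 4), (10, 20, 12), (20, 30, 16), (30, 40, 8)]
points = ogive_points(classes)
n = points[-1][1]

q1 = value_at(points, n / 4)          # 15.0
median = value_at(points, n / 2)      # 22.5
q3 = value_at(points, 3 * n / 4)      # 28.75
iqr = q3 - q1                         # 13.75
print(q1, median, q3, iqr)
```

Together with the minimum and maximum values, `q1`, `median`, and `q3` are the values needed to draw a box plot.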
Estimated Mean from Grouped Data: Because exact values are unknown in grouped tables, the mean is estimated using the midpoint ($x$) of each class. Visually, we assume all data points in a bar are concentrated at the center of that bar's interval.
📐Formulae
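The relationships described in the concepts can be summarised as follows. The notation is an assumption: $f$ is a class frequency, $x$ is a class midpoint, and $n$ is the total number of data points.

```latex
\begin{align*}
\text{Frequency density} &= \frac{\text{frequency}}{\text{class width}} \\
\text{Midpoint } x &= \frac{\text{lower boundary} + \text{upper boundary}}{2} \\
\text{Estimated mean } \bar{x} &\approx \frac{\sum fx}{\sum f} \\
\text{Positions on the cumulative frequency axis: } Q_1 &\to \frac{n}{4}, \quad Q_2 \to \frac{n}{2}, \quad Q_3 \to \frac{3n}{4} \\
\text{IQR} &= Q_3 - Q_1
\end{align*}
```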
💡Examples
Problem 1:
Calculate the estimated mean for the following frequency table of test scores:
- : Frequency
- : Frequency
- : Frequency
Solution:
- Find midpoints ($x$) for each class:
  - Class 1:
  - Class 2:
  - Class 3:
- Calculate $f \times x$ for each class:
- Sum the frequencies ($\sum f$):
- Sum the $fx$ values ($\sum fx$):
- Calculate Mean: $\bar{x} \approx \frac{\sum fx}{\sum f}$
Explanation:
To estimate the mean from grouped data, we assume every value in an interval is equal to the midpoint of that interval. We then multiply these midpoints by their respective frequencies and divide by the total number of observations.
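The procedure can be sketched in code as follows. The function name `estimated_mean` is hypothetical, and the class data is made up for illustration; it is not the table from Problem 1.

```python
def estimated_mean(classes):
    """Estimate the mean of grouped data given (lower, upper, frequency) classes."""
    total_f = sum(freq for _, _, freq in classes)
    if total_f == 0:
        raise ValueError("total frequency must be positive")
    # Treat every value in a class as equal to the class midpoint.
    total_fx = sum(freq * (lower + upper) / 2 for lower, upper, freq in classes)
    return total_fx / total_f


# Hypothetical test-score classes with midpoints 10, 30 and 50.
scores = [(0, 20, 3), (20, 40, 7), (40, 60, 10)]
print(estimated_mean(scores))  # 37.0
```

Here the sum of $fx$ is $30 + 210 + 500 = 740$ and the sum of $f$ is $20$, giving an estimated mean of $37$.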
Problem 2:
A cumulative frequency graph for students' heights starts at and ends at . If the curve passes through the point , what percentage of students are taller than cm?
Solution:
- Identify total number of students ($n$): .
- Identify students with height cm: The $y$-value at is . This means students are cm or shorter.
- Calculate students taller than cm: .
- Calculate as a percentage: .
Explanation:
Cumulative frequency graphs always show the number of data points 'less than or equal to' a specific value. To find the number of values 'greater than', subtract the $y$-value from the total frequency.
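The subtraction step can be sketched in code as follows. The function name `percent_above` and the figures used (200 students in total, 150 of them at or below the chosen height) are hypothetical and are not the values from Problem 2.

```python
def percent_above(total, cumulative_at_value):
    """Percentage of data points greater than the value read from the graph."""
    if total <= 0 or not 0 <= cumulative_at_value <= total:
        raise ValueError("need 0 <= cumulative frequency <= total, with total > 0")
    # Points above the value = total minus the cumulative frequency at that value.
    return (total - cumulative_at_value) / total * 100


# Hypothetical reading: 150 of 200 students are at or below the chosen height.
print(percent_above(200, 150))  # 25.0
```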