Statistics and Probability - Data Presentation (Histograms, Box Plots, Cumulative Frequency)
Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Histograms: These are used for continuous data where the area of each bar represents the frequency. If class widths are unequal, we must plot Frequency Density on the y-axis instead of frequency. Visually, a histogram with unequal class widths will have bars of different widths, and the height is adjusted so that $\text{Area} = \text{Frequency Density} \times \text{Class Width} = \text{Frequency}$. The calculation used is $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$.
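As a minimal sketch of the frequency density calculation, the following Python function divides each frequency by its class width. The function name `frequency_densities` and the sample classes are hypothetical and chosen only for illustration.

```python
def frequency_densities(boundaries, frequencies):
    """Return the frequency density (frequency / class width) of each class.

    boundaries:  class boundaries in ascending order, e.g. [0, 10, 20, 40]
    frequencies: one frequency per class, so len(boundaries) - 1 values
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    densities = []
    for lower, upper, freq in zip(boundaries, boundaries[1:], frequencies):
        width = upper - lower
        if width <= 0:
            raise ValueError("boundaries must be strictly increasing")
        densities.append(freq / width)
    return densities


# Hypothetical grouped data: classes 0-10, 10-20 and 20-40 (unequal widths).
boundaries = [0, 10, 20, 40]
frequencies = [5, 12, 8]
print(frequency_densities(boundaries, frequencies))  # [0.5, 1.2, 0.4]
```

Multiplying each density by its class width gives back the original frequency, which is why the bar's area, not its height, represents the frequency.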
Cumulative Frequency Curves (Ogives): This graph represents the running total of frequencies. We plot the cumulative frequency on the y-axis against the upper class boundary of each interval on the x-axis. Visually, this usually results in a smooth S-shaped curve. It is used to estimate the median ($50\%$ of the data), quartiles ($25\%$ and $75\%$), and percentiles by reading across from the y-axis to the curve and then down to the x-axis.
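The reading-off procedure can be sketched in Python. This version joins the plotted points with straight lines, so it only approximates a reading taken from a hand-drawn smooth curve. The function names and the sample data are hypothetical.

```python
from itertools import accumulate


def cumulative_points(upper_boundaries, frequencies):
    """Pair each upper class boundary with the running total of frequencies."""
    return list(zip(upper_boundaries, accumulate(frequencies)))


def estimate_value(points, lowest_boundary, fraction):
    """Estimate the x-value below which `fraction` of the data lies.

    Assumption: straight-line interpolation between plotted points.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    total = points[-1][1]
    if total <= 0:
        raise ValueError("total frequency must be positive")
    target = fraction * total
    prev_x, prev_cf = lowest_boundary, 0
    for x, cf in points:
        if cf >= target:
            # Read across at `target`, then down to the x-axis.
            return prev_x + (target - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf
    return points[-1][0]


# Hypothetical classes 0-10, 10-20, 20-30, 30-40.
points = cumulative_points([10, 20, 30, 40], [5, 15, 20, 10])
print(points)                             # [(10, 5), (20, 20), (30, 40), (40, 50)]
print(estimate_value(points, 0, 0.50))    # median estimate: 22.5
print(estimate_value(points, 0, 0.25))    # lower quartile estimate: 15.0
print(estimate_value(points, 0, 0.75))    # upper quartile estimate: 28.75
```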
Box-and-Whisker Plots: A visual summary of the five-number summary: Minimum, Lower Quartile ($Q_1$), Median ($Q_2$), Upper Quartile ($Q_3$), and Maximum. The 'box' spans from $Q_1$ to $Q_3$, representing the Interquartile Range ($IQR = Q_3 - Q_1$), with a vertical line inside marking the median. 'Whiskers' extend to the minimum and maximum values that are not outliers.
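A five-number summary can be sketched with Python's standard library. Quartile conventions differ between textbooks and software, so the `method="inclusive"` choice below is an assumption and may give slightly different quartiles from a hand method. The function name and the data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for raw data.

    Assumption: quartiles use the 'inclusive' interpolation method of
    statistics.quantiles; other conventions can give different values.
    """
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)


# Hypothetical raw data set of nine values.
data = [2, 4, 4, 5, 7, 8, 9, 11, 12]
minimum, q1, median, q3, maximum = five_number_summary(data)
print(minimum, q1, median, q3, maximum)  # 2 4.0 7.0 9.0 12
print("IQR =", q3 - q1)                  # IQR = 5.0
```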
Outliers and the $1.5 \times IQR$ Rule: Outliers are extreme values that fall significantly outside the main body of data. Mathematically, a value is an outlier if it is less than $Q_1 - 1.5 \times IQR$ or greater than $Q_3 + 1.5 \times IQR$. Visually, outliers are represented on a box plot as individual points (dots or crosses) beyond the whiskers.
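The outlier rule can be sketched as a pair of small Python functions. The quartiles and data below are hypothetical, and the multiplier `k=1.5` is the usual convention, exposed as a parameter only for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) boundaries beyond which values are outliers."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(data, q1, q3):
    """Separate data into values inside the fences and outliers."""
    lower, upper = outlier_fences(q1, q3)
    inside = [x for x in data if lower <= x <= upper]
    outliers = [x for x in data if x < lower or x > upper]
    return inside, outliers


# Hypothetical quartiles and data.
q1, q3 = 4, 9
data = [2, 4, 4, 5, 7, 8, 9, 11, 30]
print(outlier_fences(q1, q3))            # (-3.5, 16.5)
inside, outliers = split_outliers(data, q1, q3)
print(outliers)                          # [30]
print(min(inside), max(inside))          # whisker ends: 2 11
```

The whiskers stop at the smallest and largest values that are still inside the boundaries, while each outlier is drawn as its own point.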
Data Skewness: Skewness describes the asymmetry of a data distribution. In a box plot, if the median line is closer to $Q_1$, the data is 'positively skewed' (tail to the right). If the median is closer to $Q_3$, it is 'negatively skewed' (tail to the left). Visually, on a histogram, positive skew shows a 'hump' on the left and a long tail stretching to the right.
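The box-plot check described above amounts to comparing the two halves of the box. The sketch below is only a rough indicator based on the quartiles (it ignores the whiskers), and the function name is hypothetical.

```python
def box_plot_skew(q1, median, q3):
    """Classify skew from the position of the median inside the box."""
    left = median - q1    # width of the lower half of the box
    right = q3 - median   # width of the upper half of the box
    if right > left:
        return "positive"   # median closer to Q1, tail to the right
    if left > right:
        return "negative"   # median closer to Q3, tail to the left
    return "symmetric"


print(box_plot_skew(10, 12, 20))  # positive
print(box_plot_skew(10, 18, 20))  # negative
print(box_plot_skew(10, 15, 20))  # symmetric
```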
Frequency Polygons: A frequency polygon is a line graph used to represent the shape of a frequency distribution. It is created by joining the midpoints of the tops of the bars in a histogram with straight lines. Visually, it provides a simplified view of the data's peak and spread, allowing for easy comparison between multiple datasets on the same axes.
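The points of a frequency polygon can be computed as (class midpoint, frequency) pairs. This sketch assumes equal class widths, so bar height equals frequency. The function name and the data are hypothetical.

```python
def polygon_points(boundaries, frequencies):
    """Return (midpoint, frequency) pairs to be joined with straight lines.

    Assumption: equal class widths, so each bar's height is its frequency.
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    midpoints = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
    return list(zip(midpoints, frequencies))


# Hypothetical classes 0-10, 10-20, 20-30.
print(polygon_points([0, 10, 20, 30], [3, 7, 5]))
# [(5.0, 3), (15.0, 7), (25.0, 5)]
```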
📐Formulae
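The relationships this topic relies on, collected from the concept definitions above, can be written as follows. The $1.5$ multiplier is the usual convention for the outlier rule and is assumed here.

```latex
\begin{align*}
\text{Frequency Density} &= \frac{\text{Frequency}}{\text{Class Width}} \\
\text{Class Width} &= \text{Upper Class Boundary} - \text{Lower Class Boundary} \\
IQR &= Q_3 - Q_1 \\
\text{Lower Boundary} &= Q_1 - 1.5 \times IQR \\
\text{Upper Boundary} &= Q_3 + 1.5 \times IQR
\end{align*}
```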
💡Examples
Problem 1:
A dataset has a Lower Quartile ($Q_1$) of and an Upper Quartile ($Q_3$) of . Determine if a value of is considered an outlier.
Solution:
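1. Calculate the Interquartile Range: $IQR = Q_3 - Q_1$.
2. Determine the Upper Boundary for outliers: $\text{Upper Boundary} = Q_3 + 1.5 \times IQR$.
3. Compare the value to the boundary: since the value is greater than the Upper Boundary, the value is an outlier.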
Explanation:
To identify outliers, we first find the spread of the middle $50\%$ of the data ($IQR$) and then see if the specific value lies more than $1.5$ times that spread above the third quartile.
Problem 2:
The following table shows the frequency of test scores. Calculate the frequency density for the class interval if the frequency is .
Solution:
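1. Identify the class width for the interval: $\text{Class Width} = \text{Upper Class Boundary} - \text{Lower Class Boundary}$.
2. Use the frequency density formula: $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$.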
Explanation:
In a histogram with unequal class widths, the height of the bar (Frequency Density) is found by dividing the frequency by the interval's width. This ensures the area of the bar accurately represents the frequency.