Statistics and Probability - Descriptive Statistics (Measures of Central Tendency and Dispersion)
Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Discrete vs. Continuous Data: Discrete data consists of isolated values (like the number of students), while continuous data can take any value within a range (like height). On a graph, discrete data is often represented by bar charts with gaps, whereas continuous data is shown using histograms where bars are adjacent to represent the numerical continuum.
Measures of Central Tendency: These describe the center of a data set. The mean ($\bar{x}$) is the arithmetic average, the median ($Q_2$) is the middle value when the data are ordered, and the mode is the most frequent value. Visually, in a perfectly symmetrical 'Bell Curve' or Normal Distribution, the mean, median, and mode all coincide at the highest central peak.
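A minimal Python sketch of the three measures using the standard-library `statistics` module. The score list is hypothetical, invented for illustration. `multimode` returns every value tied for most frequent, so it also handles data sets with more than one mode.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 5, 7, 8]

x_bar = mean(scores)       # arithmetic average: sum / count
q2 = median(scores)        # middle of the sorted data (average of the two middle values when n is even)
modes = multimode(scores)  # most frequent value(s), returned as a list

print(x_bar, q2, modes)
```

With an even number of values, the median is the average of the two middle values, which is why it need not be one of the data points.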
Measures of Dispersion: These describe the spread of the data. The Range is the difference between the maximum and minimum values. The Interquartile Range ($\text{IQR} = Q_3 - Q_1$) measures the spread of the middle 50% of the data. Standard Deviation ($\sigma$) measures how far values typically deviate from the mean. A low $\sigma$ results in a steep, narrow curve, while a high $\sigma$ creates a flat, wide curve.
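The sketch below contrasts two hypothetical data sets that share the same mean but differ in spread. It uses `pstdev` from Python's `statistics` module, which computes the population standard deviation (dividing by $n$). The wider set has both the larger range and the larger $\sigma$.

```python
from statistics import mean, pstdev

# Two hypothetical data sets: same mean (5), different spread
datasets = {"tight": [4, 5, 5, 5, 6], "wide": [1, 3, 5, 7, 9]}

summary = {}
for name, data in datasets.items():
    data_range = max(data) - min(data)  # Range = maximum - minimum
    sigma = pstdev(data)                # population standard deviation (divides by n)
    summary[name] = (mean(data), data_range, sigma)
    print(name, summary[name])
```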
The Box-and-Whisker Plot: A visual summary of the 'Five-Number Summary': Minimum, lower quartile ($Q_1$), median ($Q_2$), upper quartile ($Q_3$), and maximum. It consists of a central box spanning from $Q_1$ to $Q_3$ with a line at the median, and 'whiskers' extending to the minimum and maximum values (excluding outliers).
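Textbooks, calculators, and software use several quartile conventions. This sketch assumes the common "median of each half" rule, leaving out the overall median when $n$ is odd, so other tools (for example `statistics.quantiles`) may report slightly different quartiles. The function name `five_number_summary` and the data are hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, median, Q3, max by the median-of-halves convention (needs at least 2 values)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median
    upper = data[half + n % 2:]      # values above the median (middle value skipped when n is odd)
    return min(data), median(lower), median(data), median(upper), max(data)

# Hypothetical data set
mn, q1, q2, q3, mx = five_number_summary([3, 7, 8, 5, 12, 14, 21, 13, 18])
iqr = q3 - q1  # width of the box
print(mn, q1, q2, q3, mx, iqr)
```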
Cumulative Frequency Curves: Also known as an 'Ogive', this is a line graph where the y-axis represents the running total of frequencies. The curve typically follows an 'S-shape'. By drawing a horizontal line from 50% of the total frequency on the y-axis across to the curve, and then down to the x-axis, one can visually estimate the median.
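Reading the median off an ogive amounts to finding the x-value where the cumulative frequency reaches half the total. This sketch joins the plotted points with straight lines rather than a smooth S-curve, so its estimate can differ slightly from one read off a hand-drawn curve. The class boundaries and frequencies are hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: class boundaries 0-10, 10-20, ..., 40-50
boundaries = [0, 10, 20, 30, 40, 50]
freqs = [4, 10, 16, 8, 2]

cum = [0] + list(accumulate(freqs))  # cumulative frequency at each boundary
target = cum[-1] / 2                 # 50% of the total frequency

# Find the ogive segment that reaches the target, then interpolate along it
for i in range(1, len(cum)):
    if cum[i] >= target:
        frac = (target - cum[i - 1]) / (cum[i] - cum[i - 1])
        median_estimate = boundaries[i - 1] + frac * (boundaries[i] - boundaries[i - 1])
        break

print(median_estimate)
```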
Outliers and Skewness: An outlier is an extreme value that differs significantly from the others, often identified as lying more than $1.5 \times \text{IQR}$ below $Q_1$ or above $Q_3$. Visually, data is 'Positively Skewed' (right-skewed) if it has a long tail on the right side; in that case the mean is typically greater than the median.
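A short sketch applying both ideas to a hypothetical data set with one extreme value. It assumes the median-of-halves quartile convention, and it treats the mean-versus-median comparison as a rule of thumb rather than a formal skewness measure.

```python
from statistics import mean, median

# Hypothetical data set with one extreme value on the right
data = sorted([2, 3, 3, 4, 5, 6, 30])

# Quartiles by the median-of-halves convention (middle value excluded when n is odd)
half = len(data) // 2
q1 = median(data[:half])
q3 = median(data[half + len(data) % 2:])
iqr = q3 - q1

# 1.5 x IQR fences
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < low or x > high]

# Rule of thumb: compare mean and median to judge the direction of skew
if mean(data) > median(data):
    skew = "positive (right)"
elif mean(data) < median(data):
    skew = "negative (left)"
else:
    skew = "roughly symmetric"

print(outliers, skew)
```

The single large value both lands outside the upper fence and pulls the mean above the median.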
Grouped Data and Mid-interval Values: When data is given in class intervals (e.g., $a \le x < b$), we use the mid-interval value ($\frac{a+b}{2}$) to estimate the mean. This assumes the data are spread uniformly within each class. On a histogram, the area of each bar represents the frequency of that specific class interval.
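A sketch of the grouped-mean estimate: each class contributes its mid-interval value weighted by its frequency, and the total is divided by the sum of the frequencies. The classes and frequencies are hypothetical, each written as $a \le x < b$.

```python
# Hypothetical classes as (lower bound a, upper bound b, frequency f), meaning a <= x < b
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 16), (30, 40, 8), (40, 50, 2)]

total_f = sum(f for _, _, f in classes)
# Weight each mid-interval value (a + b) / 2 by its class frequency
estimated_mean = sum(f * (a + b) / 2 for a, b, f in classes) / total_f

print(estimated_mean)
```

The result is only an estimate, because the individual values inside each class are replaced by the class midpoint.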
📐Formulae
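A compact set of the standard formulae behind the concepts above. The symbols are assumed here ($\bar{x}$ mean, $\sigma$ population standard deviation, $Q_1$ and $Q_3$ quartiles, $n$ number of values, $f$ class frequency) and may not match every formula booklet.

```latex
\begin{align*}
\bar{x} &= \frac{\sum x}{n}
  && \text{mean of } n \text{ values} \\
\bar{x} &\approx \frac{\sum f x}{\sum f}
  && \text{grouped data, } x = \text{mid-interval value} \\
\text{Range} &= x_{\max} - x_{\min} \\
\text{IQR} &= Q_3 - Q_1 \\
\sigma^2 &= \frac{\sum (x - \bar{x})^2}{n}, \qquad \sigma = \sqrt{\sigma^2}
  && \text{population variance and standard deviation} \\
x &< Q_1 - 1.5 \times \text{IQR} \ \text{ or } \ x > Q_3 + 1.5 \times \text{IQR}
  && \text{outlier test}
\end{align*}
```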
💡Examples
Problem 1:
A set of exam scores is: . Calculate the mean and the standard deviation for this data set.
Solution:
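- Calculate the mean: $\bar{x} = \dfrac{\sum x}{n}$
- Sum of squared differences: $\sum (x - \bar{x})^2$
- Find the variance ($\sigma^2$): $\sigma^2 = \dfrac{\sum (x - \bar{x})^2}{n}$
- Standard Deviation: $\sigma = \sqrt{\sigma^2}$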
Explanation:
To find the mean, sum all values and divide by the count ($n$). To find the standard deviation, find the average of the squared distances from the mean (the variance), then take the square root.
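The four solution steps can be mirrored in Python. The five scores here are hypothetical, so the numbers illustrate the method only. The code divides by $n$, as the explanation does; many calculators also report a sample standard deviation that divides by $n - 1$ and is therefore slightly larger.

```python
import math

# Hypothetical exam scores, invented for illustration
scores = [60, 70, 80, 90, 100]

n = len(scores)
x_bar = sum(scores) / n                      # step 1: mean
ssd = sum((x - x_bar) ** 2 for x in scores)  # step 2: sum of squared differences
variance = ssd / n                           # step 3: average squared distance (population variance)
sigma = math.sqrt(variance)                  # step 4: standard deviation

print(x_bar, ssd, variance, round(sigma, 2))
```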
Problem 2:
In a data set, and . Determine if a value of is considered an outlier.
Solution:
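- Calculate the Interquartile Range: $\text{IQR} = Q_3 - Q_1$
- Calculate the upper boundary for outliers: $Q_3 + 1.5 \times \text{IQR}$
- Compare the value: the value is less than $Q_3 + 1.5 \times \text{IQR}$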
Explanation:
Using the $1.5 \times \text{IQR}$ rule, we establish the boundaries. Since the value is less than the upper boundary ($Q_3 + 1.5 \times \text{IQR}$), it is not an outlier.
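The same check can be written as a small function that tests a single value against both fences. The function name `is_outlier` and the quartiles in the usage lines are hypothetical. A value exactly on a fence is not flagged, which matches the "more than $1.5 \times \text{IQR}$ away" wording.

```python
def is_outlier(value, q1, q3):
    """Apply the 1.5 x IQR rule to a single value."""
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower boundary
    upper = q3 + 1.5 * iqr  # upper boundary
    return value < lower or value > upper

# Hypothetical quartiles: IQR = 10, so the boundaries are 5 and 45
print(is_outlier(40, q1=20, q3=30))
print(is_outlier(50, q1=20, q3=30))
```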