Statistics and Probability - Descriptive Statistics (Measures of Central Tendency and Dispersion)

Grade 11 IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Discrete vs. Continuous Data: Discrete data consists of isolated values (like the number of students), while continuous data can take any value within a range (like height). On a graph, discrete data is often represented by bar charts with gaps, whereas continuous data is shown using histograms where bars are adjacent to represent the numerical continuum.

Measures of Central Tendency: These describe the center of a data set. The mean ($\bar{x}$) is the arithmetic average, the median ($M$) is the middle value when data is ordered, and the mode is the most frequent value. Visually, in a perfectly symmetrical 'Bell Curve' or Normal Distribution, the mean, median, and mode all overlap at the highest central peak.
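As a quick illustration of the three measures, here is a minimal Python sketch using the standard `statistics` module. The list `scores` is illustrative data chosen for this example, and the variable names are assumptions rather than part of the syllabus.

```python
from statistics import mean, median, multimode

# Illustrative exam scores (assumed data for this sketch)
scores = [56, 62, 62, 67, 71, 75, 88, 92]

mean_score = mean(scores)      # arithmetic average: sum of values / number of values
median_score = median(scores)  # middle of the ordered data; with an even count,
                               # the average of the two middle values
modes = multimode(scores)      # list of the most frequent value(s)

print("mean:", mean_score)
print("median:", median_score)
print("mode(s):", modes)
```

`multimode` is used instead of `mode` so that a data set with two equally frequent values reports both of them.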

Measures of Dispersion: These describe the spread of the data. The Range is the difference between the maximum and minimum values. The Interquartile Range ($IQR$) measures the spread of the middle 50% of the data. Standard Deviation ($\sigma$) measures how much values typically deviate from the mean. A low $\sigma$ results in a steep, narrow curve, while a high $\sigma$ creates a flat, wide curve.
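To see how spread changes while the center stays fixed, the sketch below compares two small data sets that share the same mean. Both lists and the helper `data_range` are hypothetical, made up for this illustration. Note that `statistics.pstdev` divides by $n$, which matches the $\sigma$ formula used in these notes, whereas `statistics.stdev` divides by $n - 1$.

```python
from statistics import mean, pstdev

# Two illustrative data sets with the same mean (50) but different spread
tight = [48, 49, 50, 51, 52]
wide = [30, 40, 50, 60, 70]

def data_range(data):
    """Range = maximum value minus minimum value."""
    return max(data) - min(data)

for name, data in (("tight", tight), ("wide", wide)):
    # pstdev is the population standard deviation (divides by n)
    print(name, "mean:", mean(data), "range:", data_range(data),
          "sigma:", round(pstdev(data), 3))
```

The `wide` list has ten times the range and ten times the standard deviation of the `tight` list, which corresponds to the flatter, wider curve described above.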

The Box-and-Whisker Plot: A visual summary of the 'Five-Number Summary': Minimum, lower quartile ($Q_{1}$), median ($Q_{2}$), upper quartile ($Q_{3}$), and maximum. It consists of a central box spanning from $Q_{1}$ to $Q_{3}$ with a line at the median, and 'whiskers' extending to the minimum and maximum values (excluding outliers).
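The five-number summary can be sketched in Python as follows. Quartile conventions differ between textbooks, calculators, and software. This sketch assumes the "median of each half" convention (the overall median is left out of both halves when the number of values is odd), so other tools may give slightly different quartiles. The function name `five_number_summary` and the data are illustrative.

```python
from statistics import median

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention.

    Assumption: for an odd number of values, the middle value is excluded
    from both the lower and the upper half.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median position
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median position
    return min(xs), median(lower), median(xs), median(upper), max(xs)

scores = [56, 62, 62, 67, 71, 75, 88, 92]  # illustrative data
summary = five_number_summary(scores)
print(summary)
```

For these eight scores the lower half is 56, 62, 62, 67 and the upper half is 71, 75, 88, 92, so the box would run from 62 to 81.5 with the median line at 69.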

Cumulative Frequency Curves: Also known as an 'Ogive', a cumulative frequency curve is a line graph where the y-axis represents the running total of frequencies. The curve typically follows an 'S-shape'. By drawing a horizontal line from $50\%$ of the total frequency on the y-axis across to the curve and then a vertical line down to the x-axis, one can visually estimate the median.
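Reading the median off a cumulative frequency curve can be imitated numerically. The sketch below builds the running totals and then interpolates linearly between the plotted points, which approximates a hand-drawn smooth curve but will not match it exactly. The class boundaries, frequencies, and the function name `estimate_median` are all hypothetical.

```python
from itertools import accumulate

# Hypothetical grouped data: classes 0-10, 10-20, 20-30, 30-40
start = 0                      # lower boundary of the first class
upper_bounds = [10, 20, 30, 40]
freqs = [4, 10, 12, 4]

# Running totals, plotted against the upper class boundaries
cum_freqs = list(accumulate(freqs))

def estimate_median(start, upper_bounds, cum_freqs):
    """Estimate the median by straight-line interpolation on the ogive."""
    target = cum_freqs[-1] / 2           # 50% of the total frequency
    prev_x, prev_cf = start, 0
    for x, cf in zip(upper_bounds, cum_freqs):
        if cf >= target:
            # Move along the line segment from (prev_x, prev_cf) to (x, cf)
            return prev_x + (target - prev_cf) / (cf - prev_cf) * (x - prev_x)
        prev_x, prev_cf = x, cf

median_estimate = estimate_median(start, upper_bounds, cum_freqs)
print(cum_freqs, round(median_estimate, 2))
```

Here the total frequency is 30, so the horizontal line is drawn at 15. It meets the curve inside the 20 to 30 class, giving an estimate of about 20.8.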

Outliers and Skewness: An outlier is an extreme value that differs significantly from the others, usually identified as any value more than $1.5 \times IQR$ below $Q_{1}$ or above $Q_{3}$. Visually, data is 'Positively Skewed' (right-skewed) if it has a long tail on the right side, meaning the mean is typically greater than the median.
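The following sketch applies the $1.5 \times IQR$ rule to a small right-skewed data set and compares its mean with its median. The data `waits` and the helper names are hypothetical, and the quartiles are found with the median-of-halves convention, which is an assumption (other quartile methods can shift the boundaries slightly).

```python
from statistics import mean, median

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def find_outliers(data):
    """List values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low or x > high]

waits = [2, 3, 3, 4, 4, 5, 6, 20]  # illustrative data with a long right tail

outliers = find_outliers(waits)
print("outliers:", outliers)
print("mean:", mean(waits), "median:", median(waits))
```

With $Q_{1} = 3$ and $Q_{3} = 5.5$, the boundaries are $-0.75$ and $9.25$, so only 20 is flagged. The same extreme value pulls the mean (5.875) above the median (4), as expected for positively skewed data.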

Grouped Data and Mid-interval Values: When data is given in intervals (e.g., $10 \le x < 20$), we use the mid-interval value ($x_{m}$) to estimate the mean. This assumes data is uniformly distributed within each class. On a histogram, the area of each bar represents the frequency of that specific class interval.
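A minimal sketch of the mid-interval method is shown below. The class intervals and frequencies are made up for illustration, and `grouped_mean` is a hypothetical helper name. The result is only an estimate, because the individual values inside each class are unknown.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(10, 20, 3), (20, 30, 5), (30, 40, 2)]

def grouped_mean(classes):
    """Estimate the mean using the mid-interval value of each class."""
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)  # sum of f * x_m
    total_f = sum(f for _, _, f in classes)                     # sum of f
    return total_fx / total_f

estimate = grouped_mean(classes)
print(estimate)
```

The mid-interval values are 15, 25, and 35, so the estimate is $(3 \times 15 + 5 \times 25 + 2 \times 35) / 10 = 24$.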

📐Formulae

$\bar{x} = \frac{\sum_{i=1}^{n} x_{i}}{n}$

$\bar{x} = \frac{\sum f_{i} x_{i}}{\sum f_{i}}$

$IQR = Q_{3} - Q_{1}$

$\sigma = \sqrt{\frac{\sum f_{i} (x_{i} - \bar{x})^{2}}{n}}$, where $n = \sum f_{i}$

$\text{Variance} = \sigma^{2}$

$\text{Lower Outlier Boundary} = Q_{1} - 1.5 \times IQR$

$\text{Upper Outlier Boundary} = Q_{3} + 1.5 \times IQR$
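The frequency-table versions of the mean and standard deviation formulae above can be written out directly, as in this sketch. The table of values and frequencies is hypothetical, as are the function names `freq_mean` and `freq_std`. The code divides by $n = \sum f_{i}$, following the $\sigma$ formula in this list.

```python
from math import sqrt

def freq_mean(values, freqs):
    """Mean from a frequency table: sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(values, freqs)) / sum(freqs)

def freq_std(values, freqs):
    """Standard deviation from a frequency table, dividing by n = sum(f)."""
    n = sum(freqs)
    m = freq_mean(values, freqs)
    return sqrt(sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / n)

# Hypothetical frequency table: the value 2 occurs twice, 1 and 3 once each
values = [1, 2, 3]
freqs = [1, 2, 1]

table_mean = freq_mean(values, freqs)
table_sigma = freq_std(values, freqs)
print(table_mean, round(table_sigma, 3), round(table_sigma ** 2, 3))
```

For this table the mean is 2, the variance is $0.5$, and $\sigma \approx 0.707$.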

💡Examples

Problem 1:

A set of exam scores is: $56, 62, 62, 67, 71, 75, 88, 92$. Calculate the mean and the standard deviation for this data set.

Solution:

  1. Calculate the mean: $\bar{x} = \frac{56 + 62 + 62 + 67 + 71 + 75 + 88 + 92}{8} = \frac{573}{8} = 71.625$
  2. Find the variance ($\sigma^{2}$): $\sigma^{2} = \frac{\sum (x - \bar{x})^{2}}{n}$
  3. Sum of squared differences: $(56-71.625)^{2} + (62-71.625)^{2} + \dots + (92-71.625)^{2} = 1145.875$
  4. $\sigma^{2} = \frac{1145.875}{8} \approx 143.234$
  5. Standard Deviation: $\sigma = \sqrt{143.234} \approx 12.0$

Explanation:

To find the mean, sum all values and divide by the count ($n = 8$). To find the standard deviation, find the average of the squared distances from the mean, then take the square root.

Problem 2:

In a data set, $Q_{1} = 45$ and $Q_{3} = 70$. Determine if a value of $90$ is considered an outlier.

Solution:

  1. Calculate the Interquartile Range: $IQR = Q_{3} - Q_{1} = 70 - 45 = 25$
  2. Calculate the upper boundary for outliers: $\text{Upper} = Q_{3} + 1.5(IQR)$
  3. $\text{Upper} = 70 + 1.5(25) = 70 + 37.5 = 107.5$
  4. Compare the value: $90 < 107.5$

Explanation:

Using the $1.5 \times IQR$ rule, we establish the boundaries. Since the value $90$ is less than the upper boundary of $107.5$, it is not an outlier.