Statistics and Probability - Descriptive Statistics (Measures of Central Tendency and Dispersion)

Grade 12 IB

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Types of Data: Statistical data is classified as either Discrete (countable values like the number of students) or Continuous (measurable values like height or time). Visually, discrete data is often represented by bar charts with gaps between bars, whereas continuous data is shown using histograms where bars are adjacent to represent a continuous scale of intervals.

Measures of Central Tendency: These include the Mean (arithmetic average $\bar{x}$), Median (the middle value of an ordered set), and Mode (the most frequent value). In a perfectly symmetrical normal distribution curve, these three measures coincide at the center peak. In skewed distributions, the mean is pulled toward the tail: if the tail stretches to the right (positive skew), the mean is typically greater than the median.
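As a rough illustration of how a right tail pulls the mean above the median, the sketch below uses Python's standard `statistics` module. The data set is made up for this example and does not come from the notes.

```python
from statistics import mean, median, multimode

# Illustrative, made-up data with a long right tail (positive skew).
data = [2, 3, 3, 4, 5, 6, 20]

data_mean = mean(data)        # arithmetic average, pulled up by the value 20
data_median = median(data)    # middle value of the ordered list
data_modes = multimode(data)  # list of the most frequent value(s)

print(data_mean, data_median, data_modes)
```

Here the single large value 20 raises the mean above the median, while the mode stays at the most common score.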

Measures of Dispersion - Range and IQR: Dispersion describes how spread out the data is. The Range is the difference between the maximum and minimum values. The Interquartile Range ($IQR$) is the spread of the middle $50\%$ of the data. Visually, this is seen in a Box-and-Whisker plot where the central box spans from the first quartile ($Q_1$) to the third quartile ($Q_3$), and the length of this box represents the $IQR$.

Standard Deviation and Variance: The Standard Deviation ($\sigma$) measures the typical distance of the data points from the mean; formally, it is the square root of the mean squared deviation from the mean. A small $\sigma$ indicates data points are clustered closely around the mean, resulting in a narrow, tall bell curve. A large $\sigma$ indicates high variability, resulting in a wider, flatter curve. Variance is the square of the standard deviation ($\sigma^2$).
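To see how $\sigma$ separates two data sets that share the same mean, here is a minimal sketch using `pvariance` and `pstdev` from Python's `statistics` module (the population versions, which divide by $n$). Both lists are made-up illustrative values.

```python
from statistics import mean, pstdev, pvariance

# Two illustrative, made-up data sets with the same mean but different spread.
tight = [9, 10, 10, 10, 11]
wide = [2, 6, 10, 14, 18]

for name, values in (("tight", tight), ("wide", wide)):
    # pvariance and pstdev divide by n (population formulas).
    print(name, mean(values), pvariance(values), pstdev(values))
```

Both sets have a mean of 10, but the second has a much larger variance and standard deviation because its values sit further from the mean.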

Cumulative Frequency and Ogives: Cumulative frequency is the running total of frequencies. When plotted against the upper class boundaries, it forms an 'S-shaped' curve known as an ogive. This visual tool is used to estimate the median (at the $50\%$ mark on the y-axis), quartiles (at $25\%$ and $75\%$), and percentiles by drawing a horizontal line to the curve and then a vertical line down to the x-axis.
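Reading a value off an ogive can be imitated numerically. The sketch below assumes the ogive points are joined by straight lines (a hand-drawn smooth curve would give slightly different readings), and the helper name `ogive_value` is hypothetical. The class boundaries and frequencies are borrowed from the grouped table in Problem 2 of these notes.

```python
def ogive_value(boundaries, cum_freq, target):
    """Estimate the x-value at which the cumulative frequency reaches target,
    joining consecutive ogive points with straight lines (an assumption)."""
    points = list(zip(boundaries, cum_freq))
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c1 > c0 and c0 <= target <= c1:
            # Move horizontally to the line segment, then drop to the x-axis.
            return x0 + (target - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target lies outside the cumulative frequency range")

# Ogive points: (lowest boundary, 0), then (upper class boundary, running total).
boundaries = [0, 10, 20, 30]
cum_freq = [0, 2, 9, 10]   # running totals of the frequencies 2, 7, 1
n = cum_freq[-1]

q1_est = ogive_value(boundaries, cum_freq, 0.25 * n)
median_est = ogive_value(boundaries, cum_freq, 0.50 * n)
q3_est = ogive_value(boundaries, cum_freq, 0.75 * n)

print(round(q1_est, 2), round(median_est, 2), round(q3_est, 2))
```

All three estimates fall in the $10 \le x < 20$ class, which holds most of the data.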

Outliers and Box Plots: An outlier is an extreme value that lies significantly outside the general pattern of the data. Mathematically, an outlier is often defined as any value less than $Q_1 - 1.5 \times IQR$ or greater than $Q_3 + 1.5 \times IQR$. On a box plot, these are visually identified as individual points or crosses ($x$) beyond the ends of the whiskers.

Grouped Data Analysis: For data presented in intervals, the mean is estimated using the mid-interval values ($x_i$). Visually, the frequency density in a histogram (where $\text{Frequency Density} = \frac{\text{Frequency}}{\text{Class Width}}$) ensures that the area of each bar is proportional to the frequency, which is particularly important when class widths are unequal.
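The effect of frequency density is easiest to see when classes of different widths hold the same frequency. The following sketch uses made-up classes, not data from these notes, and computes each bar's height and area.

```python
# Illustrative, made-up classes with unequal widths: (lower, upper, frequency).
classes = [(0, 10, 8), (10, 15, 8), (15, 35, 8)]

densities = []
areas = []
for lower, upper, freq in classes:
    width = upper - lower
    density = freq / width      # bar height = frequency / class width
    densities.append(density)
    areas.append(density * width)  # bar area recovers the frequency
    print(f"{lower}-{upper}: width {width}, density {density}")
```

The three bars have different heights, yet each has the same area because each class contains the same number of values.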

📐Formulae

Mean for a data set: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Mean for frequency distribution: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

Interquartile Range: $IQR = Q_3 - Q_1$

Population Variance: $\sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum f_i x_i^2}{n} - \bar{x}^2}$, where $n = \sum f_i$ is the total frequency

Lower Boundary for Outliers: $L = Q_1 - 1.5(IQR)$

Upper Boundary for Outliers: $U = Q_3 + 1.5(IQR)$
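The variance and standard deviation formulae above look different but describe the same quantity. A short derivation, using only the definition $\bar{x} = \frac{1}{n}\sum x_i$, shows why:

```latex
\begin{align*}
\sigma^2 &= \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \\
         &= \frac{1}{n}\sum_{i=1}^{n} x_i^2 \;-\; 2\bar{x}\cdot\frac{1}{n}\sum_{i=1}^{n} x_i \;+\; \bar{x}^2 \\
         &= \frac{\sum_{i=1}^{n} x_i^2}{n} - 2\bar{x}^2 + \bar{x}^2 \\
         &= \frac{\sum_{i=1}^{n} x_i^2}{n} - \bar{x}^2
\end{align*}
```

For a frequency table, each value $x_i$ occurs $f_i$ times, so $\sum x_i^2$ becomes $\sum f_i x_i^2$ and $n = \sum f_i$; taking the square root gives the standard deviation formula listed above.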

💡Examples

Problem 1:

A set of math quiz scores for 10 students is: $12, 15, 15, 17, 18, 20, 22, 25, 25, 45$. Determine the mean, the $IQR$, and identify if there are any outliers.

Solution:

  1. Find the Mean: $\bar{x} = \frac{12+15+15+17+18+20+22+25+25+45}{10} = \frac{214}{10} = 21.4$
  2. Find the Median and Quartiles: The data is already ordered. Median is the average of the 5th and 6th values: $\frac{18+20}{2} = 19$. $Q_1$ (median of the lower half) is the 3rd value: $15$. $Q_3$ (median of the upper half) is the 8th value: $25$.
  3. Calculate $IQR$: $IQR = Q_3 - Q_1 = 25 - 15 = 10$
  4. Check for Outliers: Lower boundary $= 15 - 1.5(10) = 0$. Upper boundary $= 25 + 1.5(10) = 40$. Since $45 > 40$, the score $45$ is an outlier.

Explanation:

To identify outliers, we first calculate the central tendency and the spread via $IQR$. The $1.5 \times IQR$ rule provides a standardized threshold to check if the extreme value (45) is statistically distant from the rest of the group.
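The working for Problem 1 can be checked with a short script. This is a sketch, not the only way to compute quartiles: the helper `quartiles_by_halves` is a hypothetical name, and it follows the median-of-halves method used in the solution (for an odd number of values it leaves the middle value out of both halves, which is an assumed convention). Calculators and software that use other quartile methods may give slightly different results.

```python
from statistics import mean, median

def quartiles_by_halves(values):
    """Return (Q1, median, Q3) using the median-of-halves method.
    Needs at least two values; for an odd count the middle value
    is excluded from both halves (an assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # values below the median position
    upper = ordered[-half:]   # values above the median position
    return median(lower), median(ordered), median(upper)

scores = [12, 15, 15, 17, 18, 20, 22, 25, 25, 45]

score_mean = mean(scores)
q1, med, q3 = quartiles_by_halves(scores)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [s for s in scores if s < lower_fence or s > upper_fence]

print(score_mean, med, q1, q3, iqr)
print(lower_fence, upper_fence, outliers)
```

The script reproduces the values in the solution: a mean of 21.4, quartiles of 15 and 25, boundaries of 0 and 40, and 45 as the only outlier.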

Problem 2:

Calculate the estimated mean and standard deviation for the following grouped frequency table:

  • Interval $0 \le x < 10$: Frequency $2$
  • Interval $10 \le x < 20$: Frequency $7$
  • Interval $20 \le x < 30$: Frequency $1$

Solution:

  1. Identify mid-intervals ($x_i$): $5, 15, 25$.
  2. Calculate $\sum f_i x_i$: $(2 \times 5) + (7 \times 15) + (1 \times 25) = 10 + 105 + 25 = 140$
  3. Find total frequency $n$: $2 + 7 + 1 = 10$.
  4. Estimated Mean: $\bar{x} = \frac{140}{10} = 14$
  5. Calculate $\sum f_i x_i^2$: $(2 \times 5^2) + (7 \times 15^2) + (1 \times 25^2) = 50 + 1575 + 625 = 2250$
  6. Standard Deviation: $\sigma = \sqrt{\frac{2250}{10} - 14^2} = \sqrt{225 - 196} = \sqrt{29} \approx 5.39$

Explanation:

For grouped data, we assume all values in an interval are represented by the midpoint. The mean is the weighted average of these midpoints, and the standard deviation measures the spread around this estimated mean.
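The same calculation can be sketched in a few lines of Python, following the mid-interval approach described above. The variable names are illustrative; the numbers come from the table in Problem 2.

```python
from math import sqrt

# (mid-interval value, frequency) pairs from the table in Problem 2.
table = [(5, 2), (15, 7), (25, 1)]

n = sum(f for _, f in table)                # total frequency
sum_fx = sum(f * x for x, f in table)       # sum of f_i * x_i
sum_fx2 = sum(f * x**2 for x, f in table)   # sum of f_i * x_i^2

est_mean = sum_fx / n
est_sd = sqrt(sum_fx2 / n - est_mean**2)    # population formula (divides by n)

print(n, sum_fx, sum_fx2)
print(est_mean, round(est_sd, 2))
```

Because every value is replaced by its class midpoint, both results are estimates rather than exact statistics of the raw data.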