Statistics and Probability - Analysis of Frequency Distributions

Grade 11 ICSE

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Measures of Dispersion: Dispersion refers to the extent to which data values are spread out or scattered around a central value like the mean or median. Visually, if you plot two frequency distributions on the same graph, a distribution with high dispersion will appear wide and flat (like a shallow hill), while one with low dispersion will appear narrow and tall (like a steep peak).

Range and Interquartile Range: The range is the simplest measure of dispersion, calculated as the difference between the maximum and minimum values. Visually, it represents the total horizontal length covered by the data on a number line. The Interquartile Range (IQR) focuses on the middle $50\%$ of the data, represented visually by the width of the 'box' in a box-and-whisker plot.

Mean Deviation: This is the arithmetic mean of the absolute deviations of the observations from a central value (mean or median). In a scatter plot, if you draw a horizontal line at the mean, the mean deviation is the average of the vertical distances from each point to that line, ignoring whether the point is above or below it.

Variance: Variance is the average of the squares of the deviations from the arithmetic mean. Squaring the deviations makes every term non-negative, so positive and negative deviations cannot cancel, and it gives more weight to outliers. Visually, a larger variance indicates that data points are likely to be found further away from the center of the distribution curve.

Standard Deviation (SD): The square root of the variance is the standard deviation, which is the most widely used measure of dispersion as it shares the same units as the data. On a normal bell-shaped curve, approximately $68\%$ of the data falls within one standard deviation of the mean, representing the 'typical' spread of the data.

Coefficient of Variation (CV): This is a relative measure of dispersion, expressed as a percentage, used to compare the variability of two or more series even if they have different units or means. When comparing two frequency polygons, the one with the higher $CV$ is considered more 'variable' or 'unstable,' while the one with the lower $CV$ is more 'consistent' or 'homogeneous.'

Comparison of Distributions: For two frequency distributions with the same mean, the distribution with the smaller standard deviation (and thus smaller $CV$) is more consistent. Visually, this distribution will have a higher concentration of frequencies near the mean, resulting in a more 'peaked' frequency curve compared to the other.
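The comparison described above can be checked numerically. The following is a minimal sketch, assuming two hypothetical sets of marks (`section_a` and `section_b` are made-up data, not from these notes) that share the same mean. It uses Python's standard `statistics` module, whose `pvariance` and `pstdev` divide by $N$, matching the formulae in the next section.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical marks for two sections (illustrative data only); both have mean 50.
section_a = [48, 49, 50, 51, 52]
section_b = [30, 40, 50, 60, 70]


def summarize(values):
    """Return the mean, range, population variance and population SD."""
    return {
        "mean": mean(values),
        "range": max(values) - min(values),
        "variance": pvariance(values),  # divides by N, not N - 1
        "sd": pstdev(values),
    }


summary_a = summarize(section_a)
summary_b = summarize(section_b)

print("Section A:", summary_a)
print("Section B:", summary_b)

# Same mean, so the section with the smaller SD is the more consistent one.
more_consistent = "A" if summary_a["sd"] < summary_b["sd"] else "B"
print("More consistent section:", more_consistent)
```

Both sections are centered at 50, but Section A has the smaller range, variance and standard deviation, so its values cluster more tightly around the mean.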

📐Formulae

Mean (Grouped Data): $\bar{x} = \frac{\sum f_i x_i}{N}$, where $N = \sum f_i$

Mean Deviation about Mean: $MD(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N}$

Mean Deviation about Median: $MD(M) = \frac{\sum f_i |x_i - M|}{N}$

Variance (Discrete): $\sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N}$

Standard Deviation (Shortcut Method): $\sigma = \sqrt{\frac{\sum f_i d_i^2}{N} - \left(\frac{\sum f_i d_i}{N}\right)^2}$, where $d_i = x_i - A$ and $A$ is the assumed mean

Standard Deviation (Step Deviation Method): $\sigma = h \times \sqrt{\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2}$, where $u_i = \frac{x_i - A}{h}$, $A$ is the assumed mean, and $h$ is the class width (common factor)

Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100$
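To see that the direct, shortcut and step deviation formulae agree, here is a minimal Python sketch. The class mid-points, frequencies, assumed mean `A = 25` and class width `h = 10` are hypothetical values chosen for illustration; they do not come from these notes.

```python
from math import isclose, sqrt

# Hypothetical grouped data: class mid-points x_i and frequencies f_i.
midpoints = [5, 15, 25, 35, 45]
frequencies = [3, 7, 12, 6, 2]

N = sum(frequencies)  # N = sum of f_i

# Direct method: mean, mean deviation about the mean, variance and SD.
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / N
md_mean = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints)) / N
variance = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, midpoints)) / N
sd_direct = sqrt(variance)

# Shortcut method: deviations d_i = x_i - A from an assumed mean A.
A = 25
sum_fd = sum(f * (x - A) for f, x in zip(frequencies, midpoints))
sum_fd2 = sum(f * (x - A) ** 2 for f, x in zip(frequencies, midpoints))
sd_shortcut = sqrt(sum_fd2 / N - (sum_fd / N) ** 2)

# Step deviation method: u_i = (x_i - A) / h with class width h.
h = 10
sum_fu = sum(f * (x - A) / h for f, x in zip(frequencies, midpoints))
sum_fu2 = sum(f * ((x - A) / h) ** 2 for f, x in zip(frequencies, midpoints))
sd_step = h * sqrt(sum_fu2 / N - (sum_fu / N) ** 2)

# Coefficient of variation, as a percentage.
cv = sd_direct / mean * 100

print(f"mean = {mean}, MD about mean = {md_mean}, variance = {variance}")
print(f"SD: direct = {sd_direct:.4f}, shortcut = {sd_shortcut:.4f}, step = {sd_step:.4f}")
print(f"CV = {cv:.2f}%")
```

All three methods give the same standard deviation. The shortcut and step deviation forms only make the arithmetic smaller when working by hand.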

💡Examples

Problem 1:

Calculate the mean deviation about the mean for the following data: $6, 7, 10, 12, 13, 4, 8, 12$.

Solution:

  1. Find the mean ($\bar{x}$): $\bar{x} = \frac{6+7+10+12+13+4+8+12}{8} = \frac{72}{8} = 9$
  2. Find the absolute deviations $|x_i - \bar{x}|$: $|6-9|=3, |7-9|=2, |10-9|=1, |12-9|=3, |13-9|=4, |4-9|=5, |8-9|=1, |12-9|=3$
  3. Sum of absolute deviations: $\sum |x_i - \bar{x}| = 3+2+1+3+4+5+1+3 = 22$
  4. Mean Deviation: $MD(\bar{x}) = \frac{\sum |x_i - \bar{x}|}{n} = \frac{22}{8} = 2.75$

Explanation:

To find the mean deviation, we first determine the central point (the mean), then measure the average absolute distance of all data points from that center.
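The four steps of Problem 1 can be reproduced with a short Python sketch. The variable names are illustrative; the data are those given in the problem.

```python
# Data from Problem 1.
data = [6, 7, 10, 12, 13, 4, 8, 12]
n = len(data)

# Step 1: the mean.
mean = sum(data) / n

# Step 2: absolute deviations from the mean.
abs_deviations = [abs(x - mean) for x in data]

# Step 3: sum of the absolute deviations.
total_abs_deviation = sum(abs_deviations)

# Step 4: mean deviation about the mean.
mean_deviation = total_abs_deviation / n

print(mean, total_abs_deviation, mean_deviation)  # 9.0 22.0 2.75
```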

Problem 2:

Two series A and B have the following characteristics: Series A: Mean = $50$, Standard Deviation = $5$; Series B: Mean = $60$, Standard Deviation = $9$. Which series is more consistent?

Solution:

  1. Calculate Coefficient of Variation for Series A: $CV_A = \frac{\sigma_A}{\bar{x}_A} \times 100 = \frac{5}{50} \times 100 = 10\%$
  2. Calculate Coefficient of Variation for Series B: $CV_B = \frac{\sigma_B}{\bar{x}_B} \times 100 = \frac{9}{60} \times 100 = 15\%$
  3. Compare $CV$: Since $CV_A < CV_B$ ($10\% < 15\%$), Series A is more consistent.

Explanation:

The Coefficient of Variation (CV) is used to compare consistency. A lower CV indicates less relative variability and higher consistency.
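The comparison in Problem 2 can be sketched in Python as follows. The helper `coefficient_of_variation` is a hypothetical function written for this illustration, and it assumes a positive mean, since the $CV$ is not meaningful otherwise.

```python
def coefficient_of_variation(sd, mean):
    """Return the CV as a percentage; assumes the mean is positive."""
    if mean <= 0:
        raise ValueError("CV is only meaningful for a positive mean")
    return 100 * sd / mean


# Values from Problem 2.
cv_a = coefficient_of_variation(sd=5, mean=50)
cv_b = coefficient_of_variation(sd=9, mean=60)

# The series with the lower CV is the more consistent one.
more_consistent = "A" if cv_a < cv_b else "B"

print(f"CV_A = {cv_a}%, CV_B = {cv_b}%, more consistent: Series {more_consistent}")
```

Series B has the larger mean, but its standard deviation is larger in proportion, so Series A comes out as the more consistent series.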