Statistics - Scatter Diagrams and Correlation

Grade 9 IGCSE

Review the key concepts, formulae, and examples before starting your quiz.

🔑 Concepts

Scatter Diagram: A graph used to display the relationship (correlation) between two sets of numerical data (bivariate data).

Positive Correlation: As one variable increases, the other variable also increases. The points generally follow an upward trend from left to right.

Negative Correlation: As one variable increases, the other variable decreases. The points generally follow a downward trend from left to right.

Zero (No) Correlation: There is no apparent relationship between the two variables; points are scattered randomly.

Strength of Correlation: Described as 'Strong' if the points lie very close to a straight line, or 'Weak' if they are more spread out but still show a trend.

Line of Best Fit: A straight line drawn through the data points that best represents the trend. It should pass through the mean point and leave roughly equal numbers of points above and below it.

Interpolation: Estimating a value within the range of the given data points using the line of best fit (usually reliable).

Extrapolation: Estimating a value outside the range of the given data points (less reliable as the trend may not continue).
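The distinction between interpolation and extrapolation comes down to whether the x-value lies inside the range of the collected data. A minimal sketch of that check follows; the function name `estimate_kind` and the sample values are hypothetical and chosen only for illustration.

```python
def estimate_kind(x, observed_xs):
    """Classify an estimate at x as interpolation or extrapolation.

    x is inside the observed range -> interpolation (usually reliable)
    x is outside the observed range -> extrapolation (less reliable)
    """
    lowest, highest = min(observed_xs), max(observed_xs)
    if lowest <= x <= highest:
        return "interpolation"
    return "extrapolation"


# Hypothetical x-values from a data set (illustrative only).
observed = [3, 5, 8, 10, 14]

print(estimate_kind(12, observed))  # 12 is between 3 and 14
print(estimate_kind(20, observed))  # 20 is beyond the largest value, 14
```

Values exactly at the ends of the range are treated as interpolation here, which is one reasonable convention.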

📐 Formulae

Mean of x values: $\bar{x} = \frac{\sum x}{n}$

Mean of y values: $\bar{y} = \frac{\sum y}{n}$

The Line of Best Fit must pass through the mean point: $(\bar{x}, \bar{y})$
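The mean point can be computed directly from these two formulae. The sketch below is one way to do it; the helper name `mean_point` and the data values are hypothetical, picked only so that the arithmetic is easy to follow.

```python
def mean_point(xs, ys):
    """Return (x-bar, y-bar): the sum of each list divided by n."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("xs and ys must be non-empty and the same length")
    n = len(xs)
    return sum(xs) / n, sum(ys) / n


# Hypothetical data (illustrative only): hours studied and test scores.
hours = [2, 4, 5, 6, 8]        # sum = 25, n = 5
scores = [52, 58, 66, 69, 80]  # sum = 325, n = 5

print(mean_point(hours, scores))  # (5.0, 65.0)
```

A line of best fit drawn for this data would be expected to pass through the point (5, 65).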

💡 Examples

Problem 1:

A researcher collects data on the age of a car (years) and its current market value ($). What type of correlation would you expect to see on a scatter diagram?

Solution:

Strong Negative Correlation

Explanation:

As the age of a car increases, its market value typically decreases. Because this relationship is usually very consistent, it is considered a strong negative correlation.

Problem 2:

The mean of the x-coordinates (hours studied) is 5, and the mean of the y-coordinates (test scores) is 65. If a student draws a line of best fit, which specific coordinate must the line pass through?

Solution:

$(5, 65)$

Explanation:

The line of best fit must pass through the mean point $(\bar{x}, \bar{y})$. Here $\bar{x} = 5$ and $\bar{y} = 65$, so the line passes through $(5, 65)$.

Problem 3:

Using a line of best fit $y = 2x + 10$, where $x$ is the number of sunny hours and $y$ is the number of visitors to a park, estimate the number of visitors if there are 12 sunny hours.

Solution:

34 visitors

Explanation:

Substitute $x = 12$ into the linear equation: $y = 2(12) + 10 \Rightarrow y = 24 + 10 = 34$. This is interpolation if 12 hours lies within the range of the original data; otherwise it is extrapolation.
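The substitution in Problem 3 can be checked with a short sketch. The function name `predict_visitors` is hypothetical; the gradient 2 and intercept 10 come from the line of best fit given in the problem.

```python
def predict_visitors(sunny_hours):
    """Estimate visitors from the line of best fit y = 2x + 10."""
    gradient = 2    # extra visitors per additional sunny hour
    intercept = 10  # value of y when x = 0
    return gradient * sunny_hours + intercept


print(predict_visitors(12))  # 2 * 12 + 10 = 34
```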