
Statistics - Scatter Diagrams and Correlation

Grade 10 IGCSE

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Scatter Diagrams: A graphical representation of the relationship between two numerical variables, plotted as coordinates (x, y).

Independent Variable: Usually plotted on the x-axis (the variable being controlled or changed).

Dependent Variable: Usually plotted on the y-axis (the variable being measured).

Positive Correlation: As the x-variable increases, the y-variable also increases (points trend upwards from left to right).

Negative Correlation: As the x-variable increases, the y-variable decreases (points trend downwards from left to right).

No Correlation: No visible pattern or relationship between the two variables.

Strength of Correlation: Described as 'Strong' if points are close to a straight line, or 'Weak' if points are widely spread.

Line of Best Fit: A straight line drawn through the center of the points that represents the general trend. It should pass through the mean point $(\bar{x}, \bar{y})$.

Interpolation: Estimating a value within the range of the data (usually reliable).

Extrapolation: Estimating a value outside the range of the data (unreliable as the trend may change).

📐Formulae

Mean of x values: $\bar{x} = \frac{\sum x}{n}$

Mean of y values: $\bar{y} = \frac{\sum y}{n}$

The Line of Best Fit must pass through the Mean Point: $(\bar{x}, \bar{y})$
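The formulae above can be sketched in a few lines of Python. The function name `mean_point` and the sample values are illustrative assumptions, not part of the notes.

```python
def mean_point(xs, ys):
    """Return the mean point (x-bar, y-bar) for paired data."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    # x-bar = (sum of x) / n, y-bar = (sum of y) / n
    return sum(xs) / n, sum(ys) / n


# Hypothetical data, chosen only to illustrate the calculation.
print(mean_point([1, 2, 3], [2, 4, 9]))  # (2.0, 5.0)
```

A line of best fit drawn for this hypothetical data should pass through (2, 5).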

💡Examples

Problem 1:

A student records the number of hours spent playing video games ($x$) and the score in a math test ($y$) for 5 friends: (2, 85), (5, 60), (1, 90), (8, 40), (4, 75). Identify the type of correlation.

Solution:

Negative Correlation.

Explanation:

As the number of hours spent playing video games increases, the test scores decrease. This inverse relationship represents a negative correlation.
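One way to check this answer numerically is to compute the Pearson correlation coefficient $r$. This goes beyond what these notes ask for, where the type of correlation is read off the diagram. The sketch below uses only the five data points from Problem 1; the function name `pearson_r` is an assumption made for illustration. A value of $r$ near $-1$ suggests a strong negative correlation, and a value near $+1$ a strong positive one.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired data (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


hours = [2, 5, 1, 8, 4]        # x: hours of video games (from Problem 1)
scores = [85, 60, 90, 40, 75]  # y: math test scores (from Problem 1)

r = pearson_r(hours, scores)
print(round(r, 2))  # -0.99
```

The negative sign agrees with the answer above, and the magnitude close to 1 indicates that the points lie close to a straight line.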

Problem 2:

Calculate the mean point $(\bar{x}, \bar{y})$ for the following data set: $x = [2, 4, 6, 8]$, $y = [10, 20, 30, 40]$.

Solution:

Mean point = (5, 25)

Explanation:

$\bar{x} = \frac{2+4+6+8}{4} = \frac{20}{4} = 5$. $\bar{y} = \frac{10+20+30+40}{4} = \frac{100}{4} = 25$. The line of best fit for this data must pass through the point (5, 25).
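As a rough check of the last statement, the sketch below fits a least-squares line to the Problem 2 data and confirms that it passes through the mean point. Using least squares is an assumption here, since the notes describe a line drawn by eye; `least_squares_line` is a hypothetical helper name.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    # Choosing the intercept this way forces the line through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [2, 4, 6, 8]      # x values from Problem 2
ys = [10, 20, 30, 40]  # y values from Problem 2

slope, intercept = least_squares_line(xs, ys)
print(slope, intercept)       # 5.0 0.0
print(slope * 5 + intercept)  # 25.0, the y-coordinate of the mean point
```

For this data set every point lies exactly on the fitted line, so the line also passes through each data point as well as the mean point.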

Problem 3:

A scatter diagram shows a strong positive correlation between temperature ($x$) and ice cream sales ($y$). The line of best fit is $y = 5x + 10$. Predict the sales if the temperature is $30^\circ C$. Is this interpolation or extrapolation if the data range was $10^\circ C$ to $25^\circ C$?

Solution:

Sales = 160; This is Extrapolation.

Explanation:

Substitute $x = 30$ into the equation: $y = 5(30) + 10 = 160$. Since $30^\circ C$ is outside the original data range of $10^\circ C$ to $25^\circ C$, it is considered extrapolation and may not be accurate.
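The same two steps, substituting into the line and comparing against the data range, can be sketched in Python. The function names `predict_sales` and `estimate_type` are assumptions chosen for illustration; the equation and the range come from Problem 3.

```python
def predict_sales(temp_c):
    """Predicted sales from the line of best fit y = 5x + 10 (Problem 3)."""
    return 5 * temp_c + 10


def estimate_type(x, x_min, x_max):
    """Classify an estimate by whether x lies inside the observed data range."""
    if x_min <= x <= x_max:
        return "interpolation"
    return "extrapolation"


print(predict_sales(30))          # 160
print(estimate_type(30, 10, 25))  # extrapolation
print(estimate_type(20, 10, 25))  # interpolation
```

An estimate at 20 degrees, by contrast, falls inside the 10 to 25 degree range and would count as interpolation.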