Statistics - Scatter Diagrams and Correlation

Grade 12 IGCSE

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Bivariate Data: Data involving two variables, often represented as $(x, y)$ coordinates to investigate potential relationships.

Positive Correlation: As the independent variable ($x$) increases, the dependent variable ($y$) also tends to increase.

Negative Correlation: As the independent variable ($x$) increases, the dependent variable ($y$) tends to decrease.

Zero/No Correlation: No apparent relationship exists between the variables; points are randomly scattered.

Strength of Correlation: Described as 'Strong' if points lie close to a straight line, or 'Weak' if they are widely spread.

Line of Best Fit: A straight line drawn through the center of the data points, used to model the relationship.

Mean Point: The point $(\bar{x}, \bar{y})$ representing the average of all $x$ and $y$ values. The Line of Best Fit must pass through this point.

Interpolation: Estimating a value within the range of the given data set (usually reliable).

Extrapolation: Estimating a value outside the range of the given data set (often unreliable as the trend may not continue).

Correlation vs. Causation: A relationship between two variables does not necessarily mean one causes the other.
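
The distinction between interpolation and extrapolation can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `classify_estimate` and the sample `hours` list are assumptions chosen for the example.

```python
def classify_estimate(x_new, x_values):
    """Label an estimate by whether x_new lies inside the observed x range."""
    lowest, highest = min(x_values), max(x_values)
    if lowest <= x_new <= highest:
        return "interpolation"   # inside the data range: usually reliable
    return "extrapolation"       # outside the data range: often unreliable


# Hypothetical observed x values (for example, hours of study)
hours = [2, 4, 6, 8, 10]

print(classify_estimate(5, hours))   # interpolation
print(classify_estimate(15, hours))  # extrapolation
```

The check only says where the estimate falls relative to the data; it does not measure how reliable the estimate actually is.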

📐Formulae

Mean of $x$: $\bar{x} = \frac{\sum x}{n}$

Mean of $y$: $\bar{y} = \frac{\sum y}{n}$

Equation of Line of Best Fit: $y = mx + c$

Gradient ($m$): $m = \frac{y_2 - y_1}{x_2 - x_1}$ (calculated using two points on the line of best fit)
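
The gradient and intercept formulae can be combined into a short Python helper. This is a sketch under stated assumptions: the name `line_through` and the two sample points are hypothetical, and the points are taken to have been read off a line of best fit that has already been drawn.

```python
def line_through(p1, p2):
    """Return (m, c) for the line y = m*x + c through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("The two points need different x values.")
    m = (y2 - y1) / (x2 - x1)  # gradient from the two points
    c = y1 - m * x1            # rearranged from y1 = m*x1 + c
    return m, c


# Hypothetical points read from a line of best fit
m, c = line_through((1, 3), (4, 9))
print(m, c)  # 2.0 1.0
```

Rearranging $y = mx + c$ to $c = y_1 - m x_1$ gives the intercept once the gradient is known.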

💡Examples

Problem 1:

A student records the number of hours spent studying ($x$) and the test scores ($y$) for 5 students: (2, 40), (4, 55), (6, 65), (8, 75), (10, 90). (a) Identify the type of correlation. (b) Calculate the mean point $(\bar{x}, \bar{y})$.

Solution:

(a) Positive Correlation.
(b) $\bar{x} = \frac{2+4+6+8+10}{5} = 6$ and $\bar{y} = \frac{40+55+65+75+90}{5} = 65$. Mean point = $(6, 65)$.

Explanation:

Since test scores increase as study hours increase, the correlation is positive. The mean point is found by averaging all $x$ values and all $y$ values respectively.
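
The arithmetic in Problem 1 can be checked with a short Python sketch. The helper name `mean_point` is an assumption made for this example, and the "always increasing" test is a simple check that happens to suit this data set; it is not a general test for positive correlation.

```python
# Data from Problem 1: (hours studied, test score)
points = [(2, 40), (4, 55), (6, 65), (8, 75), (10, 90)]


def mean_point(points):
    """Return (x_bar, y_bar), the averages of the x and y values."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    return x_bar, y_bar


# The points are listed in order of increasing x, so compare neighbouring scores.
always_increasing = all(b[1] > a[1] for a, b in zip(points, points[1:]))

print(mean_point(points))   # (6.0, 65.0)
print(always_increasing)    # True
```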

Problem 2:

A scatter diagram shows a strong negative correlation between the age of a car ($x$ years) and its value ($y$ dollars). The line of best fit passes through (2, 20000) and (8, 8000). Predict the value of a car that is 5 years old.

Solution:

1. Find gradient: $m = \frac{8000 - 20000}{8 - 2} = \frac{-12000}{6} = -2000$.
2. Use $y = mx + c$ with point (2, 20000): $20000 = -2000(2) + c \Rightarrow c = 24000$.
3. Equation: $y = -2000x + 24000$.
4. For $x = 5$: $y = -2000(5) + 24000 = 14000$.

Predicted value: $14,000.

Explanation:

We first determine the linear equation that represents the line of best fit and then substitute the target $x$ value (age) to find the predicted $y$ value (the car's value in dollars). Since 5 years is within the range of 2 to 8, this is an example of interpolation.
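
As a quick check on the equation found in the solution, substituting the other given point on the line, (8, 8000), should reproduce its $y$ value. This extra step is not part of the original solution; it is only a sanity check on the values of $m$ and $c$.

```latex
\[
y = -2000(8) + 24000 = -16000 + 24000 = 8000
\]
```

The result matches the second point, so the line $y = -2000x + 24000$ passes through both (2, 20000) and (8, 8000).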