Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
Definition of Regression Lines: Regression lines are the 'best-fit' straight lines that represent the linear relationship between two variables, $x$ and $y$. In a scatter plot, each line is positioned to minimize the sum of the squared deviations of the data points from the line.
Line of Regression of $y$ on $x$: This line is used to estimate or predict the value of the dependent variable $y$ for a given value of the independent variable $x$. Visually, this line minimizes the sum of the squares of the vertical deviations (distances parallel to the $y$-axis) between the observed points and the line.
Line of Regression of $x$ on $y$: This line is used to estimate or predict the value of the independent variable $x$ for a given value of the dependent variable $y$. Visually, this line minimizes the sum of the squares of the horizontal deviations (distances parallel to the $x$-axis) between the observed points and the line.
The Centroid (Point of Intersection): Both regression lines always pass through the point $(\bar{x}, \bar{y})$, where $\bar{x}$ is the mean of the $x$-values and $\bar{y}$ is the mean of the $y$-values. On a graph, this point acts as the pivot or balance point for both lines.
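Regression Coefficients: The regression coefficients, denoted $b_{yx}$ ($y$ on $x$) and $b_{xy}$ ($x$ on $y$), indicate the change in one variable for a unit change in the other: $b_{yx}$ is the slope of the $y$ on $x$ line, and $b_{xy}$ is the slope of the $x$ on $y$ line measured against the $y$-axis. A key property is that both coefficients must have the same sign (either both positive or both negative), which is also the sign of the correlation coefficient $r$.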
Correlation and the Angle between Lines: The geometric angle between the two regression lines indicates the strength of the correlation. If $r = \pm 1$, the lines coincide (the angle is $0^\circ$), representing perfect correlation. If $r = 0$, the lines are perpendicular (intersecting at $90^\circ$), indicating no linear correlation.
The Geometric Mean Property: The correlation coefficient is the geometric mean of the two regression coefficients. This is expressed as $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, with the sign taken from the coefficients. Because $r^2 \le 1$, it follows that the product of the two regression coefficients can never exceed 1.
Estimation Validity: When predicting $y$, always use the $y$ on $x$ line; when predicting $x$, always use the $x$ on $y$ line. Using the wrong line for prediction results in higher estimation error.
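The properties listed above can be checked numerically. The following is a minimal Python sketch, assuming a small made-up data set (`xs`, `ys`) and a hypothetical helper name `regression_summary`; neither comes from these notes.

```python
from math import sqrt, isclose

def regression_summary(xs, ys):
    """Return the means, both regression coefficients, and r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squared deviations and cross-deviations about the means.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return {
        "x_bar": x_bar,
        "y_bar": y_bar,
        "b_yx": s_xy / s_xx,             # coefficient of y on x
        "b_xy": s_xy / s_yy,             # coefficient of x on y
        "r": s_xy / sqrt(s_xx * s_yy),   # Pearson correlation coefficient
    }

# Hypothetical data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s = regression_summary(xs, ys)
product = s["b_yx"] * s["b_xy"]

# Both coefficients share a sign, so their product is non-negative ...
assert product >= 0
# ... and it can never exceed 1, because it equals r squared.
assert product <= 1
assert isclose(s["r"] ** 2, product)
# r is the geometric mean of the coefficients, carrying their common sign.
sign = 1 if s["b_yx"] >= 0 else -1
assert isclose(s["r"], sign * sqrt(product))

print(f"centroid = ({s['x_bar']}, {s['y_bar']})")
print(f"b_yx = {s['b_yx']:.3f}, b_xy = {s['b_xy']:.3f}, r = {s['r']:.3f}")
```

Both lines pass through the centroid by construction, since each is written in point-slope form about $(\bar{x}, \bar{y})$.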
📐Formulae
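Line of regression of $y$ on $x$: $y - \bar{y} = b_{yx}(x - \bar{x})$
Line of regression of $x$ on $y$: $x - \bar{x} = b_{xy}(y - \bar{y})$
Regression coefficient $b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{\operatorname{Cov}(x, y)}{\sigma_x^2}$
Regression coefficient $b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{\operatorname{Cov}(x, y)}{\sigma_y^2}$
Correlation coefficient: $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$
Standard calculation for $b_{yx}$: $b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}$
Standard calculation for $b_{xy}$: $b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - \left(\sum y\right)^2}$
Covariance: $\operatorname{Cov}(x, y) = \dfrac{1}{n}\sum (x - \bar{x})(y - \bar{y}) = \dfrac{1}{n}\sum xy - \bar{x}\,\bar{y}$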
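As a rough illustration of the raw-sum ("standard calculation") form, the sketch below computes both regression coefficients and the covariance from $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$ and $\sum y^2$. The function name `raw_sum_coefficients` and the data are assumptions made for this example; exact fractions are used so the arithmetic can be followed by hand.

```python
from fractions import Fraction

def raw_sum_coefficients(xs, ys):
    """Compute (b_yx, b_xy, covariance) from raw sums of integer data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y          # shared by both coefficients
    b_yx = Fraction(numerator, n * sum_xx - sum_x ** 2)
    b_xy = Fraction(numerator, n * sum_yy - sum_y ** 2)
    # Population covariance: (1/n) * sum(xy) - x_bar * y_bar.
    cov = Fraction(sum_xy, n) - Fraction(sum_x, n) * Fraction(sum_y, n)
    return b_yx, b_xy, cov

# Hypothetical integer data, for illustration only.
b_yx, b_xy, cov = raw_sum_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b_yx, b_xy, cov)
```

For this data the sums are $n = 5$, $\sum x = 15$, $\sum y = 20$, $\sum xy = 66$, $\sum x^2 = 55$ and $\sum y^2 = 86$, so $b_{yx} = 30/50 = 3/5$ and $b_{xy} = 30/30 = 1$.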
💡Examples
Problem 1:
Given the following data: Mean of , Mean of , Standard deviation of , Standard deviation of , and Correlation coefficient . Find the two regression lines and estimate when .
Solution:
- Find : .
- Find : .
- Equation of on : .
- Equation of on : .
- Estimate for : Using the on line, .
Explanation:
We first calculate the regression coefficients using the standard deviations and correlation. Then we use the point-slope form with the means to derive the linear equations. Finally, we use the on line for prediction since is given.
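The same procedure (coefficients from $r$ and the standard deviations, then point-slope form through the means) can be sketched in Python. The numbers below are illustrative assumptions and are not the figures of Problem 1; the helper name `lines_from_summary` is likewise hypothetical.

```python
def lines_from_summary(x_bar, y_bar, sd_x, sd_y, r):
    """Build both regression lines from means, standard deviations, and r."""
    b_yx = r * sd_y / sd_x   # coefficient of y on x
    b_xy = r * sd_x / sd_y   # coefficient of x on y

    def predict_y(x):
        # y on x line: y - y_bar = b_yx * (x - x_bar)
        return y_bar + b_yx * (x - x_bar)

    def predict_x(y):
        # x on y line: x - x_bar = b_xy * (y - y_bar)
        return x_bar + b_xy * (y - y_bar)

    return b_yx, b_xy, predict_y, predict_x

# Assumed summary statistics (illustrative only).
b_yx, b_xy, predict_y, predict_x = lines_from_summary(
    x_bar=20, y_bar=15, sd_x=4, sd_y=3, r=0.7
)

# x is given, so the y on x line is the one to use.
print(f"b_yx = {b_yx:.3f}, b_xy = {b_xy:.3f}")
print(f"estimated y at x = 24: {predict_y(24):.2f}")
```

Returning the two predictors as separate functions keeps the choice of line explicit: `predict_y` is used when $x$ is given and `predict_x` when $y$ is given.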
Problem 2:
The two lines of regression are and . Find the mean values of and , and the correlation coefficient .
Solution:
- Find Means: Solve the equations simultaneously. (i) (ii) Multiply (i) by 2: . Subtract (ii) from this: . Substitute in (i): . So, .
- Find : Assume is the line on . . Then must be on . .
- Check validity: . Since , our assumption is correct.
- Calculate $r$: $r = -\sqrt{b_{yx} \cdot b_{xy}}$ (negative because both regression coefficients are negative).
Explanation:
The means are found at the intersection of the two lines. To find $r$, we assume which line is which, calculate the regression coefficients, and verify that their product is $\le 1$. The sign of $r$ matches the sign of the coefficients.
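The identification step can be sketched in Python as follows. The two lines used here, $3x + 2y = 26$ and $6x + y = 31$, are assumed for illustration and are not the equations of Problem 2; each line is stored as a hypothetical tuple `(a, b, c)` meaning $ax + by = c$ with integer coefficients.

```python
from fractions import Fraction
from math import sqrt

def intersection(l1, l2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

def identify_coefficients(l1, l2):
    """Try both assignments; keep the one whose product lies in [0, 1]."""
    for y_on_x, x_on_y in ((l1, l2), (l2, l1)):
        # a*x + b*y = c  =>  y = c/b - (a/b)*x, so b_yx = -a/b.
        b_yx = Fraction(-y_on_x[0], y_on_x[1])
        # a*x + b*y = c  =>  x = c/a - (b/a)*y, so b_xy = -b/a.
        b_xy = Fraction(-x_on_y[1], x_on_y[0])
        if 0 <= b_yx * b_xy <= 1:
            return b_yx, b_xy
    raise ValueError("no valid assignment of regression lines")

def correlation(b_yx, b_xy):
    """r is the geometric mean of the coefficients, with their common sign."""
    sign = -1 if b_yx < 0 else 1
    return sign * sqrt(b_yx * b_xy)

line_1 = (3, 2, 26)   # assumed: 3x + 2y = 26
line_2 = (6, 1, 31)   # assumed: 6x + y = 31

x_bar, y_bar = intersection(line_1, line_2)
b_yx, b_xy = identify_coefficients(line_1, line_2)
print(x_bar, y_bar)                 # 4 7
print(b_yx, b_xy)                   # -3/2 -1/6
print(correlation(b_yx, b_xy))      # -0.5
```

With the opposite assignment the coefficients would be $-6$ and $-2/3$, whose product is $4 > 1$, so that assignment is rejected.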