Review the key concepts, formulae, and examples before starting your quiz.
🔑Concepts
A Scatter Diagram is a visual representation of bivariate data where each pair of observations is plotted as a point on a Cartesian plane. If the points cluster around a straight line rising from left to right, it indicates a positive linear correlation; if they fall from left to right, it indicates a negative linear correlation.
The Line of Best Fit (Regression Line) is a mathematical line that best represents the trend of the data points. Visually, it is drawn such that the vertical distances (residuals) between the actual data points and the line are minimized. In the 'Least Squares Method', we minimize the sum of the squares of these residuals.
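The least-squares idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `fit_line` and `sum_squared_residuals` and the five data points are hypothetical, chosen only to show that the fitted line gives a smaller sum of squared vertical residuals than nearby lines.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical sample data (not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
shifted = sum_squared_residuals(xs, ys, a + 0.1, b)  # a slightly different line

print(round(a, 3), round(b, 3))
print(best < shifted)  # the fitted line has the smaller sum of squares
```

Any other choice of intercept or slope would give a larger value of `sum_squared_residuals` for the same data, which is what "best fit" means in the least-squares sense.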
There are two regression lines for every bivariate distribution: the line of $y$ on $x$ (used to estimate $y$ for a given $x$) and the line of $x$ on $y$ (used to estimate $x$ for a given $y$). Geometrically, both lines always intersect at the point $(\bar{x}, \bar{y})$, which represents the arithmetic means of the two variables.
The Regression Coefficients, denoted as $b_{yx}$ and $b_{xy}$, represent the slopes of the regression lines. $b_{yx}$ measures the change in $y$ per unit change in $x$. Visually, if $b_{yx}$ is positive, the line of $y$ on $x$ slopes upward; if negative, it slopes downward.
The Correlation Coefficient ($r$) is the geometric mean of the two regression coefficients: $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$. The sign of $r$ is always the same as the sign of $b_{yx}$ and $b_{xy}$. On a graph, if $r = 1$ or $r = -1$, all points lie exactly on a single straight line.
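As a quick numerical check of the geometric-mean relationship, the sketch below computes both regression coefficients from raw data and recovers the correlation coefficient from them. The function names and the data are hypothetical, used only for illustration.

```python
import math


def regression_coefficients(xs, ys):
    """Return (b_yx, b_xy) for paired observations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / syy


def correlation_from_coefficients(b_yx, b_xy):
    """Geometric mean of the two coefficients, carrying their common sign."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    r = math.sqrt(product)
    return -r if b_yx < 0 else r


# Hypothetical sample data (not taken from the notes).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_yx, b_xy = regression_coefficients(xs, ys)
r = correlation_from_coefficients(b_yx, b_xy)
print(round(b_yx, 3), round(b_xy, 3), round(r, 4))
```

Reversing the order of `ys` makes both coefficients negative, and `correlation_from_coefficients` then returns a negative value of the same magnitude.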
The angle between the two regression lines indicates the strength of the correlation. If the lines are perpendicular, the correlation is zero ($r = 0$), appearing as a circular cloud of points. If the lines coincide (the angle is $0^\circ$), the correlation is perfect ($r = \pm 1$).
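The link between the angle and the correlation can be made concrete with a short sketch. It assumes the line of $y$ on $x$ has direction $(1, b_{yx})$ and the line of $x$ on $y$ has direction $(b_{xy}, 1)$ in the $xy$-plane; the function name `angle_between_regression_lines` is hypothetical.

```python
import math


def angle_between_regression_lines(b_yx, b_xy):
    """Acute angle in degrees between the two regression lines.

    Direction of y on x: (1, b_yx). Direction of x on y: (b_xy, 1).
    """
    cross = 1.0 - b_yx * b_xy   # equals 1 - r**2
    dot = b_yx + b_xy
    return math.degrees(math.atan2(abs(cross), abs(dot)))


print(angle_between_regression_lines(0.0, 0.0))   # zero correlation
print(angle_between_regression_lines(2.0, 0.5))   # product is 1, so r = +1
print(round(angle_between_regression_lines(0.6, 1.0), 2))
```

Because the cross term is $1 - b_{yx} b_{xy} = 1 - r^2$, the angle shrinks to zero as $|r|$ approaches 1 and opens to a right angle when both coefficients are zero.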
📐Formulae
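Mean: $\bar{x} = \frac{\sum x}{n}$ and $\bar{y} = \frac{\sum y}{n}$
Regression Equation of $y$ on $x$: $y - \bar{y} = b_{yx}(x - \bar{x})$
Regression Equation of $x$ on $y$: $x - \bar{x} = b_{xy}(y - \bar{y})$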
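Regression Coefficient $b_{yx} = r\frac{\sigma_y}{\sigma_x}$ or $b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
Regression Coefficient $b_{xy} = r\frac{\sigma_x}{\sigma_y}$ or $b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$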
Coefficient of Correlation: $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$ (Note: $r$ takes the sign of the coefficients)
Standard Deviation: $\sigma_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ (and likewise for $\sigma_y$)
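When only summary totals are available, the same quantities can be computed directly from the sums. The sketch below is one way to do this in Python; the function name `regression_from_sums` and the totals used are hypothetical (they correspond to an invented five-point data set, not to any problem in these notes).

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Means, regression coefficients, standard deviations and r from totals."""
    mean_x = sum_x / n
    mean_y = sum_y / n
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    # Population standard deviations in their sum-based form.
    sigma_x = math.sqrt(sum_x2 / n - mean_x ** 2)
    sigma_y = math.sqrt(sum_y2 / n - mean_y ** 2)
    # r is the geometric mean of the coefficients with their common sign.
    r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
    return {
        "mean_x": mean_x, "mean_y": mean_y,
        "b_yx": b_yx, "b_xy": b_xy,
        "sigma_x": sigma_x, "sigma_y": sigma_y, "r": r,
    }


# Hypothetical totals: n = 5, sum x = 15, sum y = 20,
# sum x^2 = 55, sum y^2 = 86, sum xy = 66.
result = regression_from_sums(5, 15, 20, 55, 86, 66)

# Line of y on x in slope-intercept form: y = intercept + b_yx * x.
intercept = result["mean_y"] - result["b_yx"] * result["mean_x"]
print(round(result["b_yx"], 3), round(intercept, 3))
```

With these totals the two forms of the coefficient agree: `result["r"] * result["sigma_y"] / result["sigma_x"]` matches `result["b_yx"]` up to rounding.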
💡Examples
Problem 1:
Given the following data: , , , , and . Find the regression equation of on .
Solution:
1. Calculate the means: and .
2. Calculate the regression coefficient :
3. Form the equation:
Explanation:
We first identify the necessary sums and calculate the means. Then we compute the regression coefficient with the formula that uses the sums directly. Finally, we substitute the mean values and the coefficient into the point-slope form of the regression line equation.
Problem 2:
The two regression lines are and . Find the mean values of $x$ and $y$ and the correlation coefficient $r$.
Solution:
1. To find the means $(\bar{x}, \bar{y})$, solve the equations simultaneously:
   (Eq 1)
   (Eq 2)
   Substitute Eq 2 into Eq 1: .
   Substitute into Eq 2: .
2. To find $r$, assume is on : , so .
   Assume is on : , so .
3. Check consistency: . Since , the assumption is correct.
4. $r = -\sqrt{b_{yx} \cdot b_{xy}}$ (sign is negative because both coefficients are negative).
Explanation:
Since the regression lines intersect at the means, we solve the system of equations to find $\bar{x}$ and $\bar{y}$. To find $r$, we identify $b_{yx}$ and $b_{xy}$ by rearranging the equations, ensuring the product of the two regression coefficients is less than or equal to 1. The sign of $r$ matches the sign of the coefficients.
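The procedure described above can be sketched in Python. The two lines used here, $3x + 2y = 26$ and $6x + y = 31$, are hypothetical stand-ins chosen for illustration, not the equations of Problem 2. Each line is written as a tuple `(a, b, c)` meaning $ax + by = c$, and the sketch assumes all $x$ and $y$ coefficients are non-zero. The function names are likewise invented for this example.

```python
import math


def solve_means(line1, line2):
    """Intersection of a1*x + b1*y = c1 and a2*x + b2*y = c2 (Cramer's rule)."""
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincide")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det


def correlation_from_lines(line1, line2):
    """Try both assignments; keep the one with 0 <= b_yx * b_xy <= 1."""
    for y_on_x, x_on_y in ((line1, line2), (line2, line1)):
        a1, b1, _ = y_on_x
        a2, b2, _ = x_on_y
        b_yx = -a1 / b1   # from y = (c1 - a1*x) / b1
        b_xy = -b2 / a2   # from x = (c2 - b2*y) / a2
        product = b_yx * b_xy
        if 0 <= product <= 1:
            r = math.sqrt(product)
            return (-r if b_yx < 0 else r), b_yx, b_xy
    raise ValueError("no consistent assignment of the two lines")


# Hypothetical regression lines: 3x + 2y = 26 and 6x + y = 31.
line_a = (3, 2, 26)
line_b = (6, 1, 31)

mean_x, mean_y = solve_means(line_a, line_b)
r, b_yx, b_xy = correlation_from_lines(line_a, line_b)
print(mean_x, mean_y)
print(b_yx, round(b_xy, 4), r)
```

For these two lines, treating the first as $y$ on $x$ gives a product of coefficients of $0.25$, which is admissible, while the opposite assignment gives a product of $4$ and is rejected. Both coefficients are negative, so $r$ is negative.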