krit.club logo

Linear Regression - Regression Coefficients and their Properties

Grade 12 ICSE

Review the key concepts, formulae, and examples before starting your quiz.

🔑Concepts

Regression Lines and the Point of Intersection: There are two regression lines, the line of $Y$ on $X$ and the line of $X$ on $Y$. The line of $Y$ on $X$ is the best linear (least-squares) fit for predicting $Y$ from $X$, and the line of $X$ on $Y$ is the best linear fit for predicting $X$ from $Y$. The two lines always intersect at the point $(\bar{x}, \bar{y})$, whose coordinates are the arithmetic means of the two variables.
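The intersection property can be checked numerically. The sketch below is a minimal illustration, assuming a small made-up dataset: it computes both regression coefficients from population (divide-by-$n$) moments, writes the two lines in the form $a x + b y = c$, solves them with Cramer's rule, and compares the solution with $(\bar{x}, \bar{y})$. The data and variable names are hypothetical.

```python
from fractions import Fraction

# Hypothetical paired observations, used only for illustration.
xs = [Fraction(v) for v in (1, 2, 3, 4, 5)]
ys = [Fraction(v) for v in (2, 4, 5, 4, 5)]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Population covariance and variances (dividing by n).
cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
var_x = sum((x - x_bar) ** 2 for x in xs) / n
var_y = sum((y - y_bar) ** 2 for y in ys) / n
b_yx = cov / var_x
b_xy = cov / var_y

# Y on X:  -b_yx * x + y = y_bar - b_yx * x_bar
# X on Y:   x - b_xy * y = x_bar - b_xy * y_bar
a1, b1, c1 = -b_yx, Fraction(1), y_bar - b_yx * x_bar
a2, b2, c2 = Fraction(1), -b_xy, x_bar - b_xy * y_bar

# Cramer's rule; det = b_yx * b_xy - 1 is non-zero unless |r| = 1.
det = a1 * b2 - a2 * b1
x_meet = (c1 * b2 - c2 * b1) / det
y_meet = (a1 * c2 - a2 * c1) / det

print("means:       ", x_bar, y_bar)
print("intersection:", x_meet, y_meet)
```

When $|r| = 1$ the determinant is zero because the two lines coincide, so there is no unique intersection to solve for; that case is excluded by the choice of data here.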

Nature of Regression Coefficients: The regression coefficients $b_{yx}$ and $b_{xy}$ are the slopes of the regression lines of $Y$ on $X$ and of $X$ on $Y$, respectively. $b_{yx}$ is the estimated change in $Y$ for a unit change in $X$, while $b_{xy}$ is the estimated change in $X$ for a unit change in $Y$. When the line of $Y$ on $X$ is drawn on the usual $XY$-plane, the steeper it is, the larger the absolute value of $b_{yx}$.

Sign Consistency Property: The correlation coefficient ($r$) and both regression coefficients ($b_{yx}$ and $b_{xy}$) always have the same sign, because all three take the sign of $\text{Cov}(X, Y)$. If $r$ is positive, both regression lines slope upward from left to right; if $r$ is negative, both lines slope downward. One coefficient can never be positive while the other is negative.

Geometric Mean Property: The magnitude of the correlation coefficient is the geometric mean of the two regression coefficients, which is expressed as $r^2 = b_{yx} \cdot b_{xy}$. When calculating $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$, take the sign shared by the two regression coefficients.

The Magnitude Property: The product of the two regression coefficients cannot exceed 1 ($b_{yx} \cdot b_{xy} \le 1$). This follows from the geometric mean property, since $b_{yx} \cdot b_{xy} = r^2$ and $|r| \le 1$. Consequently, if one regression coefficient is greater than 1 in absolute value, the other must be less than 1 in absolute value.
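The sign, geometric mean, and magnitude properties can all be observed on one dataset. This is a minimal sketch, assuming hypothetical data with a negative relationship and population (divide-by-$n$) moments; `regression_summary` is an illustrative helper, not a library function. It computes $r$ directly (Pearson's formula) and again from $\pm\sqrt{b_{yx} \cdot b_{xy}}$ with the common sign.

```python
import math

def regression_summary(xs, ys):
    """Return (b_yx, b_xy, r) computed from population (divide-by-n) moments."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    b_yx = cov / var_x
    b_xy = cov / var_y
    r = cov / math.sqrt(var_x * var_y)  # Pearson's r, computed directly
    return b_yx, b_xy, r

# Hypothetical data with a negative relationship.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 3]
b_yx, b_xy, r = regression_summary(xs, ys)

# Sign consistency: all three share the sign of the covariance.
same_sign = (b_yx < 0) == (b_xy < 0) == (r < 0)

# Geometric mean property: |r| = sqrt(b_yx * b_xy), with the common sign.
r_from_coefficients = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# Magnitude property: the product equals r**2, so it cannot exceed 1.
product = b_yx * b_xy

print(b_yx, b_xy, round(r, 4), round(r_from_coefficients, 4), round(product, 4))
print(same_sign, product <= 1)
```

For this data $|b_{yx}| = 1.7$ exceeds 1, and the magnitude property is reflected in $|b_{xy}| = 0.5$ being below 1.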

Angle Between the Lines: The angle $\theta$ between the two regression lines indicates the strength of the correlation between the variables. If $r = \pm 1$, then $\theta = 0^\circ$ and the lines coincide. If $r = 0$, then $\theta = 90^\circ$ and the lines are perpendicular (one horizontal, one vertical). As $|r|$ increases from 0 toward 1, the angle between the two lines decreases as they rotate toward each other.
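A standard result for this angle is $\tan\theta = \frac{1 - r^2}{|r|} \cdot \frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$ for $r \neq 0$. The sketch below, assuming hypothetical standard deviations $\sigma_x = 2$ and $\sigma_y = 3$, computes the acute angle from the direction vectors of the two lines and can be compared with that formula; the function names are illustrative only.

```python
import math

def angle_between_lines(b_yx, b_xy):
    """Acute angle, in degrees, between the lines of Y on X and X on Y.

    On the XY-plane the line of Y on X has direction (1, b_yx), and the line
    of X on Y, x - x_bar = b_xy * (y - y_bar), has direction (b_xy, 1).
    """
    dot = b_xy + b_yx                        # (1, b_yx) . (b_xy, 1)
    norms = math.hypot(1, b_yx) * math.hypot(b_xy, 1)
    cos_theta = min(1.0, abs(dot) / norms)   # abs() selects the acute angle
    return math.degrees(math.acos(cos_theta))

def angle_from_formula(r, sigma_x, sigma_y):
    """tan(theta) = (1 - r^2) / |r| * sx * sy / (sx^2 + sy^2), valid for r != 0."""
    tan_theta = (1 - r**2) / abs(r) * sigma_x * sigma_y / (sigma_x**2 + sigma_y**2)
    return math.degrees(math.atan(tan_theta))

# Hypothetical standard deviations; b_yx = r*sy/sx and b_xy = r*sx/sy.
sigma_x, sigma_y = 2.0, 3.0
for r in (0.0, 0.3, 0.6, 0.9, 1.0):
    b_yx = r * sigma_y / sigma_x
    b_xy = r * sigma_x / sigma_y
    print(r, round(angle_between_lines(b_yx, b_xy), 2))
```

The printed angles fall from $90^\circ$ at $r = 0$ to (numerically) $0^\circ$ at $r = 1$, matching the description above.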

Property of Origin and Scale: Regression coefficients are independent of a change of origin but not of a change of scale. If we transform $X$ to $u = \frac{X - a}{h}$ and $Y$ to $v = \frac{Y - c}{k}$, the new coefficients are related to the old ones by the ratio of the scales: $b_{yx} = \frac{k}{h} b_{vu}$ and $b_{xy} = \frac{h}{k} b_{uv}$.
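The origin-and-scale rule can be checked with exact fractions. This minimal sketch assumes hypothetical raw data and hypothetical coding constants $a = 18$, $h = 3$, $c = 130$, $k = 10$; `slope_y_on_x` is an illustrative helper. It shows that shifting the origin leaves $b_{yx}$ unchanged, while rescaling requires the factor $\frac{k}{h}$.

```python
from fractions import Fraction

def slope_y_on_x(xs, ys):
    """b_yx = S_xy / S_xx (the divide-by-n factors cancel)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical raw data.
xs = [Fraction(v) for v in (12, 15, 18, 21, 24)]
ys = [Fraction(v) for v in (110, 130, 120, 150, 160)]

a, h = 18, 3     # u = (X - a) / h  (assumed origin and scale for X)
c, k = 130, 10   # v = (Y - c) / k  (assumed origin and scale for Y)

b_yx = slope_y_on_x(xs, ys)

# Change of origin only: the coefficient is unchanged.
b_shifted = slope_y_on_x([x - a for x in xs], [y - c for y in ys])

# Change of origin and scale: b_yx = (k / h) * b_vu.
us = [(x - a) / h for x in xs]
vs = [(y - c) / k for y in ys]
b_vu = slope_y_on_x(us, vs)

print(b_yx, b_shifted, b_vu, Fraction(k, h) * b_vu)
```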

📐Formulae

Regression Line of $Y$ on $X$: $Y - \bar{y} = b_{yx}(X - \bar{x})$

Regression Line of $X$ on $Y$: $X - \bar{x} = b_{xy}(Y - \bar{y})$
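The two equations above give two different prediction rules. This minimal sketch assumes hypothetical summary values ($\bar{x} = 3$, $\bar{y} = 7$, $b_{yx} = -1.7$, $b_{xy} = -0.5$) and illustrative function names. It shows that each line passes through the means, and that predicting $Y$ from $X$ and then $X$ back from that $Y$ gives $\bar{x} + r^2 (x - \bar{x})$, which differs from the starting $x$ unless $x = \bar{x}$ or $|r| = 1$.

```python
from fractions import Fraction

# Hypothetical summary statistics, chosen only for illustration.
x_bar, y_bar = Fraction(3), Fraction(7)
b_yx, b_xy = Fraction(-17, 10), Fraction(-1, 2)

def predict_y(x):
    """Regression line of Y on X: Y - y_bar = b_yx * (X - x_bar)."""
    return y_bar + b_yx * (x - x_bar)

def predict_x(y):
    """Regression line of X on Y: X - x_bar = b_xy * (Y - y_bar)."""
    return x_bar + b_xy * (y - y_bar)

y_hat = predict_y(Fraction(5))   # estimate of Y when X = 5
x_back = predict_x(y_hat)        # estimate of X from that Y, using the other line

print(y_hat, x_back)
```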

Regression Coefficient: $b_{yx} = r \frac{\sigma_y}{\sigma_x}$

Regression Coefficient: $b_{xy} = r \frac{\sigma_x}{\sigma_y}$

Correlation Coefficient: $r = \pm\sqrt{b_{yx} \cdot b_{xy}}$

Direct Calculation of $b_{yx}$: $b_{yx} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2}$

Direct Calculation of $b_{xy}$: $b_{xy} = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum Y^2 - (\sum Y)^2}$

Covariance Relationship: $b_{yx} = \frac{\text{Cov}(X, Y)}{\sigma_x^2}$ and $b_{xy} = \frac{\text{Cov}(X, Y)}{\sigma_y^2}$
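The direct-calculation formulae and the covariance relationship must give the same coefficients. This minimal sketch assumes hypothetical integer data (so `Fraction(numerator, denominator)` keeps the arithmetic exact) and illustrative helper names; it evaluates both forms side by side.

```python
from fractions import Fraction

def direct_coefficients(xs, ys):
    """b_yx and b_xy from raw sums (integer data assumed)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = Fraction(numerator, n * sum_x2 - sum_x ** 2)
    b_xy = Fraction(numerator, n * sum_y2 - sum_y ** 2)
    return b_yx, b_xy

def covariance_coefficients(xs, ys):
    """b_yx = Cov(X, Y) / var_x and b_xy = Cov(X, Y) / var_y (divide-by-n moments)."""
    n = len(xs)
    x_bar, y_bar = Fraction(sum(xs), n), Fraction(sum(ys), n)
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    var_x = sum((x - x_bar) ** 2 for x in xs) / n
    var_y = sum((y - y_bar) ** 2 for y in ys) / n
    return cov / var_x, cov / var_y

# Hypothetical integer data.
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 9, 5, 3]
print(direct_coefficients(xs, ys))
print(covariance_coefficients(xs, ys))
```

The direct form needs only the five running sums $\sum X$, $\sum Y$, $\sum XY$, $\sum X^2$, $\sum Y^2$, which is why it is convenient for hand calculation from a table.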

💡Examples

Problem 1:

Given the two regression lines $2x + 3y - 6 = 0$ and $4x + y - 4 = 0$, find the mean values of $x$ and $y$, and the correlation coefficient $r$.

Solution:

  1. To find the means $(\bar{x}, \bar{y})$, solve the equations simultaneously: $2x + 3y = 6 \quad \text{(i)}$ and $4x + y = 4 \quad \text{(ii)}$. Multiply (ii) by 3: $12x + 3y = 12$. Subtract (i): $10x = 6 \Rightarrow \bar{x} = 0.6$. Substitute $\bar{x}$ into (ii): $4(0.6) + y = 4 \Rightarrow 2.4 + y = 4 \Rightarrow \bar{y} = 1.6$.

  2. To find $r$, first decide which line is which. Assume (i) is the line of $Y$ on $X$: $3y = -2x + 6 \Rightarrow y = -\frac{2}{3}x + 2$, so $b_{yx} = -\frac{2}{3}$. Assume (ii) is the line of $X$ on $Y$: $4x = -y + 4 \Rightarrow x = -\frac{1}{4}y + 1$, so $b_{xy} = -\frac{1}{4}$. Check validity: $b_{yx} \cdot b_{xy} = \left(-\frac{2}{3}\right)\left(-\frac{1}{4}\right) = \frac{1}{6} \approx 0.167$. Since $\frac{1}{6} \le 1$, this assignment is valid. (The reverse assignment would give $b_{yx} = -4$ and $b_{xy} = -\frac{3}{2}$, whose product is $6 > 1$, which is impossible.) Therefore $r = -\sqrt{\frac{1}{6}} \approx -0.41$, negative because both regression coefficients are negative.

Explanation:

We use the property that the regression lines intersect at the means to find $\bar{x}$ and $\bar{y}$. To find $r$, we must decide which equation is the line of $Y$ on $X$ and which is the line of $X$ on $Y$, choosing the assignment for which the product of the regression coefficients is $\le 1$.
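Problem 1 can be checked with exact fractions. The sketch below assumes each line is stored as the coefficients $(a, b, c)$ of $a x + b y = c$ with $a, b \neq 0$; the helper names are illustrative. It solves for the means and tries both assignments of the lines, keeping the one whose regression coefficients have matching signs and a product of at most 1.

```python
import math
from fractions import Fraction

# Each line is stored as (a, b, c) for a*x + b*y = c.
line_i = (2, 3, 6)    # 2x + 3y = 6
line_ii = (4, 1, 4)   # 4x + y = 4

def solve(l1, l2):
    """Intersection of two lines by Cramer's rule (assumes they are not parallel)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return Fraction(c1 * b2 - c2 * b1, det), Fraction(a1 * c2 - a2 * c1, det)

def coeff_if_y_on_x(line):
    """Rewrite as y = (c - a*x) / b; the coefficient of x is -a/b."""
    a, b, _ = line
    return Fraction(-a, b)

def coeff_if_x_on_y(line):
    """Rewrite as x = (c - b*y) / a; the coefficient of y is -b/a."""
    a, b, _ = line
    return Fraction(-b, a)

x_bar, y_bar = solve(line_i, line_ii)

valid = []
for yx_line, xy_line in ((line_i, line_ii), (line_ii, line_i)):
    b_yx = coeff_if_y_on_x(yx_line)
    b_xy = coeff_if_x_on_y(xy_line)
    # A genuine pair needs matching signs and a product of at most 1.
    if 0 < b_yx * b_xy <= 1:
        valid.append((b_yx, b_xy))

b_yx, b_xy = valid[0]
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(x_bar, y_bar, b_yx, b_xy, round(r, 2))
```

Swapping the roles of the two lines turns the product $p$ into $\frac{1}{p}$, so whenever $0 < p < 1$ exactly one assignment passes the check.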

Problem 2:

If the regression coefficient of $Y$ on $X$ is $0.8$, the regression coefficient of $X$ on $Y$ is $0.45$, and the variance of $X$ is $9$, find the variance of $Y$.

Solution:

  1. We are given $b_{yx} = 0.8$, $b_{xy} = 0.45$, and $\sigma_x^2 = 9$ (so $\sigma_x = 3$).
  2. We know the relationships $b_{yx} = r \frac{\sigma_y}{\sigma_x}$ and $b_{xy} = r \frac{\sigma_x}{\sigma_y}$.
  3. Dividing the first by the second gives $\frac{b_{yx}}{b_{xy}} = \frac{r (\sigma_y / \sigma_x)}{r (\sigma_x / \sigma_y)} = \frac{\sigma_y^2}{\sigma_x^2}$.
  4. Substitute the values: $\frac{0.8}{0.45} = \frac{\sigma_y^2}{9}$.
  5. $\sigma_y^2 = \frac{0.8 \times 9}{0.45} = \frac{7.2}{0.45} = 16$.
  6. The variance of $Y$ is $16$.

Explanation:

This solution uses the algebraic relationship between the two regression coefficients: their ratio equals the ratio of the variances, $\frac{\sigma_y^2}{\sigma_x^2}$.
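The arithmetic in Problem 2 can be verified with exact fractions, together with two consistency checks: $r^2 = b_{yx} \cdot b_{xy}$ must not exceed 1, and $r \frac{\sigma_y}{\sigma_x}$ should reproduce $b_{yx}$. This is a minimal sketch using only the values given in the problem.

```python
import math
from fractions import Fraction

b_yx = Fraction(8, 10)     # 0.8
b_xy = Fraction(45, 100)   # 0.45
var_x = Fraction(9)

# b_yx / b_xy = var_y / var_x  =>  var_y = var_x * b_yx / b_xy
var_y = var_x * b_yx / b_xy

# Consistency checks: r**2 = b_yx * b_xy must not exceed 1,
# and r * sigma_y / sigma_x should give back b_yx.
r_squared = b_yx * b_xy
r = math.sqrt(r_squared)   # positive, since both coefficients are positive
print(var_y, r_squared, r * math.sqrt(var_y) / math.sqrt(var_x))
```

Here $r^2 = 0.36$, so $r = 0.6$, which satisfies the magnitude property.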