Cooper City High School
Statistics
Is the coefficient of Determination the same as the coefficient of Correlation squared?
Yes, it is. We will prove this now. Let's start!
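We define the coefficient of correlation (r) as:
(1) $r = \frac{1}{N-1} \sum_{i=1}^{N} \left(\frac{X_i - \bar{X}}{S_x}\right)\left(\frac{Y_i - \bar{Y}}{S_y}\right)$
Where:
$S_x$ is the sample standard deviation of X, and $S_y$ is the sample standard deviation of Y.
N is the number of observations (data pairs).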
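We can distribute the product: $(X_i - \bar{X})(Y_i - \bar{Y}) = X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y}$
So, summing over all N observations and using $\sum X_i = N\bar{X}$ and $\sum Y_i = N\bar{Y}$,
$\sum [X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y}] = \sum X_i Y_i - N\bar{X}\bar{Y} - N\bar{X}\bar{Y} + N\bar{X}\bar{Y}$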
Simplifying,
(2) $\sum [X_i Y_i - X_i \bar{Y} - \bar{X} Y_i + \bar{X}\bar{Y}] = \sum X_i Y_i - N\bar{X}\bar{Y}$
So, formula (1) can be rewritten as:
(3) $r = \frac{1}{N-1} \cdot \frac{\sum X_i Y_i - N\bar{X}\bar{Y}}{S_x S_y}$
or,
(3a) $\sum X_i Y_i - N\bar{X}\bar{Y} = (N-1)\, S_x S_y\, r$
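As a quick numerical sanity check, the sketch below computes r both from the definition (1) and from the rewritten form (3). The data values and the function names `r_definition` and `r_shortcut` are made up for illustration, and the code assumes that Sx and Sy are sample standard deviations (dividing by N-1), as the rest of the derivation implies.

```python
from math import isclose
from statistics import mean, stdev

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def r_definition(xs, ys):
    """Formula (1): sum of products of deviations, scaled by (N-1) Sx Sy."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations (divide by N-1)
    total = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    return total / ((n - 1) * sx * sy)


def r_shortcut(xs, ys):
    """Formula (3): the same quantity from the raw sum of products."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * xbar * ybar) / ((n - 1) * sx * sy)


# The two values should agree up to floating-point rounding.
print(r_definition(xs, ys), r_shortcut(xs, ys))
print(isclose(r_definition(xs, ys), r_shortcut(xs, ys)))
```

For this data the two functions should agree, which is what formula (2) predicts.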
Now, set this result aside for a moment and let's work
with the Least-Squares Regression Line (LSRL).
The condition to get the least-squares regression line is that the sum (LS) should be a minimum:
(4) $(LS) = \sum (Y_i - \hat{Y}_i)^2$
where $\hat{Y}_i$ is the value of Y we obtain using the LSRL expression:
(5) $\hat{Y}_i = a + b X_i$
The first condition is that the derivative of the function (LS) with respect to the parameter a should be zero. From this condition, we get the result:
(6) $\bar{Y} = a + b\bar{X}$ or equivalently,
(7) $a = \bar{Y} - b\bar{X}$
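The step from the derivative condition to (6) is not written out. One way to fill it in, assuming only definitions (4) and (5), is sketched here:

```latex
\begin{align*}
\frac{\partial (LS)}{\partial a}
  &= \frac{\partial}{\partial a} \sum (Y_i - a - bX_i)^2
   = -2 \sum (Y_i - a - bX_i) = 0 \\
\sum Y_i &= N a + b \sum X_i \\
N\bar{Y} &= N a + b N \bar{X}
  \quad\Longrightarrow\quad \bar{Y} = a + b\bar{X}
\end{align*}
```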
The second condition is that the derivative of the function (LS) with respect to the parameter b should be zero. From this condition, we get the result:
(8) $\sum X_i Y_i = a N \bar{X} + b \sum X_i^2$
Now, if we plug the value of a given in formula (7) into formula (8), use the identity $\sum X_i^2 - N\bar{X}^2 = (N-1) S_x^2$, and solve for b, we get:
(9) $b = \frac{\sum X_i Y_i - N\bar{Y}\bar{X}}{(N-1)\, S_x^2}$
and, from here, we get:
(10) $\sum X_i Y_i - N\bar{Y}\bar{X} = b\,(N-1)\, S_x^2$
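The slope and intercept formulas can be checked numerically. The sketch below is only an illustration: the data are made up and `lsrl_coefficients` is a hypothetical helper name. It computes b from (9) and a from (7), then evaluates both derivative conditions at the fitted line, where each should be zero up to rounding.

```python
from math import isclose
from statistics import mean, stdev

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]


def lsrl_coefficients(xs, ys):
    """Return (a, b) for the line Yhat = a + b*X using formulas (7) and (9)."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    sx = stdev(xs)  # sample standard deviation (divides by N-1)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    b = (sum_xy - n * ybar * xbar) / ((n - 1) * sx ** 2)  # formula (9)
    a = ybar - b * xbar                                    # formula (7)
    return a, b


a, b = lsrl_coefficients(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Derivative conditions evaluated at the fitted (a, b):
cond_a = sum(residuals)                              # from d(LS)/da = 0
cond_b = sum(x * e for x, e in zip(xs, residuals))   # from d(LS)/db = 0
print(a, b, cond_a, cond_b)
```

Checking the conditions directly, rather than comparing against a library fit, keeps the example tied to the two conditions stated in the text.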
Now, let's go back to formula (3a), using formula (10):
(11) $(N-1)\, S_x S_y\, r = b\,(N-1)\, S_x^2$
and, if we simplify and solve for r:
(12) $r = b\, \frac{S_x}{S_y}$
Next, let's work with the expression for (LS):
(13) $(LS) = \sum (Y_i - \hat{Y}_i)^2$
Using (5), this formula can be rewritten as:
(14) $(LS) = \sum (Y_i - a - b X_i)^2$
If we expand the square, and use expression (7) for a together with formula (10), we get the result:
(15) $(LS) = (N-1)\, S_y^2 - b^2 (N-1)\, S_x^2$
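The expansion behind (15) skips several steps. One possible way to fill them in is sketched below, taking $Y_i - \hat{Y}_i = Y_i - a - bX_i$ from (5). Grouping the terms into deviations from the means is a choice made here for brevity and is not necessarily the route the original derivation took.

```latex
\begin{align*}
Y_i - a - bX_i
  &= (Y_i - \bar{Y}) - b\,(X_i - \bar{X})
  && \text{using (7): } a = \bar{Y} - b\bar{X} \\
(LS)
  &= \sum (Y_i - \bar{Y})^2
     - 2b \sum (X_i - \bar{X})(Y_i - \bar{Y})
     + b^2 \sum (X_i - \bar{X})^2 \\
  &= (N-1)\,S_y^2 - 2b\,\bigl[\,b\,(N-1)\,S_x^2\,\bigr] + b^2 (N-1)\,S_x^2
  && \text{using (2) and (10)} \\
  &= (N-1)\,S_y^2 - b^2 (N-1)\,S_x^2
\end{align*}
```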
Finally, let's work with the definition of the coefficient of determination ($R^2$):
(16) $R^2 = \frac{\sum (Y_i - \bar{Y})^2 - \sum (Y_i - \hat{Y}_i)^2}{\sum (Y_i - \bar{Y})^2}$
Because of the definition of the standard deviation,
(17) $S_y^2 = \frac{\sum (Y_i - \bar{Y})^2}{N-1}$
so we can transform (17) into:
(18) $\sum (Y_i - \bar{Y})^2 = (N-1)\, S_y^2$
Next, using the definition of (LS) in (13), together with expressions (15) and (18), we obtain:
(19) $R^2 = \frac{(N-1)\, S_y^2 - [(N-1)\, S_y^2 - b^2 (N-1)\, S_x^2]}{(N-1)\, S_y^2}$
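Simplifying,
(20) $R^2 = \frac{b^2 S_x^2}{S_y^2}$
Comparing (20) with the square of (12), $r^2 = b^2 S_x^2 / S_y^2$, we see that $R^2 = r^2$, which is what we set out to prove.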
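A numerical check can make the result concrete. The sketch below uses made-up data and, as an assumption, sample standard deviations (dividing by N-1). It computes the coefficient of determination from its definition (16), from the slope form (20), and as the square of r from (1), so the three values can be compared.

```python
from math import isclose
from statistics import mean, stdev

# Made-up sample data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
xbar, ybar = mean(xs), mean(ys)
sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))

r = sxy / ((n - 1) * sx * sy)   # formula (1)
b = sxy / ((n - 1) * sx ** 2)   # formula (9), with the numerator rewritten via (2)
a = ybar - b * xbar             # formula (7)

ss_total = sum((y - ybar) ** 2 for y in ys)
ss_resid = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # (LS), formula (13)

r2_definition = (ss_total - ss_resid) / ss_total  # formula (16)
r2_from_slope = b ** 2 * sx ** 2 / sy ** 2        # formula (20)

# All three printed values should agree up to floating-point rounding.
print(r2_definition, r2_from_slope, r ** 2)
```

For this particular data set all three quantities come out at about 0.6.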
