Cooper City High School
Statistics


Is the coefficient of determination the same as the coefficient of correlation squared?

Yes, it is. We will prove this now. Let's start!

 

We define the coefficient of correlation ( r ) as :

          

  

(1)  r = (1/(N-1)) Σ[(Xi - Xbar)(Yi - Ybar)] / (Sx Sy)

Where:

Sx is the standard deviation for X, and Sy is the standard deviation for Y.

Xbar and Ybar are the means of X and Y.

N is the number of data points.

 

We can distribute the product: (Xi - Xbar)(Yi - Ybar) = XiYi - Xi Ybar - Xbar Yi + XbarYbar

Summing over all the data points, and using ΣXi = NXbar and ΣYi = NYbar:

Σ[XiYi - Xi Ybar - Xbar Yi + XbarYbar] = ΣXiYi - NXbarYbar - NXbarYbar + NXbarYbar,

Simplifying,

(2)       Σ[XiYi - Xi Ybar - Xbar Yi + XbarYbar] = ΣXiYi - NXbarYbar,

 

So, formula (1) can be rewritten as :

 

                        (3)  r = (1/(N-1))[ (ΣXiYi - NXbarYbar)/(Sx Sy)]

 

or,          

                        (3a)    [ΣXiYi - NXbarYbar] = (N-1)(Sx Sy)r
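The equivalence between the definition (1) and the shortcut form (3) can be checked numerically. The following is a minimal sketch using only the Python standard library; the data values and variable names are made up for illustration and do not come from the original page.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation (divides by N-1)
s_y = statistics.stdev(ys)

# Formula (1): sum of the products of the deviations from the means.
sum_dev_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
r_definition = sum_dev_products / ((n - 1) * s_x * s_y)

# Formula (3): the same quantity written as ΣXiYi - N Xbar Ybar.
sum_xy = sum(x * y for x, y in zip(xs, ys))
r_shortcut = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x * s_y)

print(r_definition, r_shortcut)
```

Both printed values should agree up to floating-point rounding, which is what identity (2) predicts.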

 

Now, set this result aside for a moment and let's work with the least-squares regression line (LSRL).

 

The condition for the least-squares regression line is that the sum of squared residuals (LS) should be a minimum:

                                    (4)   (LS) =  Σ (Yi - (Yhat)i)^2

 

where (Yhat)i is the value of Y we obtain using the LSRL expression:

 

                                    (5) (Yhat)i  = a + b Xi

 

The first condition is that the derivative of the function (LS) with respect to the parameter a must be zero. From this condition, we get the result:

 

(6) Ybar =  a + bXbar    or equivalently,

 

(7)  a = Ybar - bXbar

 

The second condition is that the derivative of the function (LS) with respect to the parameter b must be zero. From this condition, we get Σ Xi (Yi - a - b Xi) = 0, that is:

(8) ΣXiYi =  a N Xbar + b ΣXi^2

 

Now, if we plug the value of a given in formula (7) into formula (8), use the identity ΣXi^2 - N(Xbar)^2 = Σ(Xi - Xbar)^2 = (N-1) Sx^2, and solve for b, we get:

(9) b = (ΣXiYi - NXbarYbar) / [(N-1) Sx^2]

 

and, from here, we get :

 

(10)    [ΣXiYi - NXbarYbar] = b(N-1) Sx^2
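As a sanity check on formulas (7) and (9), the slope and intercept they give should make both derivatives of (LS) vanish. A minimal sketch, assuming the same made-up data as before (the names `residuals`, `sum_residuals` and `sum_x_residuals` are illustrative only):

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviation (divides by N-1)

# Formula (9): slope of the least-squares line.
sum_xy = sum(x * y for x, y in zip(xs, ys))
b = (sum_xy - n * x_bar * y_bar) / ((n - 1) * s_x ** 2)

# Formula (7): intercept of the least-squares line.
a = y_bar - b * x_bar

# Residuals Yi - (a + b Xi).
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# d(LS)/da = 0 means the residuals sum to zero.
sum_residuals = sum(residuals)
# d(LS)/db = 0 means the residuals weighted by Xi sum to zero.
sum_x_residuals = sum(x * e for x, e in zip(xs, residuals))

print(a, b, sum_residuals, sum_x_residuals)
```

Both sums should come out as zero up to rounding error, which is exactly the pair of conditions used to derive (6) and (8).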

     

Now, let’s go back to the formula (3a), using the formula (10):

 

(11)    (N-1)(Sx Sy)r = b(N-1) Sx^2

   

and, if we simplify, and solve for r :

 

(12)    r  =  b Sx/Sy
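Relation (12) between the correlation coefficient and the slope can also be verified directly. A short sketch under the same assumptions (made-up data, standard library only):

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviations (divide by N-1)
s_y = statistics.stdev(ys)

# Common numerator of formulas (3) and (9): ΣXiYi - N Xbar Ybar.
cross = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar

r = cross / ((n - 1) * s_x * s_y)   # formula (3)
b = cross / ((n - 1) * s_x ** 2)    # formula (9)
r_from_slope = b * s_x / s_y        # formula (12)

print(r, r_from_slope)
```

The two values should match up to rounding, since both are built from the same numerator.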

 

Next, let’s work with the expression of (LS):

 

                                    (13) (LS) =  Σ (Yi - (Yhat)i)^2

 

Using (5), this formula can be rewritten as:

                                    (14) (LS) =  Σ (Yi - a - b Xi)^2

 

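If we use the expression (7) for a, each term becomes (Yi - Ybar) - b(Xi - Xbar). Expanding the square gives (N-1) Sy^2 - 2b Σ(Xi - Xbar)(Yi - Ybar) + b^2 (N-1) Sx^2, and by (2) and (10) the middle sum equals b(N-1) Sx^2, so we get the result:

                                    (15) (LS) = (N-1) Sy^2 - b^2 (N-1) Sx^2 ,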
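Expression (15) is easy to get wrong by a sign, so a numerical check is worthwhile. The sketch below compares the directly computed sum of squared residuals with the closed form; the data and names are again illustrative assumptions, not part of the original page.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviations (divide by N-1)
s_y = statistics.stdev(ys)

# Slope (9) and intercept (7) of the least-squares line.
b = (sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar) / ((n - 1) * s_x ** 2)
a = y_bar - b * x_bar

# Formula (14): sum of squared residuals computed term by term.
ls_direct = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Formula (15): closed form in terms of Sx, Sy and b.
ls_closed_form = (n - 1) * s_y ** 2 - b ** 2 * (n - 1) * s_x ** 2

print(ls_direct, ls_closed_form)
```

For this data set both expressions evaluate to about 0.107.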

 

Finally, let’s work with the definition of the coefficient of determination (R2).

 

        (16) R^2 =  [ Σ (Yi - Ybar)^2 - Σ (Yi - (Yhat)i)^2 ] / Σ (Yi - Ybar)^2            ,

 

Because of the definition of the standard deviation,

 

                                    (17) Sy^2 = Σ (Yi - Ybar)^2 / (N-1),

 

so, we can transform (17) into:

 

                                    (18)            Σ (Yi - Ybar)^2 =  (N-1) Sy^2

 

Next, using the definition of (LS) in (13), the expression (15), and (18), we obtain:

             (19)   R^2 =  {(N-1) Sy^2 - [(N-1) Sy^2 - b^2 (N-1) Sx^2 ] } / [(N-1) Sy^2] ,

 

Simplifying,

 

                                    (20) R^2 =   b^2 Sx^2 / Sy^2

 

But the right-hand side of formula (20) is the same as formula (12) squared, so R^2 = r^2: the coefficient of determination is the coefficient of correlation squared. And that's it!
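The final result can be confirmed end to end: compute R^2 from its definition (16) and compare it with the square of r from formula (3). This is a minimal sketch with made-up data and illustrative variable names, not a general-purpose implementation.

```python
import math
import statistics

# Hypothetical sample data, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(xs)
x_bar = statistics.mean(xs)
y_bar = statistics.mean(ys)
s_x = statistics.stdev(xs)  # sample standard deviations (divide by N-1)
s_y = statistics.stdev(ys)

# Correlation coefficient, formula (3).
cross = sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar
r = cross / ((n - 1) * s_x * s_y)

# Least-squares line, formulas (9) and (7).
b = cross / ((n - 1) * s_x ** 2)
a = y_bar - b * x_bar

# Coefficient of determination, formula (16).
total_ss = sum((y - y_bar) ** 2 for y in ys)
residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
r_squared = (total_ss - residual_ss) / total_ss

print(r_squared, r ** 2)
```

The two printed numbers should agree up to floating-point rounding, as the proof above requires.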
