χ² Distribution

 

The χ² distribution is one of the distributions we will use during the course. Unlike the normal and t-distributions, which are symmetric, the χ² distribution is skewed to the right. Like the t-distribution, the χ² distribution is a whole family of distributions distinguished by a single whole-number parameter, ν, called the number of degrees of freedom. The value of ν determines the skewness of the graph: the larger ν is, the less skewed the curve.

We will use the χ² distribution in three applications:

        (a) Estimating a Population Variance
        (b) Performing a Goodness-of-fit Test
        (c) Contingency Tables

In all three applications, we will be looking for the value of the test statistic χ². What is this χ² statistic?
Think about this experiment: you toss a coin 100 times. We can simulate this experiment on the TI calculator with the function randInt(1,2,100)-> L1, then sort the data and count the number of ones (tails) and twos (heads). I did the experiment and got 54 ones and 46 twos. If we perform the experiment several times, we can get different values, and some of them may repeat. We will call these the observed (O) values.

Before performing the experiment, we expected a 50-50 split, if the coin is fair. We will call these the expected (E) values, because if we perform the experiment many, many times, we expect equal numbers of tails and heads. This conviction rests on the fact that the probability of a tail or of a head is 50% each (if the coin is fair!). So, over many repetitions, we would not be surprised to get an average of 50 tails (or 50 heads!).

Now, look at the number defined as:

        $$ \chi^2 = \frac{(O_{\text{heads}} - E)^2 + (O_{\text{tails}} - E)^2}{E} $$

With the values we got before (54, 46), this number equals $\frac{(46-50)^2 + (54-50)^2}{50} = 0.64$. This number is what we call the χ² statistic. Observe that it can never be negative. The amazing thing is that if we perform this experiment many, many times, the values of the χ² statistic are not arbitrary but follow (approximately) a distribution called the χ² distribution. The expression of the χ² function is:

        $$ f(x) = \frac{x^{\nu/2 - 1}\, e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)}, \qquad x > 0 $$

As you can see, the function depends on an additional parameter, ν, the degrees of freedom.
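The statistic and the density function can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function names chi2_statistic and chi2_pdf are made up for this example, and the density is the standard χ² formula written with math.gamma.

```python
import math

def chi2_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_pdf(x, nu):
    """Chi-square density with nu degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0
    half = nu / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

# Coin example from the text: 46 heads and 54 tails, 50 expected of each.
coin = chi2_statistic([46, 54], [50, 50])
print(round(coin, 2))  # 0.64
```

Because every term is a square divided by a positive expected count, the result can be zero (a perfect 50-50 split) but never negative.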
To check how accurate the predictions from this formula are, I performed three experiments.

 In the first one, I simulated tossing a coin 500 times, and then repeated the experiment 500 times. I compared the values obtained in the experiments (O) with the values calculated (E) from the formula above for the intervals 0-1, 1-2, 2-3, ..., 12-13. In this case we have 1 degree of freedom (ν=1). (At the end, you can find the program I wrote for the simulation.)

Interval   Expected (E)   Observed (O)
0-1        343            344
1-2        80             68
2-3        37             45
3-4        18             24
4-5        10             10
5-6        6              5
6-7        3              1
7-8        2              2
8-9        1              0
9-10       <1             0
10-11      <1             0
11-12      <1             0
12-13      <1             1
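The observed column of such a table can be reproduced, up to random fluctuation, with a short Python sketch. It is an assumption-laden illustration rather than the program used for the table: the helper names coin_chi2 and histogram are invented here, the seed is arbitrary, and the counts from a run will not match the table exactly.

```python
import random

def coin_chi2(tosses, rng):
    """Toss a fair coin `tosses` times and return the chi-square statistic."""
    heads = sum(rng.randint(0, 1) for _ in range(tosses))
    tails = tosses - heads
    expected = tosses / 2
    return ((heads - expected) ** 2 + (tails - expected) ** 2) / expected

def histogram(values, n_bins):
    """Count values in unit intervals [0,1), [1,2), ...
    Anything beyond the last interval is added to the last bin."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v), n_bins - 1)] += 1
    return counts

rng = random.Random(1)  # fixed, arbitrary seed so the run is repeatable
stats = [coin_chi2(500, rng) for _ in range(500)]
observed = histogram(stats, 13)
for k, count in enumerate(observed):
    print(f"{k}-{k + 1}: {count}")
```

A value that falls exactly on a boundary (for example 5.0) is counted in the interval that starts there; the table may have used a different convention.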

 

  In the second experiment, I simulated tossing a die 500 times, and then repeated the experiment 500 times. In this case the number of degrees of freedom is ν=5, and the value of χ² is given by the expression:

        $$ \chi^2 = \frac{(n_1-m)^2 + (n_2-m)^2 + (n_3-m)^2 + (n_4-m)^2 + (n_5-m)^2 + (n_6-m)^2}{m} $$

where $n_i$ is the number of times we observed the number i, and m is the expected count, which is the number of throws divided by 6. The results are in the next table:

Interval   Expected (E)   Observed (O)
0-1        19             10
1-2        57             64
2-3        75             78
3-4        75             65
4-5        67             59
5-6        55             60
6-7        43             45
7-8        32             33
8-9        24             30
9-10       17             22
10-11      12             11
11-12      8              12
12-13      6              4
13-14      4              2
14-15      3              1
15-16      2              1
16-17      1              1
17-18      <1             0
18-19      <1             2
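One way to obtain the expected column is to integrate the χ² density over each interval and multiply by the number of repetitions. The sketch below does this with composite Simpson's rule; it is an assumed method, since the text does not say how the expected values were computed, and the names chi2_pdf and expected_count are invented for the example. As written it is only suitable for ν > 2, because the density is treated as 0 at x = 0.

```python
import math

def chi2_pdf(x, nu):
    """Chi-square density; returns 0 at x <= 0 (correct only for nu > 2)."""
    if x <= 0:
        return 0.0
    half = nu / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def expected_count(a, b, nu, repetitions, steps=1000):
    """repetitions * integral of the density over [a, b].
    Composite Simpson's rule; `steps` must be even."""
    h = (b - a) / steps
    total = chi2_pdf(a, nu) + chi2_pdf(b, nu)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * chi2_pdf(a + i * h, nu)
    return repetitions * total * h / 3

for k in range(5):
    value = expected_count(k, k + 1, nu=5, repetitions=500)
    print(f"{k}-{k + 1}: {value:.1f}")
```

For ν = 5 and 500 repetitions, the first three intervals come out at about 18.7, 56.7 and 74.6, in line with the 19, 57 and 75 in the table.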
 
 

In the third experiment, I simulated tossing a soccer-ball-like die (12 faces!) 500 times, and then repeated the experiment 500 times. In this case the number of degrees of freedom is ν=11, and the value of χ² is given by the expression:

        $$ \chi^2 = \frac{\sum_{i=1}^{12} (m_i - m)^2}{m} $$

where $m_i$ is the number of times we observed the number i, and m is the expected count, which is the number of throws divided by 12. The results are in the next table:

Interval   Expected (E)   Observed (O)
0-1        <1             0
1-2        <1             1
2-3        4              1
3-4        10             12
4-5        19             19
5-6        29             22
6-7        37             43
7-8        43             36
8-9        46             58
9-10       46             49
10-11      44             41
11-12      40             38
12-13      35             47
13-14      30             27
14-15      25             27
15-16      21             24
16-17      17             8
17-18      13             7
18-19      10             13
19-20      8              9
20-21      6              3
21-22      4              3
22-23      3              3
23-24      2              1
24-25      2              0
25-26      1              0
26-27      1              2
27-28      1              0
28-29      <1             1
29-30      <1             1
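The same experiment can be written for a die with any number of faces, which makes the link between the number of faces and ν easier to see. The Python sketch below is an illustration with an invented function name (die_chi2) and an arbitrary seed. It relies on the known fact that the mean of a χ² distribution equals ν, so for 12 faces the average statistic should land near 11.

```python
import random

def die_chi2(faces, throws, rng):
    """Throw a fair die with `faces` faces `throws` times
    and return the chi-square statistic of the face counts."""
    counts = [0] * faces
    for _ in range(throws):
        counts[rng.randrange(faces)] += 1
    expected = throws / faces
    return sum((c - expected) ** 2 for c in counts) / expected

rng = random.Random(2)  # fixed, arbitrary seed so the run is repeatable
stats = [die_chi2(12, 500, rng) for _ in range(500)]
mean = sum(stats) / len(stats)
print(f"mean statistic: {mean:.2f} (nu = 11)")
```

Calling die_chi2(6, 500, rng) or die_chi2(2, 500, rng) gives the die and coin experiments, with ν = 5 and ν = 1 respectively.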


I find all of this really amazing! You see that there is some order, some logic behind all of these statistical fluctuations! Why? I don't know. If you get a result like this in physics, you say: there is some conservation law behind these numbers! But what do we have here? We are talking about coins and dice!

 

Programs for simulation:

Program for the coin simulation on the TI-89:

Coins()
Prgm
Disp "Start"
Disp "n="
Input n
Disp "m="
Input m
clrlist(l1,l2,l3)
For i,1,n
randInt(1,2,m)-> l1
SortA  l1
0 -> a
For j,1,m
If l1[j] > 1 Then
a+1 -> a
EndIf
EndFor
a -> l2[i]
EndFor
For k,1,n
l2[k] -> u
(m-l2[k]) -> v
((u - m/2)^2 + (v - m/2)^2)/(m/2) -> l3[k]
EndFor
Disp "End"
EndPrgm

 

Program for the die simulation on the TI-89:

Dice()
Prgm
Disp "Start"
Disp "Number of Throws="
Input n
Disp "Number of Trials="
Input t
clrlist(l1,l2)
For i,1,t
randInt(1,6,n)-> l1
SortA  l1
0 -> m1
0 -> m2
0 -> m3
0 -> m4
0 -> m5
0 -> m6
For j,1,n
If l1[j] = 1 Then
m1+1 -> m1
EndIf
If l1[j] = 2 Then
m2+1 -> m2
EndIf
If l1[j] = 3 Then
m3+1 -> m3
EndIf
If l1[j] = 4 Then
m4+1 -> m4
EndIf
If l1[j] = 5 Then
m5+1 -> m5
EndIf
If l1[j] = 6 Then
m6+1 -> m6
EndIf
EndFor
n/6 -> n6
((m1-n6)^2 +(m2-n6)^2 +(m3-n6)^2 +(m4-n6)^2 +(m5-n6)^2 +(m6-n6)^2)/n6 -> l2[i]
EndFor
Disp "End"
EndPrgm

 

 
