In essence, we now know that we want to break down the TOTAL variation in the data into two components:
- a component that is due to the TREATMENT (or FACTOR), and
- a component that is due to just RANDOM ERROR.
Let's see what kind of formulas we can come up with for quantifying these components. But first, as always, we need to define some notation. Let's represent our data, the group means, and the grand mean as follows:
| Group | Observations | | | | Group Mean |
| --- | --- | --- | --- | --- | --- |
| 1 | \(X_{11}\) | \(X_{12}\) | \(\cdots\) | \(X_{1n_1}\) | \(\bar{X}_{1.}\) |
| 2 | \(X_{21}\) | \(X_{22}\) | \(\cdots\) | \(X_{2n_2}\) | \(\bar{X}_{2.}\) |
| \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\ddots\) | \(\vdots\) | \(\vdots\) |
| \(m\) | \(X_{m1}\) | \(X_{m2}\) | \(\cdots\) | \(X_{mn_m}\) | \(\bar{X}_{m.}\) |
| Grand Mean | | | | | \(\bar{X}_{..}\) |
That is, we'll let:
- \(m\) denote the number of groups being compared
- \(X_{ij}\) denote the \(j^{th}\) observation in the \(i^{th}\) group, where \(i = 1, 2, \dots, m\) and \(j = 1, 2, \dots, n_i\). The important thing to note here is that \(j\) goes from 1 to \(n_i\), not to \(n\). That is, the number of data points in a group depends on the group \(i\), so the number of data points in each group need not be the same. We could have 5 measurements in one group, and 6 measurements in another.
- \(\bar{X}_{i.}=\dfrac{1}{n_i}\sum\limits_{j=1}^{n_i} X_{ij}\) denote the sample mean of the observed data for group i, where \(i = 1, 2, \dots , m\)
- \(\bar{X}_{..}=\dfrac{1}{n}\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} X_{ij}\) denote the grand mean of all \(n\) observed data points, where \(n = n_1 + n_2 + \cdots + n_m\) is the total sample size
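To make the notation concrete, here's a small Python sketch using invented data, with \(m = 2\) groups of unequal sizes (\(n_1 = 3\), \(n_2 = 4\)); the values themselves are illustrative only:

```python
# A concrete (made-up) illustration of the notation, with unequal group
# sizes: m = 2 groups, n_1 = 3 and n_2 = 4 observations.
groups = [
    [6.0, 8.0, 13.0],         # X_11, X_12, X_13
    [4.0, 6.0, 8.0, 10.0],    # X_21, X_22, X_23, X_24
]

m = len(groups)                      # number of groups
n = sum(len(g) for g in groups)      # total number of observations, n = 7

# Group means: X-bar_{i.} = (1/n_i) * sum over j of X_ij
group_means = [sum(g) / len(g) for g in groups]

# Grand mean: X-bar_{..} = (1/n) * sum over all observations
grand_mean = sum(x for g in groups for x in g) / n

print(group_means)   # → [9.0, 7.0]
print(grand_mean)    # 55/7, about 7.857
```

Note that the grand mean is the average of all \(n\) observations, which equals the average of the group means only when the group sizes are equal.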
Okay, with the notation now defined, let's first consider the total sum of squares, which we'll denote here as SS(TO). Because we want the total sum of squares to quantify the variation in the data regardless of its source, it makes sense that SS(TO) would be the sum of the squared distances of the observations \(X_{ij}\) to the grand mean \(\bar{X}_{..}\). That is:
\(SS(TO)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{..})^2\)
With just a little bit of algebraic work, the total sum of squares can be alternatively calculated as:
\(SS(TO)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} X^2_{ij}-n\bar{X}_{..}^2\)
Can you do the algebra?
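Even before doing the algebra, you can check numerically that the two forms of SS(TO) agree. A quick Python sketch, using the same made-up data as before:

```python
# Numerical check that the definitional and shortcut forms of SS(TO)
# agree, using made-up data with unequal group sizes.
groups = [[6.0, 8.0, 13.0], [4.0, 6.0, 8.0, 10.0]]
obs = [x for g in groups for x in g]
n = len(obs)
grand_mean = sum(obs) / n

# Definitional form: squared distances of each observation to the grand mean
ss_to_def = sum((x - grand_mean) ** 2 for x in obs)

# Shortcut form: sum of squared observations minus n * (grand mean)^2
ss_to_short = sum(x ** 2 for x in obs) - n * grand_mean ** 2

print(abs(ss_to_def - ss_to_short) < 1e-9)  # the two forms agree
```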
Now, let's consider the treatment sum of squares, which we'll denote SS(T). Because we want the treatment sum of squares to quantify the variation between the treatment groups, it makes sense that SS(T) would be the sum of the squared distances of the treatment means \(\bar{X}_{i.}\) to the grand mean \(\bar{X}_{..}\). That is:
\(SS(T)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (\bar{X}_{i.}-\bar{X}_{..})^2\)
Again, with just a little bit of algebraic work, the treatment sum of squares can be alternatively calculated as:
\(SS(T)=\sum\limits_{i=1}^{m}n_i\bar{X}^2_{i.}-n\bar{X}_{..}^2\)
Can you do the algebra?
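As with SS(TO), the two forms of SS(T) can be verified numerically. Note that the inner sum over \(j\) in the definitional form just contributes a factor of \(n_i\), since the summand doesn't depend on \(j\):

```python
# Numerical check that the two forms of SS(T) agree on made-up data.
groups = [[6.0, 8.0, 13.0], [4.0, 6.0, 8.0, 10.0]]
n = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n
group_means = [sum(g) / len(g) for g in groups]

# Definitional form: the inner sum over j contributes a factor of n_i
ss_t_def = sum(len(g) * (gm - grand_mean) ** 2
               for g, gm in zip(groups, group_means))

# Shortcut form: weighted sum of squared group means minus n * (grand mean)^2
ss_t_short = (sum(len(g) * gm ** 2 for g, gm in zip(groups, group_means))
              - n * grand_mean ** 2)

print(abs(ss_t_def - ss_t_short) < 1e-9)  # the two forms agree
```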
Finally, let's consider the error sum of squares, which we'll denote SS(E). Because we want the error sum of squares to quantify the variation in the data, not otherwise explained by the treatment, it makes sense that SS(E) would be the sum of the squared distances of the observations \(X_{ij}\) to the treatment means \(\bar{X}_{i.}\). That is:
\(SS(E)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})^2\)
As we'll see in just a minute, the easiest way to calculate the error sum of squares is by subtracting the treatment sum of squares from the total sum of squares. That is:
\(SS(E)=SS(TO)-SS(T)\)
Okay, now, do you remember that part about wanting to break down the total variation SS(TO) into a component due to the treatment SS(T) and a component due to random error SS(E)? Well, some simple algebra leads us to this:
\(SS(TO)=SS(T)+SS(E)\)
and hence the simple way of calculating the error sum of squares. At any rate, here's the simple algebra:
Proof
Well, okay, so the proof does involve a little trick of adding 0 in a special way to the total sum of squares:
\(SS(TO) = \sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n_{i}}((X_{ij}-\color{red}\overbrace{\color{black}\bar{X}_{i.})+(\bar{X}_{i.}}^{\text{Add to 0}}\color{black}-\bar{X}_{..}))^{2}\)
Then, squaring the term in parentheses, as well as distributing the summation signs, we get:
\(SS(TO)=\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})^2+2\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (X_{ij}-\bar{X}_{i.})(\bar{X}_{i.}-\bar{X}_{..})+\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{n_i} (\bar{X}_{i.}-\bar{X}_{..})^2\)
Now, it's just a matter of recognizing each of the terms:
\(SS(TO)=
\color{red}\overbrace{\color{black}\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n_{i}}\left(X_{ij}-\bar{X}_{i.}\right)^{2}}^{SS(E)}
\color{black}+2
\color{red}\overbrace{\color{black}\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n_{i}}\left(X_{ij}-\bar{X}_{i.}\right)\left(\bar{X}_{i.}-\bar{X}_{..}\right)}^{0}
\color{black}+
\color{red}\overbrace{\color{black}\sum\limits_{i=1}^{m} \sum\limits_{j=1}^{n_{i}}\left(\bar{X}_{i.}-\bar{X}_{..}\right)^{2}}^{SS(T)}\)

The middle cross-product term is 0 because, for each group \(i\), the deviations from the group mean sum to zero, that is, \(\sum\limits_{j=1}^{n_i}(X_{ij}-\bar{X}_{i.})=0\).
That is, we've shown that:
\(SS(TO)=SS(T)+SS(E)\)
as was to be proved.
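The decomposition can also be confirmed numerically. Here's a Python sanity check on made-up data, computing SS(E) directly from its definition rather than by subtraction:

```python
# Numerical sanity check of the decomposition SS(TO) = SS(T) + SS(E),
# using made-up data with unequal group sizes.
groups = [[6.0, 8.0, 13.0], [4.0, 6.0, 8.0, 10.0]]
n = sum(len(g) for g in groups)
grand_mean = sum(x for g in groups for x in g) / n
group_means = [sum(g) / len(g) for g in groups]

# SS(TO): squared distances of observations to the grand mean
ss_to = sum((x - grand_mean) ** 2 for g in groups for x in g)

# SS(T): squared distances of group means to the grand mean, weighted by n_i
ss_t = sum(len(g) * (gm - grand_mean) ** 2
           for g, gm in zip(groups, group_means))

# SS(E): squared distances of observations to their own group mean,
# computed directly from the definition (not by subtraction)
ss_e = sum((x - gm) ** 2
           for g, gm in zip(groups, group_means) for x in g)

print(abs(ss_to - (ss_t + ss_e)) < 1e-9)  # the decomposition holds
```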