**ANOVA Statistical Programming with PHP**


The bulk of the code involves calculating the values of various instance
variables to use in subsequent reporting steps. Most of these instance
variables are associative arrays with indices such as `total`, `between`,
and `within`. This is because the ANOVA procedure involves computing the
total variance (in our test scores) and partitioning it into between-group
(i.e., between treatment levels) and within-group (i.e., within a treatment
level) variance estimates.
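The partitioning can be sketched outside the class as a standalone function. This is a minimal illustration, not the article's `analyze` method: the function name `partitionSumOfSquares` and its input shape (an array mapping each treatment level to a non-empty list of scores) are assumptions made for the example.

```php
<?php
/**
 * Partition the total sum of squares into between-group and
 * within-group components. Hypothetical helper for illustration.
 *
 * @param array $groups Map of treatment level => non-empty list of scores.
 * @return array Sums of squares keyed by "total", "between", "within".
 */
function partitionSumOfSquares(array $groups): array
{
    // Pool every score to get the grand mean.
    $all       = array_merge(...array_values($groups));
    $grandMean = array_sum($all) / count($all);

    $ss = ["total" => 0.0, "between" => 0.0, "within" => 0.0];

    foreach ($groups as $scores) {
        $n         = count($scores);
        $groupMean = array_sum($scores) / $n;

        // Between: squared distance of the group mean from the grand
        // mean, weighted by the number of scores in the group.
        $ss["between"] += $n * ($groupMean - $grandMean) ** 2;

        foreach ($scores as $x) {
            // Within: squared distance of each score from its group mean.
            $ss["within"] += ($x - $groupMean) ** 2;
            // Total: squared distance of each score from the grand mean.
            $ss["total"]  += ($x - $grandMean) ** 2;
        }
    }

    return $ss;
}

// Toy data, not the test-anxiety scores.
$ss = partitionSumOfSquares(["a" => [1, 2, 3], "b" => [4, 5, 6]]);
```

Up to floating-point rounding, `$ss["between"] + $ss["within"]` equals `$ss["total"]`, which is a quick sanity check on any implementation of the partition.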

At the end of the `analyze` method we evaluate the probability of
the observed F score by first instantiating an `FDistribution` class with our
degrees of freedom parameters:

`$F = new FDistribution($this->df["between"], $this->df["within"]);`

To obtain the probability of the observed F score, we subtract from 1 the value returned by the cumulative distribution function applied to that score:

`$this->p = 1 - $F->CDF($this->f);`

Finally, we invoke the inverse cumulative distribution function using 1
minus our alpha setting (i.e., 1 - 0.05) in order to set a critical F value
that defines the decision criterion we will use to reject the *null
hypothesis*, which states that *there is no difference between
treatment-level means*.

`$this->crit = $F->inverseCDF(1 - $this->alpha);`

If our observed F score is *greater than* the critical F
score, we can conclude that at least one of the means differs significantly
from the others. A p value (i.e., `$this->p`) less than
0.05 (or whatever your null rejection setting is) would also lead you to reject
the null hypothesis.
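The relationship between the p value, the critical value, and the decision can be tried out without the `FDistribution` class in one special case. When the numerator degrees of freedom equal 2 (three treatment levels), the upper-tail probability of the F distribution has a simple closed form. The sketch below uses that shortcut; the function names `upperTailF2` and `criticalF2` and the observed F score are assumptions for illustration, and the formulas do not hold for other numerator degrees of freedom.

```php
<?php
/**
 * Upper-tail probability P(F > f) for an F distribution with exactly
 * 2 numerator degrees of freedom and $dfWithin denominator degrees of
 * freedom. Hypothetical helper; valid only when the numerator df is 2.
 */
function upperTailF2(float $f, int $dfWithin): float
{
    return pow(1 + 2 * $f / $dfWithin, -$dfWithin / 2);
}

/**
 * Critical F value for the same special case: the score whose
 * upper-tail probability equals $alpha.
 */
function criticalF2(float $alpha, int $dfWithin): float
{
    return ($dfWithin / 2) * (pow($alpha, -2 / $dfWithin) - 1);
}

$alpha    = 0.05;
$dfWithin = 27;    // e.g., 30 scores minus 3 treatment levels
$f        = 2.04;  // illustrative observed F score (assumed value)

$p    = upperTailF2($f, $dfWithin);   // plays the role of 1 - CDF(f)
$crit = criticalF2($alpha, $dfWithin); // plays the role of inverseCDF(1 - alpha)

// The two decision rules agree: F exceeds the critical value exactly
// when the p value falls below alpha.
$rejectByF = $f > $crit;
$rejectByP = $p < $alpha;
```

Because the critical value is defined as the score whose upper-tail probability equals alpha, comparing F to the critical value and comparing p to alpha always yield the same decision.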

The formula for decomposing the total sum of squares (first term) into a between-groups component (second term) and a within-group component (third term) appears in Figure 1.

*Figure 1. Formula for decomposing the sum of squares.*

Two means appear in the formula: the grand mean (the mean of all scores) and the treatment mean (the mean of the scores within one treatment level).
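The decomposition can be written out in one common notation. The symbols are an assumption and may differ from those used in Figure 1: $X_{ij}$ is score $i$ in treatment level $j$, $n_j$ is the number of scores in level $j$, $k$ is the number of treatment levels, $\bar{X}_G$ is the grand mean, and $\bar{X}_j$ is the mean of level $j$.

$$
\sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}_G\right)^2
= \sum_{j=1}^{k} n_j\left(\bar{X}_j-\bar{X}_G\right)^2
+ \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij}-\bar{X}_j\right)^2
$$

The left-hand side is the total sum of squares, the first term on the right is the between-groups component, and the last term is the within-group component.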

### Step 2: Show Raw Data

It is always good to begin your analysis by making sure that you've properly
loaded your data. We can call the `showRawData()` method to dump our
test-anxiety data table to a web browser.

```
<?php
/*
 * Output contents of database table.
 */
function showRawData() {
  global $db;
  // Column names come from the PEAR DB table metadata.
  $data        = $db->tableInfo($this->table, DB_TABLEINFO_ORDER);
  $columns     = array_keys($data["order"]);
  $num_columns = count($columns);
  ?>
  <table cellspacing='0' cellpadding='0'>
    <tr>
      <td>
        <table border='1' cellspacing='0' cellpadding='3'>
          <?php
          // Header row: one cell per column name.
          print "<tr bgcolor='#ffffcc'>";
          for ($i = 0; $i < $num_columns; $i++) {
            print "<td align='center'><b>" . $columns[$i] . "</b></td>";
          }
          print "</tr>";

          // Data rows: select every column of the table.
          $fields = implode(",", $columns);
          $sql    = "SELECT $fields FROM {$this->table}";
          $result = $db->query($sql);
          if (DB::isError($result)) {
            die($result->getMessage());
          } else {
            while ($row = $result->fetchRow()) {
              print "<tr>";
              foreach ($row as $key => $value) {
                print "<td>$value</td>";
              }
              print "</tr>";
            }
          }
          ?>
        </table>
      </td>
    </tr>
  </table>
  <?php
}
?>
```

This code generates the table below as output:

*Table 1. Show Raw Data*

id | anxiety | score |
---|---|---|
1 | low | 26 |
2 | low | 34 |
3 | low | 46 |
4 | low | 48 |
5 | low | 42 |
6 | low | 49 |
7 | low | 74 |
8 | low | 61 |
9 | low | 51 |
10 | low | 53 |
11 | moderate | 51 |
12 | moderate | 50 |
13 | moderate | 33 |
14 | moderate | 28 |
15 | moderate | 47 |
16 | moderate | 50 |
17 | moderate | 48 |
18 | moderate | 60 |
19 | moderate | 71 |
20 | moderate | 42 |
21 | high | 52 |
22 | high | 64 |
23 | high | 39 |
24 | high | 54 |
25 | high | 58 |
26 | high | 53 |
27 | high | 77 |
28 | high | 56 |
29 | high | 63 |
30 | high | 59 |

A tip for data miners: Maybe you already have some data in your databases to
which you can adapt this code. Look for situations where you have an
`enum` data type to act as your treatment-level field and a
corresponding integer or float column that measures some response associated
with that treatment level.
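As a rough sketch of that adaptation, the rows of such a table can be grouped by the treatment-level column before any analysis. The function name `groupByTreatment` is hypothetical, and the example assumes the rows have already been fetched as associative arrays keyed by column name; the sample rows are taken from Table 1.

```php
<?php
/**
 * Group response values by treatment level. Hypothetical helper.
 *
 * @param array  $rows          Rows as associative arrays (column => value).
 * @param string $levelField    Name of the enum-like treatment-level column.
 * @param string $responseField Name of the numeric response column.
 * @return array Map of treatment level => list of responses as floats.
 */
function groupByTreatment(array $rows, string $levelField, string $responseField): array
{
    $groups = [];
    foreach ($rows as $row) {
        // Append this row's response to the list for its treatment level.
        $groups[$row[$levelField]][] = (float) $row[$responseField];
    }
    return $groups;
}

// A few rows from the test-anxiety table.
$rows = [
    ["id" => 1,  "anxiety" => "low",      "score" => 26],
    ["id" => 11, "anxiety" => "moderate", "score" => 51],
    ["id" => 21, "anxiety" => "high",     "score" => 52],
    ["id" => 2,  "anxiety" => "low",      "score" => 34],
];

$groups = groupByTreatment($rows, "anxiety", "score");
```

Passing the column names as parameters keeps the helper independent of any one table, so the same grouping step applies to whichever treatment-level and response columns you find in your own data.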