O'Reilly Databases

ANOVA Statistical Programming with PHP

The bulk of the code involves calculating the value of various instance variables to use in subsequent reporting steps. Most of these instance variables are associative arrays with indices such as total, between, and within. This is because the ANOVA procedure involves computing the total variance (in our test scores) and partitioning it into between-group (i.e., between treatment levels) and within-group (i.e., within a treatment level) variance estimates.

At the end of the analyze method we evaluate the probability of the observed F score by first instantiating an FDistribution class with our degrees of freedom parameters:

$F = new FDistribution($this->df["between"], $this->df["within"]);

To obtain the probability of the observed F score, we subtract from 1 the value returned by the cumulative distribution function evaluated at that score:

$this->p = 1 - $F->CDF($this->f);

Finally, we invoke the inverse cumulative distribution function using 1 minus our alpha setting (i.e., 1 - 0.05) in order to set a critical F value. This value defines the decision criterion we will use to reject the null hypothesis, which states that there is no difference between treatment-level means.

$this->crit = $F->inverseCDF(1 - $this->alpha);

If our observed F score is greater than the critical F score, we can conclude that at least one of the means differs significantly from the others. A p value (i.e., $this->p) less than 0.05 (or whatever your null rejection setting is) would also lead you to reject the null hypothesis.
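The relationship between the p value and the critical F value can be checked by hand in one special case. When the between-groups degrees of freedom equal 2 (three treatment levels), the F distribution's upper-tail probability has a simple closed form. The sketch below relies on that special case only; the function names and the observed F score are assumptions for illustration, and this is not how the FDistribution class itself works.

```php
<?php
// Closed-form helpers for the F distribution, valid ONLY when the
// between-groups (numerator) degrees of freedom equal 2.
// Hypothetical helpers; not part of the FDistribution class.

// Upper-tail probability P(F >= $f) for an F(2, $df_within) distribution.
function fUpperTailDf2($f, $df_within) {
  return pow(1 + 2 * $f / $df_within, -$df_within / 2);
}

// Critical F value whose upper-tail probability equals $alpha.
function fCriticalDf2($alpha, $df_within) {
  return ($df_within / 2) * (pow($alpha, -2 / $df_within) - 1);
}

$alpha     = 0.05;
$df_within = 27;    // e.g. 30 scores minus 3 treatment levels
$f         = 2.04;  // illustrative observed F score (an assumption)

$p    = fUpperTailDf2($f, $df_within);
$crit = fCriticalDf2($alpha, $df_within);

// The two decision rules agree: F exceeds the critical value
// exactly when p falls below alpha.
$reject = ($f > $crit);

printf("p = %.4f, critical F = %.4f, reject null: %s\n",
       $p, $crit, $reject ? "yes" : "no");
```

Feeding the critical value back into fUpperTailDf2() should return alpha, which is a quick way to confirm that the two functions are inverses of each other.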

The formula for decomposing the total sum of squares (first term) into a between-groups component (second term) and a within-group component (third term) appears in Figure 1.

\sum_{t=1}^{k}\sum_{i=1}^{n_t}(y_{ti} - \bar{y})^2 = \sum_{t=1}^{k}n_t(\bar{y}_t - \bar{y})^2 + \sum_{t=1}^{k}\sum_{i=1}^{n_t}(y_{ti} - \bar{y}_t)^2
Figure 1. Formula for decomposing the sum of squares.

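The symbol \bar{y} stands for the grand mean and the symbol \bar{y}_t stands for the mean of treatment level t. Each y_{ti} is the i-th score in treatment level t, n_t is the number of scores at that level, and k is the number of treatment levels.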
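The decomposition in Figure 1 can be verified numerically. The following is a minimal standalone sketch, not the article's analyze method: it hard-codes the test-anxiety scores from Table 1 instead of reading them from the database, and the $ss array name is an assumption modeled on the total, between, and within indices described earlier.

```php
<?php
// Scores keyed by treatment level (anxiety), copied from Table 1.
$groups = array(
  "low"      => array(26, 34, 46, 48, 42, 49, 74, 61, 51, 53),
  "moderate" => array(51, 50, 33, 28, 47, 50, 48, 60, 71, 42),
  "high"     => array(52, 64, 39, 54, 58, 53, 77, 56, 63, 59),
);

// Grand mean over every score, ignoring group membership.
$n_total   = 0;
$sum_total = 0;
foreach ($groups as $scores) {
  $n_total   += count($scores);
  $sum_total += array_sum($scores);
}
$grand_mean = $sum_total / $n_total;

$ss = array("total" => 0.0, "between" => 0.0, "within" => 0.0);

foreach ($groups as $scores) {
  $n          = count($scores);
  $group_mean = array_sum($scores) / $n;

  // Second term of Figure 1: group size times the squared distance
  // of the treatment mean from the grand mean.
  $ss["between"] += $n * pow($group_mean - $grand_mean, 2);

  foreach ($scores as $y) {
    // Third term: squared distance of a score from its treatment mean.
    $ss["within"] += pow($y - $group_mean, 2);
    // First term: squared distance of a score from the grand mean.
    $ss["total"]  += pow($y - $grand_mean, 2);
  }
}

// Degrees of freedom and the F ratio of the two mean squares.
$df = array(
  "between" => count($groups) - 1,
  "within"  => $n_total - count($groups),
);
$f = ($ss["between"] / $df["between"]) / ($ss["within"] / $df["within"]);

printf("total = %.1f, between = %.1f, within = %.1f, F = %.3f\n",
       $ss["total"], $ss["between"], $ss["within"], $f);
```

For these scores the between and within components should come to about 577.4 and 3812.9, which add up to the total of about 4390.3.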

Step 2: Show Raw Data

It is always good to begin your analysis by making sure that you've properly loaded your data. We can call the showRawData() method to dump our test-anxiety data table to a web browser.

<?php
/*
* Output contents of database table.
*/  
function showRawData() {
  global $db;
  $data        = $db->tableInfo($this->table, DB_TABLEINFO_ORDER);
  $columns     = array_keys($data["order"]);
  $num_columns = count($columns);

  ?>

  <table cellspacing='0' cellpadding='0'>
    <tr>
      <td>
        <table border='1' cellspacing='0' cellpadding='3'>
        <?php
          print "<tr bgcolor='#ffffcc'>";

          for ($i=0; $i < $num_columns; $i++) {
            print "<td align='center'><b>".$columns[$i]."</b></td>";
          }

          print "</tr>";

          $fields = implode(",", $columns); 
          $sql    = " SELECT $fields FROM $this->table ";
          $result = $db->query($sql);

          if (DB::isError($result)) {
            die( $result->getMessage());
          } else {
            while ($row = $result->fetchRow()) { 
              print "<tr>";

              foreach ($row as $value) {
                // Escape each value so stray markup in the data cannot break the table.
                print "<td>".htmlspecialchars((string) $value)."</td>";
              }

              print "</tr>";
            }
          }
          ?>
        </table>
      </td>
    </tr>
  </table>
  <?php
}  
?>

This code generates as output the table below:

Table 1. Show Raw Data

id   anxiety    score
1    low        26
2    low        34
3    low        46
4    low        48
5    low        42
6    low        49
7    low        74
8    low        61
9    low        51
10   low        53
11   moderate   51
12   moderate   50
13   moderate   33
14   moderate   28
15   moderate   47
16   moderate   50
17   moderate   48
18   moderate   60
19   moderate   71
20   moderate   42
21   high       52
22   high       64
23   high       39
24   high       54
25   high       58
26   high       53
27   high       77
28   high       56
29   high       63
30   high       59

A tip for data miners: You may already have data in your databases to which you can adapt this code. Look for situations where an enum column can act as your treatment-level field and a corresponding integer or float column measures some response associated with that treatment level.
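As a rough sketch of that adaptation, the helper below collects a numeric response column into arrays keyed by the value of a treatment-level column. The function name groupByTreatment and the hard-coded rows are assumptions for illustration; in practice the rows would come from your own query, and only the column names (id, anxiety, score) are taken from Table 1.

```php
<?php
// Hypothetical helper: collect a numeric response column into arrays
// keyed by the value of a treatment-level (e.g. enum) column.
function groupByTreatment(array $rows, $treatment_column, $response_column) {
  $groups = array();
  foreach ($rows as $row) {
    $level = $row[$treatment_column];
    // Appending creates the per-level array the first time a level is seen.
    $groups[$level][] = (float) $row[$response_column];
  }
  return $groups;
}

// A few rows shaped like associative query results, using Table 1's columns.
$rows = array(
  array("id" => 1,  "anxiety" => "low",      "score" => 26),
  array("id" => 2,  "anxiety" => "low",      "score" => 34),
  array("id" => 11, "anxiety" => "moderate", "score" => 51),
  array("id" => 21, "anxiety" => "high",     "score" => 52),
);

$groups = groupByTreatment($rows, "anxiety", "score");
print_r($groups);
```

Each key of the resulting array is one treatment level, and each value is the list of responses observed at that level, which is the shape a one-way ANOVA needs.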
