O'Reilly Databases


ANOVA Statistical Programming with PHP

by Paul Meagher
07/22/2004

The Analysis of Variance (ANOVA) technique is among the most widely used statistical techniques in behavioral research. It also comes up often in agricultural, pharmaceutical, and quality control contexts. This article introduces six major steps involved in using this technique by implementing them with a combination of PHP, MySQL, and JpGraph. The result is code and knowledge that you can use to think about and potentially solve your own data-mining problems.

The ANOVA technique can be construed narrowly to include only the formal mathematics required to partition the total variance of a data matrix into between-group and within-group variance estimates. Under this narrow construal, one might also include the machinery for testing whether the ratio of the between-group variance to the within-group variance is significant (i.e., whether there is a treatment effect).
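To make the partition concrete, here is a minimal sketch that splits the total sum of squares into between-group and within-group parts and forms the F ratio from the two variance estimates. The function name `partitionVariance()` and the scores are hypothetical and chosen only for illustration. They are not part of the SFA package or the test anxiety data.

```php
<?php
// Hypothetical scores for three treatment levels (illustrative values only).
$groups = [
    "low"      => [4, 5, 6],
    "moderate" => [7, 8, 9],
    "high"     => [1, 2, 3],
];

/**
 * Partition total variability into between-group and within-group
 * sums of squares, then form the F ratio from the two mean squares.
 */
function partitionVariance(array $groups): array
{
    $all   = array_merge(...array_values($groups));
    $grand = array_sum($all) / count($all);

    // Total: squared deviation of every score from the grand mean.
    $ssTotal = 0.0;
    foreach ($all as $x) {
        $ssTotal += pow($x - $grand, 2);
    }

    $ssBetween = 0.0;
    $ssWithin  = 0.0;
    foreach ($groups as $scores) {
        $n    = count($scores);
        $mean = array_sum($scores) / $n;
        // Between: group mean versus grand mean, weighted by group size.
        $ssBetween += $n * pow($mean - $grand, 2);
        // Within: each score versus its own group mean.
        foreach ($scores as $x) {
            $ssWithin += pow($x - $mean, 2);
        }
    }

    $dfBetween = count($groups) - 1;
    $dfWithin  = count($all) - count($groups);

    return [
        "ss_total"   => $ssTotal,
        "ss_between" => $ssBetween,
        "ss_within"  => $ssWithin,
        "f"          => ($ssBetween / $dfBetween) / ($ssWithin / $dfWithin),
    ];
}

$result = partitionVariance($groups);
printf(
    "SS total = %.2f, SS between = %.2f, SS within = %.2f, F = %.2f\n",
    $result["ss_total"],
    $result["ss_between"],
    $result["ss_within"],
    $result["f"]
);
// prints: SS total = 60.00, SS between = 54.00, SS within = 6.00, F = 27.00
```

The between and within sums of squares add up to the total (54 + 6 = 60), which is the partition the narrow construal refers to. Deciding whether an F of this size is significant requires the F distribution, which this sketch leaves out.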

In this article, we will construe the ANOVA technique more broadly as the sequence of data-analysis steps to take when conducting an ANOVA. The ANOVA technique here is a methodical approach to analyzing data that issues from a particular type of data-generating process. The data-generating process will ideally arise from a blocked and randomized experimental design.

Test Anxiety Study

The prototypical Single Factor ANOVA experiment is a simple comparative experiment. These are experiments that involve applying a treatment to homogeneous experimental units and measuring the responses that occur when administering different levels of the treatment to these experimental units.

The hypothetical study we will discuss in this article examines the effect of anxiety (i.e., the treatment) on test performance (i.e., the response). The study randomly assigned 30 subjects to a low-anxiety, moderate-anxiety, or high-anxiety treatment condition. The experimenter recorded a test score measurement for each subject. The empirical issue of concern is whether there is an effect of Anxiety Level on Test Score.

The idea for this hypothetical study and the data analyzed here originally appeared in the popular textbook by Gene V. Glass & Kenneth D. Hopkins (1995), Statistical Methods in Education and Psychology. The results reported in this article agree with theirs. The textbook is also worth consulting for its excellent and comprehensive treatment of the ANOVA technique.

Analysis Source Code

You can use the single-factor ANOVA technique to determine whether anxiety significantly influences test scores. The following PHP script implements six major steps in the single-factor ANOVA technique used to analyze data from our hypothetical test anxiety study. After you have examined the overall flow of the script, proceed to the rest of the article, where we examine the tabular and graphical output that each step in this script generates.

<?php
/**
* @package SFA
*
* Script performs single factor ANOVA analysis on test anxiety
* data stored in a database.
*
* @author Paul Meagher, Datavore Productions
* @license PHP v3.0
* @version 0.7
*
* The config.php file defines paths to the root of the PHPMATH
* and JPGRAPH libraries and sets up a global database connection.
*/

require_once "config.php";
require_once PHPMATH ."/SFA/SingleFactorANOVA.php";

$sfa = new SingleFactorANOVA;

// Step 1: Specify and analyze data
$sfa->setTable("TestScores");
$sfa->setTreatment("anxiety");
$sfa->setResponse("score");
$sfa->analyze();

// Step 2: Show raw data
$sfa->showRawData();

// Step 3: Show box plot
$params["figureTitle"] = "Anxiety Study";
$params["xTitle"]      = "Anxiety Level";
$params["yTitle"]      = "Test Score";
$params["yMin"]        = 0;
$params["yMax"]        = 100;
$params["yTicks"]      = 10;

$sfa->showBoxPlot($params);

// Step 4: Show descriptive statistics
$sfa->showDescriptiveStatistics();

// Step 5: Show single factor ANOVA source table
$sfa->showSourceTable();

// Step 6: Show mean differences.
$sfa->showMeanDifferences();

?>
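The script depends on config.php, which the article does not show. The following is a minimal sketch of what such a file might look like, based only on the description in the script's header comment. The `JPGRAPH` constant name, both paths, and the connection string are placeholders and assumptions, not the author's actual configuration. `DB::connect()` and `DB::isError()` are part of the PEAR DB package.

```php
<?php
// config.php: a hypothetical sketch. Every path and credential is a placeholder.

// Paths to the library roots. PHPMATH is the constant the analysis script
// uses; the JPGRAPH name is assumed by analogy.
define("PHPMATH", "/path/to/phpmath");
define("JPGRAPH", "/path/to/jpgraph");

// PEAR DB supplies the DB class used for the global connection.
require_once "DB.php";

// Placeholder DSN: substitute your own user, password, host, and database.
$db = DB::connect("mysql://user:password@localhost/anxiety_study");

// Stop early if the connection failed, mirroring the error check in analyze().
if (DB::isError($db)) {
    die($db->getMessage());
}
?>
```

Because analyze() declares `global $db`, the connection object has to live in the global scope under exactly that name.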

Step 1: Specify and Analyze Data

After we instantiate the SingleFactorANOVA class, we specify 1) which data table to use, 2) which table field to use as the treatment column, and 3) which table field to use as the response column:

<?php
// Step 1: Specify and analyze data
$sfa->setTable("TestScores");
$sfa->setTreatment("anxiety");
$sfa->setResponse("score");
$sfa->analyze();
?>

The first step culminates in the call to the $sfa->analyze() method. This method is the centerpiece of the SingleFactorANOVA class and is reproduced below. Note that I am using the PEAR::DB API to interact with a MySQL database.

<?php
/**
* Compute single factor ANOVA statistics.
*/
function analyze() {
  global $db;
  $sql  = " SELECT $this->treatment, sum($this->response), ";
  $sql .= " sum($this->response * $this->response), ";
  $sql .= " count($this->response) ";
  $sql .= " FROM $this->table ";
  $sql .= " GROUP BY $this->treatment ";

  $result = $db->query($sql);

  if (DB::isError($result)) {
    die($result->getMessage());
  } else {
    while ($row = $result->fetchRow()) {
      $level                  = $row[0];

      $this->levels[]         = $row[0];
      $this->sums[$level]     = $row[1];
      $this->n[$level]        = $row[3];        

      $this->means[$level]    = 
          $this->sums[$level] / $this->n[$level];
      $this->ss[$level]       = 
          $row[2] - $this->n[$level] * pow($this->means[$level], 2);
      $this->variance[$level] = 
          $this->ss[$level] / ($this->n[$level] - 1);         
    }    

    $this->sums["total"]  = array_sum($this->sums);
    $this->n["total"]     = array_sum($this->n);
    $this->means["grand"] = $this->sums["total"] / $this->n["total"];
    $this->ss["within"]   = array_sum($this->ss);   

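    // Initialize the accumulator so that += does not read an undefined index.
    $this->ss["between"] = 0;

    foreach($this->levels as $level) {
      $this->effects[$level] = 
          $this->means[$level] - $this->means["grand"];
      $this->ss["between"]  += 
          $this->n[$level] * pow($this->effects[$level], 2);
    }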

    $this->num_levels = count($this->levels);

    $this->df["between"] = $this->num_levels - 1;
    $this->df["within"]  = $this->n["total"] - $this->num_levels;  
    $this->ms["between"] = $this->ss["between"] / $this->df["between"];
    $this->ms["within"]  = $this->ss["within"] / $this->df["within"];

    $this->f    = $this->ms["between"] / $this->ms["within"];
    $F          = new FDistribution($this->df["between"], 
                                    $this->df["within"]);
    $this->p    = 1 - $F->CDF($this->f);
    $this->crit = $F->inverseCDF(1 - $this->alpha);
  }            
}
?>
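It helps to see the query that analyze() builds once the names from Step 1 are substituted. The sketch below uses local variables in place of the object properties. The function names are uppercased and the padding spaces trimmed for readability, which is a cosmetic change only.

```php
<?php
// Names taken from Step 1 of the analysis script.
$table     = "TestScores";
$treatment = "anxiety";
$response  = "score";

// Same clauses as analyze(), with local variables standing in for
// $this->table, $this->treatment, and $this->response.
$sql  = "SELECT $treatment, SUM($response), ";
$sql .= "SUM($response * $response), ";
$sql .= "COUNT($response) ";
$sql .= "FROM $table ";
$sql .= "GROUP BY $treatment";

echo $sql, "\n";
// prints: SELECT anxiety, SUM(score), SUM(score * score), COUNT(score) FROM TestScores GROUP BY anxiety
?>
```

The query returns one row per anxiety level. In each row, `$row[0]` is the level, `$row[1]` the sum of scores, `$row[2]` the sum of squared scores, and `$row[3]` the count. Those three aggregates are enough to compute each group's sum of squares as `$row[2] - n * mean^2`, so the individual scores never have to leave the database.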

We could have passed a data matrix into the analysis method. Instead, I assume that the data resides in a database and use SQL to extract and aggregate the records that feed into the subsequent analysis code. I made this storage assumption because I am interested in developing a scalable data-analysis solution.
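For comparison, here is a rough sketch of the data-matrix alternative: a function that reduces in-memory rows to the same per-level sum, sum of squares, and count that the SQL query returns. The function name `aggregateByLevel()` and the sample rows are hypothetical. Neither is part of the SFA package or the study data.

```php
<?php
/**
 * Hypothetical in-memory counterpart to the SQL step: reduce
 * [treatment, response] rows to per-level sum, sum of squares, and count.
 */
function aggregateByLevel(array $rows): array
{
    $stats = [];
    foreach ($rows as [$level, $score]) {
        if (!isset($stats[$level])) {
            $stats[$level] = ["sum" => 0, "sumsq" => 0, "n" => 0];
        }
        $stats[$level]["sum"]   += $score;
        $stats[$level]["sumsq"] += $score * $score;
        $stats[$level]["n"]     += 1;
    }
    return $stats;
}

// Illustrative rows only; not the study data.
$rows = [
    ["low", 4], ["low", 5], ["low", 6],
    ["high", 1], ["high", 2], ["high", 3],
];

$stats = aggregateByLevel($rows);

foreach ($stats as $level => $s) {
    $mean = $s["sum"] / $s["n"];
    // Same shortcut as analyze(): SS = sum of squares - n * mean^2.
    $ss = $s["sumsq"] - $s["n"] * pow($mean, 2);
    printf("%s: n=%d mean=%.2f ss=%.2f\n", $level, $s["n"], $mean, $ss);
}
// prints:
// low: n=3 mean=5.00 ss=2.00
// high: n=3 mean=2.00 ss=2.00
?>
```

The trade-off is that every row must first be loaded into PHP, whereas the SQL version lets the database do the aggregation and hands back a single row per treatment level.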

© 2013, O’Reilly Media, Inc.
