ANOVA Statistical Programming with PHP

by Paul Meagher
07/22/2004

The Analysis of Variance (ANOVA) technique is one of the most widely used statistical techniques in behavioral research. It also comes up often in agricultural, pharmaceutical, and quality control contexts. This article introduces six major steps involved in using the technique by implementing them with a combination of PHP, MySQL, and JpGraph. The result is code and knowledge that you can use to think about and potentially solve your own data-mining problems.

The ANOVA technique can be construed narrowly to include only the formal mathematics required to partition the total variance of a data matrix into between-group and within-group variance estimates. Within this narrow construal, one might also include the machinery to test whether the ratio of the between-group to the within-group variance is significant (i.e., whether there is a treatment effect).
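In symbols, the partition and the ratio described above can be sketched as follows. The notation is an assumption introduced here for illustration: k is the number of treatment levels, n_j the number of observations at level j, N the total number of observations, x̄_j the mean of level j, and x̄ the grand mean.

```latex
\begin{align*}
SS_{\text{total}} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x})^2
                   = SS_{\text{between}} + SS_{\text{within}} \\
SS_{\text{between}} &= \sum_{j=1}^{k} n_j \, (\bar{x}_j - \bar{x})^2 \\
SS_{\text{within}} &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} (x_{ij} - \bar{x}_j)^2 \\
F &= \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
\end{align*}
```

A large F relative to the F distribution with k - 1 and N - k degrees of freedom is taken as evidence of a treatment effect.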

In this article, we will construe the ANOVA technique more broadly, as a sequence of data-analysis steps to take when conducting an analysis of variance. The ANOVA technique here is a methodical approach to analyzing data that come from a particular type of data-generating process, ideally a blocked and randomized experimental design.

Test Anxiety Study

The prototypical Single Factor ANOVA experiment is a simple comparative experiment. These are experiments that involve applying a treatment to homogeneous experimental units and measuring the responses that occur when administering different levels of the treatment to these experimental units.

The hypothetical study we will discuss in this article examines the effect of anxiety (i.e., the treatment) on test performance (i.e., the response). The study randomly assigned 30 subjects to a low-anxiety, moderate-anxiety, or high-anxiety treatment condition. The experimenter recorded a test score measurement for each subject. The empirical issue of concern is whether there is an effect of Anxiety Level on Test Score.

The idea for this hypothetical study and the data to analyze originally appeared in the popular textbook by Gene V. Glass and Kenneth D. Hopkins (1995), Statistical Methods in Education and Psychology. The results reported in this article agree with theirs. You will also find it useful to consult this textbook for its excellent and comprehensive treatment of the ANOVA technique.

Analysis Source Code

You can use the single-factor ANOVA technique to determine whether anxiety significantly influences test scores. The following PHP script implements six major steps in the single-factor ANOVA technique, applied to the data from our hypothetical test anxiety study. After you have examined the overall flow of the script, proceed to the rest of the article, where we examine the tabular and graphical output that each step in this script generates.

<?php
/**
* @package SFA
*
* Script performs single factor ANOVA analysis on test anxiety
* data stored in a database.
*
* @author Paul Meagher, Datavore Productions
* @license PHP v3.0
* @version 0.7
*
* The config.php file defines paths to the root of the PHPMATH
* and JPGRAPH libraries and sets up a global database connection.
*/

require_once "config.php";
require_once PHPMATH ."/SFA/SingleFactorANOVA.php";

$sfa = new SingleFactorANOVA;

// Step 1: Specify and analyze data
$sfa->setTable("TestScores");
$sfa->setTreatment("anxiety");
$sfa->setResponse("score");
$sfa->analyze();

// Step 2: Show raw data
$sfa->showRawData();

// Step 3: Show box plot
$params["figureTitle"] = "Anxiety Study";
$params["xTitle"]      = "Anxiety Level";
$params["yTitle"]      = "Test Score";
$params["yMin"]        = 0;
$params["yMax"]        = 100;
$params["yTicks"]      = 10;

$sfa->showBoxPlot($params);

// Step 4: Show descriptive statistics
$sfa->showDescriptiveStatistics();

// Step 5: Show single factor ANOVA source table
$sfa->showSourceTable();

// Step 6: Show mean differences.
$sfa->showMeanDifferences();

?>
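The header comment of the script says that config.php defines the paths to the PHPMATH and JPGRAPH libraries and sets up a global database connection. The article does not show that file, so the following is only a sketch of what it might contain. The paths, credentials, and database name are placeholders, and the use of `DB::connect()` with a DSN string is an assumption based on the PEAR DB calls that appear in the analysis code.

```php
<?php
// config.php: hypothetical sketch, not the author's actual file.
// Every path and credential below is a placeholder.

// Roots of the two libraries that the analysis script relies on.
define("PHPMATH", "/path/to/phpmath");
define("JPGRAPH", "/path/to/jpgraph");

// PEAR DB must be installed and on the include path.
require_once "DB.php";

// Global connection that analyze() reaches through "global $db".
$dsn = "mysql://username:password@localhost/database_name";
$db  = DB::connect($dsn);

if (DB::isError($db)) {
  die($db->getMessage());
}
```

Because the connection lives in the global scope, any class method can reach it with `global $db`, which keeps the calling script short at the cost of a hidden dependency.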

Step 1: Specify and Analyze Data

After we instantiate the SingleFactorANOVA class, we start by specifying 1) what data table to use, 2) what table field name to use as the treatment column, and 3) what table field name to use as the response column:

<?php
// Step 1: Specify and analyze data
$sfa->setTable("TestScores");
$sfa->setTreatment("anxiety");
$sfa->setResponse("score");
$sfa->analyze();
?>

The culmination of the first step is the invocation of the $sfa->analyze() method. This method is the centerpiece of the SingleFactorANOVA class and is reproduced below. Note that I am using the PEAR::DB API to interact with a MySQL database.

<?php
/**
* Compute single factor ANOVA statistics.
*/
function analyze() {
  global $db;
  $sql  = " SELECT $this->treatment, sum($this->response), ";
  $sql .= " sum($this->response * $this->response), ";
  $sql .= " count($this->response) ";
  $sql .= " FROM $this->table ";
  $sql .= " GROUP BY $this->treatment ";

  $result = $db->query($sql);

  if (DB::isError($result)) {
    die($result->getMessage());
  } else {
    while ($row = $result->fetchRow()) {
      $level                  = $row[0];

      $this->levels[]         = $row[0];
      $this->sums[$level]     = $row[1];
      $this->n[$level]        = $row[3];        

      $this->means[$level]    = 
          $this->sums[$level] / $this->n[$level];
      $this->ss[$level]       = 
          $row[2] - $this->n[$level] * pow($this->means[$level], 2);
      $this->variance[$level] = 
          $this->ss[$level] / ($this->n[$level] - 1);         
    }    

    $this->sums["total"]  = array_sum($this->sums);
    $this->n["total"]     = array_sum($this->n);
    $this->means["grand"] = $this->sums["total"] / $this->n["total"];
    $this->ss["within"]   = array_sum($this->ss);   

    $this->ss["between"] = 0; // start the accumulator at zero
    foreach($this->levels as $level) {
      $this->effects[$level] = 
          $this->means[$level] - $this->means["grand"];
      $this->ss["between"]  += 
          $this->n[$level] * pow($this->effects[$level], 2);
    }

    $this->num_levels = count($this->levels);

    $this->df["between"] = $this->num_levels - 1;
    $this->df["within"]  = $this->n["total"] - $this->num_levels;  
    $this->ms["between"] = $this->ss["between"] / $this->df["between"];
    $this->ms["within"]  = $this->ss["within"] / $this->df["within"];

    $this->f    = $this->ms["between"] / $this->ms["within"];
    $F          = new FDistribution($this->df["between"], 
                                    $this->df["within"]);
    $this->p    = 1 - $F->CDF($this->f);
    $this->crit = $F->inverseCDF(1 - $this->alpha);
  }            
}
?>
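To see what the query looks like once the property values are interpolated, the following sketch builds the same SQL string from plain arguments. The function name `buildAnovaQuery` is hypothetical and is not part of the SingleFactorANOVA class. The table and column names are the ones set in Step 1.

```php
<?php
// Hypothetical stand-alone helper (not part of the SingleFactorANOVA class)
// that builds the same aggregate query as analyze() from plain strings.
function buildAnovaQuery($table, $treatment, $response) {
  $sql  = "SELECT $treatment, sum($response), ";
  $sql .= "sum($response * $response), ";
  $sql .= "count($response) ";
  $sql .= "FROM $table ";
  $sql .= "GROUP BY $treatment";
  return $sql;
}

$sql = buildAnovaQuery("TestScores", "anxiety", "score");
echo $sql, "\n";
// prints: SELECT anxiety, sum(score), sum(score * score), count(score) FROM TestScores GROUP BY anxiety
```

Each result row carries one treatment level together with its sum, sum of squares, and count, which is all that analyze() needs to compute the means and sums of squares. Because the names are interpolated directly into the SQL, they should come from trusted code rather than from user input.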

We could have passed a data matrix into the analyze() method. Instead, I assume that the data resides in a database and use SQL to group the records by treatment level and compute the per-group sums, sums of squares, and counts that feed into the subsequent analysis code. I made this storage assumption because of my interest in developing a scalable data-analysis solution.
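For comparison, here is a minimal sketch of the data-matrix alternative: the same arithmetic as analyze(), applied to an in-memory array instead of a database table. The function name `singleFactorAnova` and the returned array keys are assumptions made for illustration, and the three small groups are made-up numbers chosen so the result is easy to check by hand. They are not the test anxiety data. The p value and critical value are left out because they depend on the FDistribution class.

```php
<?php
// Minimal sketch (hypothetical helper, not part of the SFA package):
// single factor ANOVA on an array of the form level => list of responses.
function singleFactorAnova(array $groups) {
  $n = $sums = $means = $ss = array();

  foreach ($groups as $level => $scores) {
    // Same per-level aggregates that the SQL query returns.
    $sumOfSquares = 0;
    foreach ($scores as $x) {
      $sumOfSquares += $x * $x;
    }
    $n[$level]     = count($scores);
    $sums[$level]  = array_sum($scores);
    $means[$level] = $sums[$level] / $n[$level];
    // Shortcut formula: sum of x^2 minus n times the squared mean.
    $ss[$level]    = $sumOfSquares - $n[$level] * pow($means[$level], 2);
  }

  $nTotal    = array_sum($n);
  $grandMean = array_sum($sums) / $nTotal;
  $ssWithin  = array_sum($ss);

  $ssBetween = 0;
  foreach ($means as $level => $mean) {
    $ssBetween += $n[$level] * pow($mean - $grandMean, 2);
  }

  $dfBetween = count($groups) - 1;
  $dfWithin  = $nTotal - count($groups);
  $msBetween = $ssBetween / $dfBetween;
  $msWithin  = $ssWithin / $dfWithin;

  return array(
    "ssBetween" => $ssBetween, "ssWithin" => $ssWithin,
    "dfBetween" => $dfBetween, "dfWithin" => $dfWithin,
    "msBetween" => $msBetween, "msWithin" => $msWithin,
    "f"         => $msBetween / $msWithin,
  );
}

// Made-up illustrative data: three levels with three responses each.
$result = singleFactorAnova(array(
  "low"      => array(1, 2, 3),
  "moderate" => array(2, 3, 4),
  "high"     => array(3, 4, 5),
));

printf("SS between = %.2f, SS within = %.2f, F(%d, %d) = %.2f\n",
       $result["ssBetween"], $result["ssWithin"],
       $result["dfBetween"], $result["dfWithin"], $result["f"]);
// prints: SS between = 6.00, SS within = 6.00, F(2, 6) = 3.00
```

The in-memory version has to hold every observation in PHP, whereas the SQL version only ever sees one aggregated row per treatment level, which is the scalability argument made above.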
