ANOVA Statistical Programming with PHP

Step 5: Show the Single-factor ANOVA Source Table

If we have carefully studied our box plots and descriptive statistics, then the results of our formal analysis of whether a significant mean difference exists should come as no surprise. Invoking the showSourceTable() method generates the ANOVA source table below. It reports the amount of variance attributable to the effect of our treatment (the "Between" row) versus the amount of variance attributable to experimental error (the "Within" row).

Table 3. Show Source Table

ANOVA Source Table

Source     SS        df    MS       F      p
Between    577.40    2     288.70   2.04   0.15
Within     3812.90   27    141.22

Critical F(0.05, 2, 27) is 3.35.

The obtained F value comes from dividing the mean square attributable to the treatment ($ms["between"]) by the mean square attributable to experimental error ($ms["within"]). If this ratio is sufficiently large, then we can reject the null hypothesis that there is no treatment effect (i.e., H0: μ1 = μ2 = μ3). In the example above, the probability p of the observed F value is 0.15, which is higher than the conventional 0.05 cutoff for declaring statistical significance. The critical value Fcrit = 3.35 is also larger than the obtained F of 2.04. Both of these facts tell us that we cannot reject the null hypothesis. This could be because there is in fact no effect of anxiety on test scores. Alternatively, a null finding could occur if a poor experimental design introduced so much experimental error that it washed out our treatment effects.

Perhaps we need to use a repeated-measures design instead of an independent-samples design to try to remove some individual differences in responding to anxiety.
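To make the arithmetic concrete, here is a minimal PHP sketch that reproduces the F test in Table 3 from its sums of squares and degrees of freedom. The variable names are illustrative only and are not taken from the SFA package.

<?php

// Reproduce the F test in Table 3 from its sums of squares (SS)
// and degrees of freedom (df). Values are copied from the source table.
$ss = array("between" => 577.40, "within" => 3812.90);
$df = array("between" => 2,      "within" => 27);

// Mean squares: MS = SS / df
$ms = array(
    "between" => $ss["between"] / $df["between"],   // 288.70
    "within"  => $ss["within"]  / $df["within"]     // 141.22
);

// Obtained F ratio and the critical value reported in Table 3
$f     = $ms["between"] / $ms["within"];   // approximately 2.04
$fCrit = 3.35;                             // critical F(0.05, 2, 27)

if ($f >= $fCrit) {
    echo "Reject H0: at least one treatment mean differs.\n";
} else {
    printf("Fail to reject H0: F = %.2f is below Fcrit = %.2f\n", $f, $fCrit);
}

?>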

Step 6: Show Mean Differences

If our F test had told us that a significant treatment effect exists, we would then perform multiple comparisons among the treatment-level means to isolate the specific significant mean differences. Because our obtained F was not significant, there is no need to proceed to the multiple-comparison stage. It is nevertheless worthwhile to examine the size of our effects by calling the showMeanDifferences() method. This report arranges the treatment-level means from lowest to highest and labels the rows and columns accordingly.

Table 4. Show Mean Differences

Mean Differences

            Moderate [48.00]   Low [48.40]   High [57.50]
Moderate                       0.4           9.5
Low                                          9.1
High
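As a cross-check on the figures in Table 4, the short sketch below computes the same pairwise differences directly from the three treatment-level means. It is illustrative code only, not part of the SFA package.

<?php

// Pairwise mean differences, as laid out in Table 4.
// The treatment-level means come from the example study.
$means = array("Moderate" => 48.00, "Low" => 48.40, "High" => 57.50);
asort($means);   // order treatment levels from lowest to highest mean

$levels = array_keys($means);
foreach ($levels as $i => $rowLevel) {
    foreach (array_slice($levels, $i + 1) as $colLevel) {
        // Each difference appears above the diagonal in Table 4
        printf("%s vs %s: %.2f\n",
               $rowLevel, $colLevel, $means[$colLevel] - $means[$rowLevel]);
    }
}

?>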

Many people engage in data mining without much theoretical motivation for making particular treatment comparisons. In such cases, I recommend obtaining a significant F before performing post-hoc comparisons. This approach helps keep the Type I error rate low. When there is a theoretical motivation for believing that a significant difference exists between particular treatment means, we can bypass the F test and immediately adopt an a priori (or planned) approach, such as the Dunn or Planned Orthogonal Contrasts methods. These methods can yield significant results (for example, when comparing the high-anxiety group to the combined low- and moderate-anxiety groups) even when the overall F ratio is not significant. In general, there is no harm in always checking whether the F test is significant before engaging in post-hoc or a priori multiple comparisons.
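To illustrate the planned-contrast idea, the sketch below computes a single a priori contrast pitting the high-anxiety mean against the average of the low- and moderate-anxiety means. The contrast weights are my own choice, the assumption of ten observations per group follows from the 27 within-group degrees of freedom in Table 3, and the resulting F would still need to be compared against the appropriate critical value before drawing any conclusion.

<?php

// A priori (planned) contrast: High vs. the average of Low and Moderate.
// Weights sum to zero; n = 10 per group is implied by df(within) = 27.
$means    = array("Moderate" => 48.00, "Low" => 48.40, "High" => 57.50);
$weights  = array("Moderate" => -0.5,  "Low" => -0.5,  "High" => 1.0);
$n        = 10;       // observations per treatment level (assumed balanced)
$msWithin = 141.22;   // error term from Table 3

// Contrast value psi = sum(c_i * mean_i), and sum(c_i^2 / n_i)
$psi  = 0.0;
$sumW = 0.0;
foreach ($means as $level => $mean) {
    $psi  += $weights[$level] * $mean;
    $sumW += pow($weights[$level], 2) / $n;
}

// Sum of squares for the contrast and its F ratio (1 df in the numerator)
$ssContrast = pow($psi, 2) / $sumW;
$fContrast  = $ssContrast / $msWithin;

printf("psi = %.2f, SS = %.2f, F(1, 27) = %.2f\n", $psi, $ssContrast, $fContrast);

?>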

Concluding Remarks

In this article, we interpreted the single-factor ANOVA procedure broadly, as a series of steps to undertake when conducting a single-factor analysis. We illustrated these steps using a hypothetical test-anxiety study, carrying out six steps of the procedure in the recommended order commonly prescribed in undergraduate statistics textbooks. We have not exhausted all the required steps in this article: because our treatment effect was not significant, we did not proceed to further stages of the analysis procedure.

Had our result come out significant, we would have engaged in the multiple-comparison step, in which we statistically analyze (using multiple t tests) which particular mean differences and contrasts are significant. We would also have run several diagnostic tests to determine whether the data met the assumptions of our statistical tests. Finally, we would have begun to construct a theoretical model of how our treatment levels might exert a causal influence on our response variable. The work reported in this article starts us toward a full-featured single-factor ANOVA package, but there is more implementation work to do.

In addition to teaching the basics of the ANOVA procedure, another purpose of this article was to demonstrate that PHP is a viable platform for web-based statistical computing, especially when combined with MySQL and JpGraph. The code distribution for this article contains a benchmark.php script that you can use to verify that the critical analyze() method is very fast: it crunches 100,000 records in under half a second (0.448 seconds) on very modest hardware (a 1,000 MHz processor with 256 MB of RAM). A recent American Statistician article repeated the standard advice that new statistics graduates should be proficient in C-based languages, Fortran, and Java. You might add PHP to this list, especially if you intend to work on the Web and see a need for online, database-driven analysis packages.
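The sketch below is not the article's benchmark.php; it simply shows one way to time a chunk of PHP with microtime(), using a placeholder workload (summing 100,000 random scores) in place of a call to analyze().

<?php

// Minimal timing sketch; the article's benchmark.php presumably does
// something similar around its call to the analyze() method.
$scores = array();
for ($i = 0; $i < 100000; $i++) {
    $scores[] = mt_rand(0, 100);
}

$start   = microtime(true);
$sum     = array_sum($scores);           // placeholder workload
$elapsed = microtime(true) - $start;

printf("Processed %d records in %.3f seconds\n", count($scores), $elapsed);

?>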

Resources

  • Gene V. Glass & Kenneth D. Hopkins (1995). Statistical Methods in Education and Psychology, 3rd Edition. Pearson Higher Education.
  • George E. P. Box, William G. Hunter, & J. Stuart Hunter (1978). Statistics for Experimenters: An Introduction to Design, Data Analysis, and Model Building. John Wiley & Sons, Inc.
  • Paul DuBois (2002). MySQL Cookbook. O'Reilly.
  • David S. Moore & George P. McCabe (2003). Introduction to the Practice of Statistics, 4th Edition. W.H. Freeman and Company.
  • See The American Statistician (February 2004) for articles on computational statistics training.
  • Richard Lowry has an excellent online ANOVA tutorial at VassarStats.
  • Paul Meagher (October 2003). "Apply probability models to Web data using PHP," IBM developerWorks.
  • Download the source code for this article.
  • Updates to the SFA and PDL packages will be made available at www.phpmath.com.

Acknowledgements

  • Thanks to Don Lawson, PhD, for reviewing the article and for helping me write the discussion of post-hoc vs. a priori comparisons.
  • Thanks to Mark Hale, JSci Project admin, for helping me to port the FDistribution class from Java to PHP.
  • Thanks also to Jaco van Kooten and Mark Hale for their excellent work on the JSci probability distributions classes.

Paul Meagher is a cognitive scientist whose graduate studies focused on mathematical problem solving.

