Analyzing Statistics with GNU R

Finally, the command summary(myFm) produces a text summary of the linear regression analysis performed by lm():

Call:
lm(formula = y ~ x, data = myDf)

Residuals:
     Min       1Q   Median       3Q      Max
-6.92396 -0.59826  0.01474  0.59522  5.64824

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.04402    0.02185   2.015    0.044 *
x           -0.01498    0.01939  -0.772    0.440
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.127 on 2660 degrees of freedom
Multiple R-Squared: 0.0002243,  Adjusted R-squared: -0.0001516
F-statistic: 0.5967 on 1 and 2660 DF,  p-value: 0.4399

You probably need a background in statistics to interpret all of this accurately, but neither the graph nor the summary shows a strong correlation between yesterday's change in the S&P 500 and today's. The R-squared value of 0.0002243 means that yesterday's change accounts for roughly 0.02 percent of the variation in today's change. What the market did yesterday doesn't seem to strongly affect what happens today. The slope of the best-fit line is negative (-0.01498), which would correspond to a slight tendency for the market to reverse, or correct, a portion of the previous day's movement over the 10-year data period; with a p-value of 0.440, however, that slope is not statistically distinguishable from zero.
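If you want these numbers in variables rather than in printed text, you can pull them out of the summary object. The following is a minimal sketch that runs on simulated data, not on the S&P 500 series; the names myDf and myFm mirror the ones used above, while the seed, the sample size, and the other variable names are assumptions made for illustration.

```r
# Simulated stand-in for the daily-change data (NOT the real S&P 500 series).
set.seed(1)
n <- 2662                       # assumed size; leaves 2660 residual degrees of freedom
myDf <- data.frame(x = rnorm(n), y = rnorm(n))

myFm <- lm(y ~ x, data = myDf)
s <- summary(myFm)

coefTable <- coef(s)            # matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
slope     <- coefTable["x", "Estimate"]
slopeP    <- coefTable["x", "Pr(>|t|)"]
rSquared  <- s$r.squared

cat("slope:", slope, " p-value:", slopeP, " R-squared:", rSquared, "\n")
```

Because the simulated x and y are independent, the slope and R-squared should come out close to zero, which is the same qualitative picture as the summary above.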

3-D Data Demonstration: Mapping the U.S. Housing "Bubble"

R provides several options for graphical presentation of three-dimensional data, including 3-D perspective plots, color-coded images, and contour maps. To demonstrate R's 3-D capabilities, I downloaded United States house price index data published by the Office of Federal Housing Enterprise Oversight (OFHEO). The data set used for the analysis is the percentage increase in house prices over the past five years in each of the nine regions the OFHEO defines for the United States. The OFHEO presents its data in a table, but a table doesn't clearly depict the geographical distribution of the "housing bubble."

To produce a 3-D representation of the five-year house price index data, I created a 51-by-30-element grid that approximates the geographical size and position of the nine OFHEO regions. Then, using a Perl script, I created an input data set for R that assigns the price index for the region to each grid point. Hence, the R input file consists of 51 x 30 = 1,530 data points. For areas on the grid that are not part of the United States (for example, the ocean), the data values are NA, which tells R not to display data for that position in the 3-D plot.
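The Perl script itself isn't shown here, but the idea of the grid file can be sketched in R. Everything in the following example is hypothetical: the two rectangular "regions," their index values, and the temporary file name are placeholders, not the OFHEO regions or data. The one-value-per-line layout is an assumption inferred from the way the file is read in the commands that follow.

```r
# Hypothetical layout: every number below is a placeholder, not OFHEO data.
nx <- 51; ny <- 30
grid <- matrix(NA_real_, nrow = nx, ncol = ny)   # NA = outside the United States

# Two made-up rectangular "regions" with made-up index values.
grid[1:10, 5:25]  <- 80    # placeholder western region
grid[11:40, 8:28] <- 30    # placeholder central region

# Write one value per line, in column-major order, so that read.table()
# followed by a dim assignment of c(51, 30) rebuilds the same grid.
outFile <- tempfile("gridSketch")
write.table(as.vector(grid), outFile, row.names = FALSE, col.names = FALSE)

back <- read.table(outFile, header = FALSE)[, 1]
dim(back) <- c(nx, ny)
```

Grid points that were never assigned stay NA in the file and come back as NA when the file is read.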

The R commands that produce an image representation of the house price data are:

> dv <- read.table("./ofheo5yr.gridFine", header=FALSE)
> z <- dv[,1]
> attr(z, "dim") = c(51,30)
> image(z, col=topo.colors(50), axes=FALSE)

The first line reads the 1,530-point data file. The second line assigns the values read to variable z. Next, the attr() function alters the dimension ("dim") attribute of the z variable to arrange its data in the form of a two-dimensional array with 51 rows, each containing 30 data values. Finally, the image() function generates an image of the z data, using 50 colors in the topo color scale to represent the varying z values. Figure 8 shows the results.
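To see what the dimension assignment does, it helps to try it on a vector small enough to inspect. This sketch uses a six-element stand-in (an assumption for illustration, not the house price data) and compares attr() with two equivalent ways of getting the same matrix.

```r
v <- 1:6                      # small stand-in for the 1,530 values

a <- v
attr(a, "dim") <- c(3, 2)     # the form used above

b <- v
dim(b) <- c(3, 2)             # equivalent, more common spelling

m <- matrix(v, nrow = 3, ncol = 2)   # same result; fills column by column

sameAsDim    <- identical(a, b)
sameAsMatrix <- identical(a, m)
firstColumn  <- a[, 1]        # the first three values of v form column 1
```

R fills the matrix column by column, so the order of the values in the input file determines where each one lands. The image() function then draws the rows of the matrix along the x axis and the columns along the y axis.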

Figure 8. The plot of five-year regional house price changes

The topo color scale assigns violet to low values, with blue, green, yellow, orange, and pink assigned to successively higher values. The image shows that the change in housing prices is far from evenly distributed across the United States.

The persp() function provides another view of the same data:

> persp(z,theta=0,phi=60,box=FALSE,col="yellow")

This code tells R to generate a 3-D perspective plot of the z data, viewed with no rotation (theta=0) and tilted 60 degrees from the horizontal (phi=60), with no box around the plot and with the surface facets colored yellow. Figure 9 shows the plot.
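The effect of theta and phi is easiest to judge by changing them on a simple surface. The following sketch uses a synthetic bump on a 51-by-30 grid, not the OFHEO data; the name zDemo, the shape of the surface, and the second pair of viewing angles are assumptions chosen for illustration.

```r
# Synthetic stand-in surface (NOT the OFHEO data): a smooth bump on a 51 x 30 grid.
x <- seq(0, 1, length.out = 51)
y <- seq(0, 1, length.out = 30)
zDemo <- outer(x, y, function(a, b) exp(-((a - 0.2)^2 + (b - 0.5)^2) / 0.05))
zDemo[45:51, 1:5] <- NA       # NA cells are left out of the drawn surface

# Same settings as above, then a rotated and lower viewpoint for comparison.
persp(zDemo, theta = 0,  phi = 60, box = FALSE, col = "yellow")
persp(zDemo, theta = 40, phi = 30, box = FALSE, col = "yellow")

# contour() accepts the same matrix and draws it as a contour map.
contour(zDemo)
```

Each call draws a new plot, so in an interactive session you would run them one at a time to compare the views.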

Figure 9. The perspective plot of five-year regional house price changes

Perhaps more clearly, the perspective plot shows that the large increases in house prices in the past five years have occurred on the West Coast and in the Northeast, while the middle of the United States hasn't seen significant home price increases overall. This backs up Alan Greenspan's assertions that there is not a national housing "bubble," but there appear to be "signs of froth in some local markets."
