ONLamp.com

Analyzing Statistics with GNU R
Pages: 1, 2, 3, 4, 5

Histograms

Histograms are plots that show the distribution of a set of values. It's easy to use R to look at the distribution of daily gains (and losses) for the S&P 500 in the sample data set.



First, calculate the percent change for each day. There is no previous value to compare against on the first day in the data set, so the percent-change array will be one element shorter than the sp500value array. Here's one method for directing R to create the daily percent gain array, based on knowledge (gained using a Unix wc command) that the data set consists of 2,664 total points:

> yesterday = sp500value[1:2663]
> today = sp500value[2:2664]
> changePercent = 100 * (today / yesterday - 1.0)

Here, the new variable yesterday is the set of S&P 500 values excluding the last day, and the new variable today is the same set shifted by one day (that is, excluding the first day). The two variables are therefore aligned so that, for any index i, yesterday[i] and today[i] hold the S&P 500 values for two consecutive days. This alignment lets the third line apply a single equation to the whole array at once: it calculates the percentage by which the S&P 500 index changed from each day to the next.
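The hard-coded 2663 and 2664 tie the commands above to one particular data file. As a minimal sketch, the same calculation can be written so that it adapts to any number of points by asking R for the length. The five values below are made-up stand-ins for sp500value, not the article's data, and changePercentAlt is a name introduced only for this illustration:

```r
# Hypothetical closing values standing in for sp500value (not the real data set).
sp500value <- c(100, 102, 101, 101, 105)

n <- length(sp500value)              # number of points, computed instead of hard-coded
yesterday <- sp500value[1:(n - 1)]   # every value except the last
today     <- sp500value[2:n]         # every value except the first
changePercent <- 100 * (today / yesterday - 1.0)

# Equivalent form: diff() returns today - yesterday, and head(x, -1) drops the last value.
changePercentAlt <- 100 * diff(sp500value) / head(sp500value, -1)

print(changePercent)
print(length(changePercent))         # one element fewer than sp500value
```

Note the parentheses in 1:(n - 1). Without them, R evaluates 1:n first and then subtracts 1 from every index.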

Now you can plot the histogram (Figure 6):

> hist(changePercent, breaks=10, 
       main="S&P 500 Daily Percent Change")

Histogram: S&P 500 daily percent change
Figure 6. The S&P 500 daily percent change

The breaks parameter tells R approximately how many bins to create while sorting the data. In this case, I asked for 10 bins, but R produced a plot with 13 bins spaced 1 percent apart. When breaks is a single number, R treats it as a suggestion only: by default it picks "pretty" break points (round values at a regular spacing) that cover the range of the input data. Here, a bin spacing of 1 percent put the divisions on whole numbers, and the 13 bins needed to cover the data were close to the 10 requested, so R produced its plot accordingly. Experiment with different break values to see how this works.
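One way to experiment without reading bin counts off a plot is to ask hist() for its result object instead of a drawing. The sketch below uses simulated numbers (simChange, a hypothetical stand-in for changePercent, drawn from a normal distribution purely for illustration), so the bin counts it reports will not match Figure 6:

```r
# Simulated daily percent changes; an assumption for illustration, not S&P 500 data.
set.seed(42)
simChange <- rnorm(2663, mean = 0.03, sd = 1.1)

# plot = FALSE returns the histogram object without drawing anything.
h10 <- hist(simChange, breaks = 10, plot = FALSE)
h40 <- hist(simChange, breaks = 40, plot = FALSE)

cat("asked for 10, got", length(h10$counts), "bins of width", diff(h10$breaks)[1], "\n")
cat("asked for 40, got", length(h40$counts), "bins of width", diff(h40$breaks)[1], "\n")

# The break points R chose: round values that enclose the whole data range.
print(h10$breaks)
```

The breaks component holds the bin boundaries and counts holds the number of values in each bin, so there is always one more break than there are bins.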

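For cases where you require a specific set of bin break points, pass a vector of values as the breaks parameter (built with c() or seq(), for example). R will then produce bins bounded precisely at the specified values. The break points must span the full range of the data; otherwise hist() reports an error.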
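As a hedged sketch of explicit break points, the following builds half-percent bins with seq(). It again uses simulated values (simChange is a hypothetical stand-in for changePercent), and the names lo, hi, and myBreaks are introduced only for this example:

```r
# Simulated daily percent changes; an assumption for illustration, not S&P 500 data.
set.seed(42)
simChange <- rnorm(2663, mean = 0.03, sd = 1.1)

# Round the limits outward to whole numbers so that every value falls inside a bin.
lo <- floor(min(simChange))
hi <- ceiling(max(simChange))
myBreaks <- seq(lo, hi, by = 0.5)    # half-percent bins

# plot = FALSE returns the histogram object; drop it to draw the plot instead.
h <- hist(simChange, breaks = myBreaks, plot = FALSE)

print(h$breaks)                      # the bins are bounded at the requested values
print(h$counts)                      # number of days falling in each bin
```

Computing the outer limits from the data, rather than typing them in, keeps the call from failing if a later data set contains a larger move than expected.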

Looking at the histogram plot, you can see that in the past 10 years the S&P 500 moved by less than 1 percent, up or down, on most days, and that it rose on more days than it declined.
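Observations like these can also be checked numerically rather than by eye. A minimal sketch, using eight made-up values in place of the real changePercent array (upDays, downDays, and smallMoves are names chosen for this example):

```r
# Hypothetical daily percent changes, not taken from the S&P 500 data set.
changePercent <- c(0.4, -0.2, 1.3, 0.1, -1.6, 0.7, -0.3, 0.0)

# A comparison yields TRUE/FALSE values; sum() counts the TRUEs, mean() gives their share.
upDays     <- sum(changePercent > 0)         # days the index rose
downDays   <- sum(changePercent < 0)         # days the index fell
smallMoves <- mean(abs(changePercent) < 1)   # fraction of days with a move under 1 percent

cat("up:", upDays, " down:", downDays, " fraction under 1 percent:", smallMoves, "\n")
```

Run against the full changePercent array, the same three expressions put numbers on what the histogram suggests.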

Correlation

An interesting question: does a correlation exist between the stock market's movement one day and its performance the next day? In other words, if the stock market rose yesterday, is it likely to rise today? To gain some insight into these questions, analyze the daily percent change data further using the following R commands:

> changePercentYesterday = changePercent[1:2662]
> changePercentToday = changePercent[2:2663]
> myDf <- data.frame(x=changePercentYesterday, 
       y=changePercentToday)
> myFm <- lm(y~x, data=myDf)
> plot(changePercentYesterday, changePercentToday, 
       main="Daily Change Correlation")
> abline(coef(myFm), col="red")
> summary(myFm)

This looks complicated, but it also illustrates how much work just a few lines of R code can do. The first two commands create yesterday and today percent change variables such that changePercentYesterday[i] is aligned with changePercentToday[i], permitting calculations and plotting using yesterday's change and today's change. The third command creates a new data frame (myDf) that has as its x data the values stored in changePercentYesterday and as its y data the values stored in changePercentToday. The fourth command uses R's lm() "linear model" statistical function to perform a linear fit of the data in myDf. Next, plot() draws the raw data, with yesterday's percent change as the X value and today's percent change as the Y value. The abline() function then adds a red "best fit" line to the graph based on the y-intercept and slope coefficients calculated by the lm() function, yielding the plot in Figure 7. Finally, summary() prints the details of the fit, including the estimated coefficients and their standard errors.

S&P 500 daily change correlation plot
Figure 7. The S&P 500 daily change correlation
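To put a single number on the strength of the relationship, R's cor() function returns the correlation coefficient directly. The sketch below uses simulated, independent values as a hypothetical stand-in for changePercent, so its output says nothing about the real S&P 500 data. The names x, y, fit, r, slope, and rSquared are introduced only for this example:

```r
# Simulated daily changes; an assumption for illustration, not S&P 500 data.
set.seed(1)
changePercent <- rnorm(500)

n <- length(changePercent)
x <- changePercent[1:(n - 1)]        # yesterday's change
y <- changePercent[2:n]              # today's change

fit <- lm(y ~ x)                     # same kind of linear fit as in the article
r <- cor(x, y)                       # Pearson correlation, between -1 and 1
slope <- coef(fit)[["x"]]            # slope of the best-fit line
rSquared <- summary(fit)$r.squared   # share of variance explained by the fit

cat("correlation:", r, " slope:", slope, " R-squared:", rSquared, "\n")
```

For a straight-line fit with one predictor, the R-squared value reported by summary() equals the square of the correlation coefficient, so a correlation near zero corresponds to a nearly flat best-fit line that explains almost none of the day-to-day variation.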
