Histograms are plots that show the distribution of a set of values. It's easy to use R to look at the distribution of daily gains (and losses) for the S&P 500 in the sample data set.
First, you must calculate the daily percent change for each day. You can't do this for the first day in the data set, so the size of the percent gains array will be one unit less than the size of the sp500value array. Here's one method for directing R to create the daily percent gain array, based on knowledge (gained using a Unix wc command) that the data set consists of 2,664 total points:
> yesterday = sp500value[1:2663]
> today = sp500value[2:2664]
> changePercent = 100 * (today / yesterday - 1.0)
Here, the new variable yesterday is the set of S&P 500 values (excluding the last day), and the new variable today is the set of S&P 500 values offset by one day (excluding the first day). Hence, the two variables are aligned such that yesterday[i] and today[i] represent yesterday's and today's S&P 500 price for the same pair of consecutive days. This alignment lets the third line apply an equation to the two prices element by element: it calculates the percent that the S&P 500 index changed from yesterday to today.
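The hard-coded lengths can be avoided. The following is a minimal sketch, assuming a small made-up vector named sp500demo (hypothetical values, not the article's data set) in place of sp500value. It derives the index bounds from length() and cross-checks the result against base R's diff() function.

```r
# Hypothetical closing values used only for illustration (not the real data set)
sp500demo <- c(100, 102, 101, 101, 104)

n <- length(sp500demo)                  # avoids hard-coding the number of points
yesterdayDemo <- sp500demo[1:(n - 1)]   # every value except the last day
todayDemo     <- sp500demo[2:n]         # every value except the first day
changeDemo <- 100 * (todayDemo / yesterdayDemo - 1.0)

# Equivalent form: diff() returns today minus yesterday for each consecutive pair
changeDiff <- 100 * diff(sp500demo) / head(sp500demo, -1)

print(length(changeDemo))                 # one element fewer than the input
print(all.equal(changeDemo, changeDiff))  # checks that both forms agree
```

Deriving the bounds from the data means the same lines keep working if the data set grows or shrinks.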
Now you can plot the histogram (Figure 6):
> hist(changePercent, breaks=10, main="S&P 500 Daily Percent Change")
Figure 6. The S&P 500 daily percent change
The breaks parameter tells R approximately how many bins to create while sorting the data. In this case, I asked for 10 bins, but R produced a plot with 13 bins spaced 1 percent apart. R treats the requested value as a guideline, and in its default mode it chooses a bin spacing and number of bins that yield an easy-to-read plot for the input data. Here, a bin spacing of 1 percent put the bin divisions on whole numbers, and the 13 bins this required were close to the requested 10, so R produced its plot accordingly. Experiment with different breaks values to see how this works.
For cases where you require a specific set of bin break points, specify a vector of values for the breaks parameter. The values must span the full range of the data. In this case, R will produce bins bounded precisely at the specified values.
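To see the difference between a suggested bin count and explicit break points without drawing anything, you can inspect the object that hist() returns. This is a minimal sketch that assumes randomly generated stand-in data (the variable changeDemo is hypothetical and does not reproduce the S&P 500 figures).

```r
# Hypothetical daily percent changes, generated only for illustration
set.seed(1)
changeDemo <- rnorm(500, mean = 0, sd = 1.2)

# plot = FALSE returns the histogram object without drawing it
suggested <- hist(changeDemo, breaks = 10, plot = FALSE)
print(suggested$breaks)           # break points R actually chose
print(length(suggested$counts))   # number of bins, which may differ from 10

# An explicit vector of break points is used as given; it must span the data
lo <- floor(min(changeDemo))
hi <- ceiling(max(changeDemo))
exact <- hist(changeDemo, breaks = seq(lo, hi, by = 0.5), plot = FALSE)
print(exact$breaks)               # same values that were passed in
```

Rounding the minimum down and the maximum up is one simple way to make sure the explicit break points cover every data value.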
Looking at the histogram plot, you can see that in the past 10 years, on most days the S&P 500 either rose or declined by less than 1 percent; but it rose on more days than it declined.
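Observations like these can also be checked numerically rather than by eye. The sketch below uses a short made-up vector (changeDemo is hypothetical, not the article's data) to show one way to count such days with logical comparisons.

```r
# Hypothetical daily percent changes used only for illustration
changeDemo <- c(0.4, -0.2, 1.3, 0.1, -1.7, 0.6, 0.0, -0.5)

# mean() of a logical vector gives the fraction of TRUE values
withinOne <- mean(abs(changeDemo) < 1)   # share of days moving less than 1 percent
upDays    <- sum(changeDemo > 0)         # number of days the value rose
downDays  <- sum(changeDemo < 0)         # number of days the value declined

print(withinOne)
print(c(up = upDays, down = downDays))
```

Applied to the real changePercent variable, the same three expressions would put numbers on what the histogram suggests.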
An interesting question: does a correlation exist between the stock market's movement one day and its performance the next day? In other words, if the stock market rose yesterday, is it likely to rise today? To gain some insight into this question, analyze the daily percent change data further using the following R commands:
> changePercentYesterday = changePercent[1:2662]
> changePercentToday = changePercent[2:2663]
> myDf <- data.frame(x=changePercentYesterday, y=changePercentToday)
> myFm <- lm(y~x, data=myDf)
> plot(changePercentYesterday, changePercentToday, main="Daily Change Correlation")
> abline(coef(myFm), col="red")
> summary(myFm)
This looks complicated, but it also illustrates how much work just a few lines of R code can do. The first two lines create yesterday and today percent change variables such that changePercentYesterday[i] is aligned with changePercentToday[i], permitting calculations and plotting using yesterday's change and today's change. The third line creates a new data frame (myDf) that has as its x data the values stored in changePercentYesterday and as its y data the values stored in changePercentToday. The fourth line uses R's lm() "linear model" statistical function to perform a linear fit of the data in myDf. The fifth line plots the raw data (yesterday's change versus today's change) using plot(), with yesterday's percent change as the X value and today's percent change as the Y value. The abline() function then adds a red "best fit" line to the graph based on the y-intercept and slope coefficients calculated by the lm() function, yielding the plot in Figure 7.
Figure 7. The S&P 500 daily change correlation
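The strength of the relationship can also be expressed as a single number. The following is a minimal sketch, assuming randomly generated stand-in data (changeDemo, prevDay, nextDay, and demoFm are hypothetical names, and the values do not reproduce the S&P 500 results). It computes the Pearson correlation with cor() and compares it with the linear fit.

```r
# Hypothetical daily percent changes, generated only for illustration
set.seed(42)
changeDemo <- rnorm(300)

m <- length(changeDemo)
prevDay <- changeDemo[1:(m - 1)]   # each day's change
nextDay <- changeDemo[2:m]         # the following day's change

demoFm <- lm(nextDay ~ prevDay)    # same kind of linear fit as in the article
r <- cor(prevDay, nextDay)         # Pearson correlation coefficient

print(coef(demoFm))                # intercept and slope used for a best-fit line
print(r)
print(summary(demoFm)$r.squared)   # for a one-predictor fit this equals r^2
```

A correlation near zero means that one day's change says little about the next day's, while a value near 1 or -1 indicates a strong linear relationship.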