Published on

See this if you're having trouble printing code examples

Also in aboutSQL: |

In the last article, we took a first look at SQL functions. These provide an incredible new layer of possibility for SQL development, but add the complexity of varying in syntax and availability between the different database platforms. Despite these issues, SQL functions are a staple of serious database developers and should rapidly become part of your repertoire. This week, we will take a look at a specific class of SQL functions that operate on entire columns of data instead of discrete values -- the aggregate functions.

If you've spent much time using spreadsheets, you're already familiar with the concept of *aggregate functions*. Any sort of function that operates on a large set (column) of data and typically returns a single value is an aggregate function. There are a wide range of aggregate functions, but they tend to break down into two major categories:

*properties*such as a number of members, minimums, and maximums; and*statistics*such as an average or standard deviation.

How do you know what aggregate functions your database supports? I'll echo last week's comments and remind you that the only reliable way is to check the documentation. As a general rule of thumb, the more a database costs (or the more widely it is used for open source databases), the more functions it supports. One advantage of aggregate functions is that they are a little more standardized than plain-vanilla functions -- any worthwhile database is going to have functions for counting a set, creating an average, and the other little routines that make databases truly useful.

The crucial difference between aggregate functions and "normal" functions is that they use the entire column of data as their input and produce a single output; the "normal" functions operate on each element in the column of data. To make this a little easier to understand, it's worth looking at an example. Imagine a database with a field called MyNumber, as shown in the table below:

MyNumber |

1 |

4 |

9 |

16 |

25 |

36 |

49 |

We could use take the square root of each value using the SQL statement:

`SELECT SQRT(MyNumber)`

Results |

1 |

2 |

3 |

4 |

5 |

6 |

7 |

and we could limit the operation to values of MyNumber under 25 using a WHERE clause:

`SELECT SQRT(MyNumber) WHERE MyNumber < 25>`

Results |

1 |

2 |

3 |

4 |

In contrast, let's take a look at the aggregate function COUNT, which simply counts the number of elements in a column of data. If we take the same MyNumber field and use the following SQL

`SELECT Count(MyNumber)`

Results |

7 |

and to finish the comparison, when we combine it with a WHERE clause, it only operates on the data filtered by the where clause:

`SELECT Count(MyNumber) WHERE MyNumber < 25`

Results |

4 |

The bottom line is that functions operate on each data element while aggregate functions operate on the set of data elements as a whole.

In addition to the differences between how functions and aggregate functions operate, there's a difference in how you use them in a SQL statement. As we discussed last week, functions are often used in the WHERE clause of a SQL statement to help filter a data set. Since aggregates work on an entire column of data, they are typically one of the *result*s of a SELECT statement, just like a database field. Take for instance, a sales database with fields for an employeeID, a sales amount, and a date. The database may look something like

EmployeeID | SalesAmount | SalesDate |

1 | 150 | 1/1/01 |

1 | 2500 | 1/4/01 |

2 | 25 | 1/3/01 |

3 | 45 | 1/7/01 |

3 | 1000 | 1/8/01 |

3 | 2010 | 1/8/01 |

If you want to generate a sales report broken down by a particular employee, you could use this SQL statement:

`SELECT SUM(SalesAmount) WHERE EmployeeID=3`

which would return the total sales ($3055) for that employee. Notice that we're selecting the SUM much like we'd select any other database field. Of course, the next obvious step is to return the sums for all the employees, but that requires a few more concepts that I'll introduce in the next few columns.

While the specific implementation of aggregate functions is different for each database platform, there is a pretty solid set that are available in virtually all databases and thus can be more widely used. The following table shows some of the most common functions and their implementation in the common database platforms.

Function |
Access |
SQL Server |
MySQL |
Oracle |

Sum |
`SUM` |
`SUM` |
`SUM` |
`SUM` |

Average |
`AVG` |
`AVG` |
`AVG` |
`AVG` |

Count |
`COUNT` |
`COUNT` |
`COUNT` |
`COUNT` |

Standard Deviation |
`STDEV` |
`STDEV` |
`STD, STDEV` |
`STDEV, STDEV POP, STDEV SAMP` |

Minimum |
`MIN` |
`MIN` |
`MIN` |
`MIN` |

Maximum |
`MAX` |
`MAX` |
`MAX` |
`MAX` |

Notice that there is not a lot of difference between the platforms. This makes it fairly safe to use aggregate functions in cross-platform SQL development efforts.

Aggregate functions provide a higher level of functionality for SQL development, and are particularly useful for processing business data to generate reports. In the coming weeks, we'll extend the functionality of SQL, particularly aggregate functions, by learning the clauses that provide the means to group and order data. Until then, keep the letters and comments coming aboutSQL.

*
John Paul Ashenfelter
is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.
*

*Read more aboutSQL columns*.

Copyright © 2009 O'Reilly Media, Inc.