ONLamp.com    
 Published on ONLamp.com (http://www.onlamp.com/)
 See this if you're having trouble printing code examples


aboutSQL

The Outer Limits of SQL JOINs

06/27/2001

First off this week, thanks for your patience during the recent breaks in this column -- I'm now the proud father of new baby (our first), and as you may imagine, I needed a little time to get into the swing of balancing new responsibilities with the rest of my life. The good news is that this is just the first of a large stack of new aboutSQL columns which should keep us on track for the next several months.

To refresh everyone's memory, we've started talking about SQL JOIN statements over the past few columns -- one of the most fundamentally important areas of SQL development. In the last column, we talked about the various types of joins, and I promised to cover "outer joins" this time around to complement the discussions of inner and cross joins from the past two columns. So with no further ado....

The SQL outer join

Every type of SQL join effectively multiplies the row in one table by the rows in the other table that is participating in the join (and which can be extended to include as many tables are participating in the JOIN operation).

When we discussed the Cartesian product (the "cross join"), we saw the raw results of a JOIN operation. The INNER JOIN statement provided a useful filter to that raw result by picking out only rows where the key fields have the same value. The result of the INNER JOIN operation is one row for each key value that exists in both joined tables. But what about situations where there are rows in one table that do not have a corresponding match in the other table? An inner join ignores these rows.

An outer join is used to include the rows that are "missing" in an inner join. Using our ongoing CD database as an example may make things clearer. So far, we've created joins that pull our artist data and CD data together to create a catalog of the CD collection. But let's say I've heard a new artist on the radio and immediately decide to put them in the database -- David Gray, for example.

Artists
ArtistID ArtistName
1 Peter Gabriel
2 Bruce Hornsby
3 Lyle Lovett
4 Beach Boys
5 David Gray

If I create an inner join between the Artists and CD tables, I get the following result.

SELECT * FROM Artists, CDs WHERE Artists.ArtistID=CDs.ArtistID

CDID CDs.ArtistID Artists.ArtistID ArtistName Title Year
1 1 1 Peter Gabriel So 1984
2 1 1 Peter Gabriel Us 1992
3 2 2 Bruce Hornsby The Way It Is 1986
4 2 2 Bruce Hornsby Scenes from the Southside 1990
5 1 1 Peter Gabriel Security 1990
6 3 3 Lyle Lovett Joshua Judges Ruth 1992
7 4 4 Beach Boys Pet Sounds 1966

These results accurately describe my catalog. But what about David Gray and all the others I'm planning on adding to the catalog? They completely disappear from the result because there is no matching row where CDs.ArtistID=5. Big deal you may say, but what if the two tables were Customers and Orders instead? Or maybe Orders and Items?

Comment on this articleYour questions and comments about using outer joins.
Post your comments

Also in aboutSQL:

SQL Data Types

Working With Tables

Data Definition Language

In the first case, only customers with orders would show up in the report; in the second case, only items that had actually been ordered would show up. This sort of "missing" information could completely ruin some types of calculations! An "outer join" operation will fix this problem.

Using the outer join

Outer joins come in three distinct flavors:

Knowing that the purpose of an outer join is to include the "missing" or unmatched rows, you can probably figure out what each of these flavors means. The LEFT, RIGHT, and FULL syntax all describe which of the table's unmatched columns to include. If we wanted to fix the CD collection example, we could change the inner join to the following outer join:

SELECT * FROM Artists, CDs WHERE Artists.ArtistID *= CDs.ArtistID

The LEFT OUTER JOIN operator ensures that all rows on the "left" side of the join, in this case the Artists table, will be included. It is important to note that "left" and "right" are completely dependent on the order of the tables in the SQL statement. The following SQL statements are all identical:

SELECT * FROM Artists, CDs WHERE Artists.ArtistID *= CDs.ArtistID
  SELECT * FROM CDs, Artists WHERE CDs.ArtistID =* Artists.ArtistID
  SELECT * FROM Artists LEFT OUTER JOIN CDs ON Artists.ArtistID = CDs.ArtistID
  SELECT * FROM CDs RIGHT OUTER JOIN Artists ON CDs.ArtistID = Artists.ArtistID

And as you can probably gather, a "full outer join" includes all unmatched rows from both tables in the result.

The outer join is particularly useful for creating aggregate data reports. You could count the orders placed for each item in an inventory table using a SQL statement something like the following:

SELECT Inventory.InventoryID, COUNT(Orders.OrderID)
  FROM Inventory LEFT OUTER JOIN Orders
  ON Inventory.InventoryID = Orders.InventoryID

  GROUP BY Inventory.InventoryID

This would count all of the orders for each inventory row, even those with no matching orders (this example assumes that the records in the Orders table contains orders for single inventory items). You could also use this technique to count the orders per customer, inventory per vendor, or any other type of relationship where it is useful to know that some rows in one table have no corresponding values.

Next steps

This week we took a look at the OUTER JOIN operator and some real-world examples. Next week, we jump into final basic type of join -- the self join. Until then, feel free to contact me with your comments and questions. Believe it or not, I read all of the email you send me and answer the vast majority of it! Some of the questions coming in will make it into future columns after we've covered the basics of SQL.

John Paul Ashenfelter is president and CTO of TransitionPoint.com, a technology development and analysis group focusing on web database applications.


Return to ONLamp.com.

Copyright © 2009 O'Reilly Media, Inc.