ONLamp.com
oreilly.comSafari Books Online.Conferences.

advertisement


Writing PostgreSQL Functions with PL/pgSQL
Pages: 1, 2, 3

Accessing Data

As a recursive function, the fib() example is none too fast. Of course it short-circuits when the argument is less than two, but otherwise its recursion can be quite slow. Why? Because each time you call it, it must calculate and return many of the same numbers. On my PowerBook, for example, it takes nearly 3.8 seconds to find the 26th Fibonacci number:



try=% explain analyze select fib(26);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=3772.062..3772.063 rows=1 loops=1)
 Total runtime: 3772.315 ms
(2 rows)

Why does it take so long? Like any recursive Fibonacci function, it has to make 392,835 recursive calls to itself to calculate the 26th Fibonacci number. Those recursive calls add up! Because the function always returns the same values for the same arguments, it would really improve the performance of the function to memoize it. Memoization caches the results of a function call for a given set of arguments so that when the function is called again with the same arguments, it can simply return the value from the cache without recalculating it--in this case, minimizing the need for recursion.

PL/pgSQL itself has no ability to store data outside of a function, but this is database programming--take advantage of it! The trick is to create a table to function as the cache, then access it from the function. The new function, fib_cached(), does exactly that:

 1   CREATE TABLE fib_cache (
 2        num integer PRIMARY KEY,
 3        fib integer NOT NULL
 4   );
 5
 6   CREATE OR REPLACE FUNCTION fib_cached(
 7       fib_for integer
 8   ) RETURNS integer AS $$
 9   DECLARE
10      ret integer;
11  BEGIN
12      if fib_for < 2 THEN
13          RETURN fib_for;
14      END IF;
15
16      SELECT INTO ret fib
17      FROM   fib_cache
18      WHERE  num = fib_for;
19
20      IF ret IS NULL THEN
21          ret := fib_cached(fib_for - 2) + fib_cached(fib_for - 1);
22          INSERT INTO fib_cache (num, fib)
23          VALUES (fib_for, ret);
24      END IF;
25      RETURN ret;
26
27  END;
28  $$ LANGUAGE plpgsql;

Lines 1-4 create the table for caching the Fibonacci sequence. The num column represents the sequence number for which the corresponding Fibonacci number is stored in the fib column. The num column is a primary key because it should be unique.

The fib_cached() function defined from lines 6-28 introduces quite a bit more syntax. The first line with something new is line five's DECLARE statement. As you may have ascertained by the previous discussion of argument aliases, this statement introduces a block for declaring variables for use in the function body. All variables used in the function but not declared in the argument signature must be declared here. You can give them a preliminary assignment using the PL/pgSQL assignment operator, := (so named to avoid conflicts with the SQL = comparison operator). You can use any PostgreSQL data type for your variables, but this example again keeps things quite simple. There is a single integer variable, ret, which keeps track of a value for the function to return.

The BEGIN statement on line 11 ends the variable declaration block and starts the function body. Line 12 contains the familiar IF-THEN block that once again short-circuits the function if the argument to the function (stored in fib_for) is less than two. Then things get more interesting.

As shown in the DECLARE block, you can assign a value to a PL/pgSQL variable using :=, but what if you want to assign a value from a SELECT statement to a variable? Lines 16-18 demonstrate the approach. A variation of the standard SELECT INTO statement allows you to select values into one or more PL/pgSQL variables rather than into a table. A comma-delimited list of variables follows the INTO expression, while the rest of the statement remains a normal SELECT statement. There are a couple of caveats to SELECT INTO assignment, however: the SELECT statement must return no more than one row, and the selected columns must match the number and types of the variables.

Here it's relatively straightforward. The code looks in its cache (the fib_cache table) to see if it has already calculated and cached the Fibonacci number for the sequence number fib_for. The SELECT statement selects the fib column from fib_cached where the number is fib_for and stores the result in the ret variable.

Now, I mentioned that the SELECT INTO statement can return no more than one row, which also means that it can return zero rows. If this is the case, then the value of ret will be NULL in this function. Accordingly, line 20 begins an IF-THEN block by checking to see if the value of ret is indeed NULL. If it is, the function needs to calculate that Fibonacci number. Line 21 thus recursively calls fib_cached() just as fib() used recursion. Instead of returning the value, the code uses the := assignment operator to store the value in ret.

With the new value calculated, the code needs to insert it into the cache so that it never has to be calculated again. Lines 22-23 do just that with a standard SQL INSERT statement. The variables fib_for and ret can be used right in the INSERT statement, just as fib_for was used in the SELECT statement at lines 16-18. One of the great features of PL/pgSQL is that it precompiles SQL statements embedded in it, using variables as if they were passed to a prepared SQL statement as arguments for placeholders. In other words, the INSERT statement magically becomes akin to:

PREPARE some_insert(integer, integer) AS
INSERT  INTO fib_cache (num, fib)
VALUES  ($1, $2);

EXECUTE some_insert(fib_for, ret);

The great thing about this feature of PL/pgSQL is that it makes embedded SQL statements extremely fast. The database can reuse the precompiled query plan for each call to the function (on a per-connection basis) without recompiling and planning.

At any rate, line 25 returns the value of ret, regardless of where it came from. Adding the caching support makes the function far faster:

try=% explain analyze select fib_cached(26);
                                      QUERY PLAN                                      
--------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=50.837..50.838 rows=1 loops=1)
 Total runtime: 50.889 ms
(2 rows)

try=% explain analyze select fib_cached(26);
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Result  (cost=0.00..0.01 rows=1 width=0) (actual time=2.197..2.198 rows=1 loops=1)
 Total runtime: 2.249 ms
(2 rows)

The first call to fib_cached() finds no cached values, and so it must create them all as it goes along. This simply means that it needs to calculate the values for each number up to 26 once, practically eliminating that recursion (indeed, fib_cached() has only 24 recursive calls, once each for the numbers 2 through 26). The result is a much faster query: only .05 seconds, as opposed to the nearly 3.8 seconds for fib(). Of course the second call to fib_cached() needs only to look up and return the 26th Fibonacci number, because the first call has already cached it. That cuts the time down to 0.0025 seconds. Not bad, eh?

Pages: 1, 2, 3

Next Pagearrow





Sponsored by: