ONJava.com -- The Independent Source for Enterprise Java
oreilly.comSafari Books Online.Conferences.

advertisement

AddThis Social Bookmark Button

Tuning Derby
Pages: 1, 2, 3

With these extra few lines, we have provided "paging" functionality. Although all result sets are being fetched from the database server, only the rows of interest are actually mapped to Java objects. Now we are safe from the OutOfMemoryError problem that we had before and can be sure that this code will actually work with large tables.



But still, with this solution the database will scan through the whole table and return all rows, and that is certainly a time consuming task. For my example database, this operation takes up to ten seconds to execute, which is certainly not acceptable behavior for the application.

So, we have to come up with a solution; we do not want to retrieve all database rows but only those of our current interest (or at least the minimal possible subset of all rows). The trick we'll use here is to explicitly tell the JDBC driver how many rows we need. We can do this by using the setMaxRows() method of the java.sql.Statement interface. Let's look at this example:

Class.forName("org.apache.derby.jdbc.ClientDriver").newInstance();
Connection connection = DriverManager.getConnection(
            "jdbc:derby://localhost:1527/testDb;");
Statement stmt = connection.createStatement();
stmt.setMaxRows(101);
ResultSet rs = stmt.executeQuery("SELECT * FROM tbl");
ArrayList allResults = new ArrayList();
int i = 0;
while (rs.next()) {
        if (i > 50 && i <= 100) {
                // O-R mapping code populate your row from result set
                DomainObject domainObject = populate(rs);
                allResults.add(modelObject);
        }
}
System.out.println("Results Size: " + allResults.size());

Notice that we have set the max rows value to the last row that we need (incremented by one). So, with this solution we didn't fetch only the 50 rows that we wanted, but first fetched a hundred rows and then filter to the 50 rows of interest. Unfortunately, there is no way tell the JDBC driver to start with a certain row, so we must specify the maximum row of the page that will be displayed. This means that performance will be good for early pages and drop in performance as the user browses results. The good news is that in most cases, the user will not go far, but will usually either find what he's looking for in the first few pages or refine the search query. In my environment, execution time dropped from 8 seconds to 0.8 seconds for the above example.

This was an easy example of how to browse the whole table. But when you add certain WHERE conditions and ordering instructions to your query, things can change dramatically. In the following section, I will explain why this happens and how we can ensure acceptable application behavior in those cases.

Make Sure Indexes Are Used (Avoid Table Scans)

Indexes are a very important concept in database design. Since the scope of this article is limited, I will not go into detail about indexing theory. Briefly though, indexes are special database structures that allow quick access to table rows. They are usually created in relation to one or more columns, and since they are much smaller then the whole table, their primary use is to enable quick searching of values in a column (or columns).

Derby automatically creates indexes for primary and foreign key columns and for columns that have unique constraints on them. For everything else, we must explicitly create indexes. In the following section, we'll go through a few examples and explain where and how indexes can be helpful.

But first, we have to make some preparations. Before we can start tuning performance, we need to be able to see what is going on in the database when our query is executing. For that purpose, Derby provides the derby.language.logQueryPlan parameter. When this parameter is set, Derby will log the query plan for all executed queries in the derby.log file (located in the derby.system.home folder). You can achieve this through the appropriate derby.properties file or by executing the following Java statement:

System.setProperty("derby.language.logQueryPlan", "true");

before you start the server.

By examining the query plan, we can see whether Derby uses indexing for some queries or performs a full table scan, which can be a time-consuming operation.

Now that we have our environment set, we can proceed with the example. Let's say that in our previously used tbl example table, we have an unindexed column called owner. Because the sorting of the result is the usual suspect for poor query performance, I will illustrate all performance-tuning examples on problems related to sorting. Now, if we wanted to modify the previous example to sort our results by the value of this column, we would change our query to something like this:

SELECT * FROM tbl ORDER BY owner

If we now run our example with this query instead of the original one, the execution time will be an order of magnitude higher then before. Despite the fact that we paginated the results and dealt carefully with the number of rows to be fetched, the total execution time will again be about 8 seconds.

If we look at the query execution plan in the derby.log file, we can easily spot the problem:

Table Scan ResultSet for TBL at read committed isolation
level using instantaneous share row locking chosen
by the optimizer

This means that Derby performed look-up throughout the entire table in order to sort the row set. Now, what can we do to improve this situation? The answer is simple: create an index on this column. We can do that by issuing the following SQL statement:

CREATE INDEX tbl_owner ON tbl(owner)

If we now repeat our previous example, we should get a result similar to the one we got without ordering (under one second in my case).

Also, if you look into derby.log now, you will see a line like this (instead of a line like the previous one):

Index Scan ResultSet for TBL using index TBL_OWNER
at read committed isolation level using share row locking
chosen by the optimizer

which means you can be sure that Derby used our newly created index to get the appropriate rows.

Use Appropriate Index Order

We have seen how indexes helped us improve performances of sorting data by a column value. But what would happen if we tried to reverse the order of sorting? For example, let's say that we want to sort our example data by owner column but in descending order. In that case, our original query would be something like this:

SELECT * FROM tbl ORDER BY owner DESC

Notice the added DESC keyword, which sorts our result set in descending order. If we run our example with this modified query, you'll notice that the execution time increases to the previous rate of 8 to 9 seconds. Also, in the logfile, you will notice that the full table scan was performed in this case.

The solution is to create a descending index for the column in question. For our owner column, we can do that with the following SQL statement:

CREATE INDEX tbl_owner_desc ON tbl(owner desc)

Now we have two indexes for this column (in both directions), so our query will be executed with acceptable performances this time. Notice the following line in the query log:

Index Scan ResultSet for TBL using index TBL_OWNER_DESC
at read committed isolation level using share row locking
chosen by the optimizer

which confirms that our newly created index was used. So, in case you often use queries that sort results in descending order, you may think of creating a suitable index to achieve better performances.

Recreate Indexes

Over time, index pages can fragment, which can cause serious performance degradation. For example, let's say we have an index, created some time ago, on the time_create column of our tbl table.

If we execute the query:

SELECT * FROM tbl ORDER BY time_create

we can get poor performance, much as if we didn't have an index at all. If we look into the query plan log, we can find the source of our problem. You will see that index scan has been used, but you can usually find a line similar to the following one in the log:

Number of pages visited=1210

This means the database performed a lot of IO operations during index search, which is the main bottleneck of this query execution.

The solution in this case is to recreate the index (i.e., drop and create it again). This will make the index defragmented again and save us from a lot of IO operations. We can do this by issuing the following SQL statements:

DROP INDEX tbl_time_create

CREATE INDEX tbl_time_create ON tbl(time_create)

You'll notice that execution time drops to an acceptable value (under one second). Also, in the log file you will now find the following line:

Number of pages visited=5

As you can see, the execution time dropped significantly because the database had to perform only a few IO operations.

So, the general rule of thumb is to make your application recreate indexes on a regular basis. It's best to schedule a background job in your application to do this every now and then.

Pages: 1, 2, 3

Next Pagearrow