O'Reilly Databases

oreilly.comSafari Books Online.Conferences.

We've expanded our coverage and improved our search! Search for all things Database across O'Reilly!

Search Search Tips

advertisement
AddThis Social Bookmark Button

Print Subscribe to Databases Subscribe to Newsletters

MySQL Federated Tables: The Missing Manual
Pages: 1, 2, 3, 4, 5

Usage and Performance

Now that you know something more about the table connections, you can consider the interesting things, such as how the system performs when you actually use the data, which is the purpose of the whole exercise.



Honoring the WHERE Clause

Start by issuing a simple query, searching for a word that you may reasonably expect to find in the table.

server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.08 sec)

The target server will dutifully execute exactly what you asked, as the general log shows.

$ tail ~/server1/data/general.log
060612 13:37:18  4 Query  SHOW TABLE STATUS LIKE 'tkey'
                 4 Query  SELECT `id`, `word` FROM `tkey` WHERE  (`word` = 'condor')

Now try something a tad more ambitious:

server2 (test) > select * from tkey_fed where word like 'condor%';
+-------+-----------+
| id    | word      |
+-------+-----------+
| 91569 | condor    |
| 91570 | condorcet |
| 91571 | condores  |
| 91572 | condors   |
+-------+-----------+
4 rows in set (0.09 sec)

...and the general log shows:

060612 13:41:47       4 Query       SHOW TABLE STATUS LIKE 'tkey'
                      4 Query       SELECT `id`, `word` FROM `tkey` WHERE  
  ( (`word` >= 'condor\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') ) 
  AND ( (`word` <= 'condor') )

Surprise, surprise! What's this? This query looks wrong, even though the result is as you expected. There must be something more than that. In fact, if you capture the previous query through a packet sniffer, you will see the truth:

0000  a6 00 00 00 03 53 45 4c 45 43 54 20 60 69 64 60   .....SELECT `id`
0010  2c 20 60 77 6f 72 64 60 20 46 52 4f 4d 20 60 74   , `word` FROM `t
0020  6b 65 79 60 20 57 48 45 52 45 20 20 28 20 28 60   key` WHERE  ( (`
0030  77 6f 72 64 60 20 3e 3d 20 27 63 6f 6e 64 6f 72   word` >= 'condor
0040  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0050  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0060  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0070  27 29 20 29 20 41 4e 44 20 28 20 28 60 77 6f 72   ') ) AND ( (`wor
0080  64 60 20 3c 3d 20 27 63 6f 6e 64 6f 72 ff ff ff   d` <= 'condor...
0090  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff   ................
00a0  ff ff ff ff ff 27 29 20 29 20                     .....') )

That's better, as odd as it may look. The server will find the string condor, followed by any number of characters between 0x00 and 0xff (decimal 255). There might be some issue for Unicode support, but leaving that aside for now, it's a reasonably efficient query.

It's time to examine the ugly part. What happens when you issue the same query against the table without an index?

server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (2.07 sec)

You likely expected the query to be a little slower, but surely not that slow! Inspecting the logs reveals more:

$ tail ~/server1/data/general.log
060612 13:57:33  5 Connect  server2usr@localhost on test
                 5 Query    SET NAMES latin1
                 5 Query    SHOW TABLE STATUS LIKE 'tnokey'
                 5 Query    SELECT `id`, `word` FROM `tnokey`

$ tail ~/server1/data/slow_queries.log
# Time: 060612 13:57:35
# User@Host: server2usr[server2usr] @ localhost [127.0.0.1]
# Query_time: 2  Lock_time: 0  Rows_sent: 518584  Rows_examined: 518584
SELECT `id`, `word` FROM `tnokey`;

That's bad! The WHERE clause was irrelevant! As the slow queries log shows, the federated engine sent half a million rows across the network, leaving the burden of filtering it to the client. It was obvious that this query would trigger a full table scan (because there is no index on that column), and although you have to live with that, sending all the records across the network is a burden that's best avoided. What can you do?

Cheating With Bogus Indexes

The truth is, for reasons I can't figure out, federated tables honor the WHERE clause only if the linking table has an index. This looks like a bug to me, but there is a workaround: rebuild the federated table with a phantom key.

drop table if exists tnokey_fed; # remember: no ALTER TABLE support for federated tables
create table tnokey_fed ( 
    id int not null primary key, 
    word varchar(30) not null,
    KEY (word)
)
engine=federated 
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';

Now test that assumption:

server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.64 sec)

This is much better. A table scan is slower than a direct index lookup (there are exceptions, but in this case the rule stays unchallenged). A look at the general log confirms this approach.

060612 14:12:10       7 Connect     server2usr@localhost on test
                      7 Query       SHOW TABLE STATUS LIKE 'tnokey'
                      7 Query       SELECT `id`, `word` FROM `tnokey` WHERE  (`word` = 'condor')

Thus, I can confidently state....

Rule number 5: The WHERE clause is honored only when using an index defined on the local server.

Do you want more evidence? Just rebuild tkey_fed without a key for the word column, and the previous query will be as slow as the initial one from tnokey_fed.

Developer's angle: The fix is to push down conditions for non-indexed columns, resulting in fewer table scans.

drop table if exists tkey_fed;
create table tkey_fed (
    id int not null auto_increment primary key,
    word varchar(30) not null
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';

server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (1.57 sec)
$ tail ~/server1/data/general.log
060612 14:19:07  9 Connect  server2usr@localhost on test
                 9 Query    SHOW TABLE STATUS LIKE 'tkey'
                 9 Query    SELECT `id`, `word` FROM `tkey`

Summing up on performance, the situation looks bad, but it's livable. Cope. Adding an index to a normal table is an expensive operation, but adding an index to a federated table is quite cheap, as stated in rule number two. Therefore, whenever you face a performance problem, you can cheat a bit.

Be aware that the index rule is quite crude. The federated engine will only use (according to my tests) one column in the WHERE clause, even if you create a compound index.

JOINs

The notes so far apply also to joins. The federated engine will grab all the data it needs to perform a join, and this can easily lead to a disaster if you don't plan your queries very carefully.

This next test uses the MySQL world sample database on server1 and three federated tables on server2.

server2 (world) > select City.Name 
    -> from City INNER JOIN Country ON Country.Code=City.CountryCode 
    -> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name           |
+----------------+
| Kabul          |
| Qandahar       |
| Herat          |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.19 sec)

Remember that these are relatively small tables, with just a couple of hundred records in Country and a few thousand in City. If you have seen the world database before, you know that its tables have just a primary key and no secondary indexes. The general log will show some horrors behind the scenes.

$ tail ~/server1/data/general.log
20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VUT')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VAT')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VEN')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'RUS')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VNM')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'EST')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'USA')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VIR')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'ZWE')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'PSE')

Here tail will not be much help, besides telling you that the federated engine issued a bunch of queries from Country. You have to dig deeper to find out how many.

$ grep 'FROM `Country`'  ~/server1/data/general.log  | wc -l
232

$ tail -n 236 ~/server1/data/general.log | head -n 6
Time                 Id Command    Argument
     19 Query  SELECT `ID`, `Name`, `CountryCode`, `District`, `Population` FROM `City`
     20 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Code` = 'AFG')
     20 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Code` = 'NLD')

First it asked for all records from City. That's understandable because there is no key. Then, it created a query for each CountryCode found in City: 232 queries. For a simple join, it's quite expensive.

Once more, you can fix the problem by adding appropriate keys to both tables. An index on City.CountryCode and one on Country.Name will solve the problem.

server2 (world) > SELECT City.Name 
    -> FROM City INNER JOIN Country ON Country.Code=City.CountryCode 
    -> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name           |
+----------------+
| Kabul          |
| Qandahar       |
| Herat          |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.04 sec)
$ tail -n 3 ~/server1/data/general.log
25 Query  SHOW TABLE STATUS LIKE 'Country'
     25 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Name` = 'Afghanistan')
     24 Query  SELECT `ID`, `Name`, ... FROM `City` WHERE  (`CountryCode` = 'AFG')

The outcome was just two queries, both using a WHERE clause. Still, the engine is not smart enough to make one query only, but that's better than nothing. Another improvement is to add a real index on server1, although that is beside the point; the main problem in this case is not the speed of record retrieval, but the amount of data passed across the network.

Developer's angle: A bug fix is on its way for join performance.

Using Aggregate Functions

Here comes the really bad news. No matter how much cheating, when using COUNT, SUM, MAX, and all other aggregate functions, the federated engine will read all of the records.

Consider again the first example, with half a million words. Try a seemingly harmless query:

server2 (test) > select count(*) from tkey_fed;
+----------+
| count(*) |
+----------+
|   518585 |
+----------+
1 row in set (2.37 sec)

If you think it's always a good idea to do a SELECT COUNT(*) before attempting a potentially demanding query, think again.

This query took much longer than expected, and as always, the log inspection shows why:

$ tail ~/server1/data/general.log
060612 20:50:24  2 Query  SHOW TABLE STATUS LIKE 'tkey'
                 2 Query  SELECT `id`, `word` FROM `tkey`

Disaster! Five hundred thousand records went across the network just to get a count? There must be good reasons for such behavior, but from the user's point of view, it's really unacceptable. There is a workaround coming, but for now, it's enough to note....

Rule number 6: Aggregate functions with federated tables are extremely expensive.

However, don't lose heart just yet. Some relief is coming your way shortly.

Pages: 1, 2, 3, 4, 5

Next Pagearrow




Tagged Articles

Be the first to post this article to del.icio.us

Related to this Article

MySQL Stored Procedure Programming MySQL Stored Procedure Programming
by Guy Harrison , Steven Feuerstein
March 2006
$44.99 USD

Data Jujitsu: The Art of Turning Data into Product Data Jujitsu: The Art of Turning Data into Product
November 2012
$0.00 USD

Sponsored Resources

  • Inside Lightroom
Advertisement
O'reilly

© 2013, O’Reilly Media, Inc.

(707) 827-7019 (800) 889-8969

All trademarks and registered trademarks appearing on oreilly.com are the property of their respective owners.

About O'Reilly

  • Academic Solutions
  • Jobs
  • Contacts
  • Corporate Information
  • Press Room
  • Privacy Policy
  • Terms of Service
  • Writing for O'Reilly

Community

  • Authors
  • Community & Featured Users
  • Forums
  • Membership
  • Newsletters
  • O'Reilly Answers
  • RSS Feeds
  • User Groups

Partner Sites

  • makezine.com
  • makerfaire.com
  • craftzine.com
  • igniteshow.com
  • PayPal Developer Zone
  • O'Reilly Insights on Forbes.com

Shop O'Reilly

  • Customer Service
  • Contact Us
  • Shipping Information
  • Ordering & Payment
  • The O'Reilly Guarantee