O'Reilly    
 Published on O'Reilly (http://oreilly.com/)
 See this if you're having trouble printing code examples


MySQL Federated Tables: The Missing Manual

by Giuseppe Maxia
08/10/2006

One of the most exciting features introduced in MySQL 5 is the federated engine. The ability to access data from a remote server without the constraints of replication tickles every programmer's fancy.

Unfortunately, as of today, the documentation is not quite as detailed as I would like. I have the feeling that the federated engine has been somehow neglected. This article comes from my personal experience with this engine.

Color Codes

Basic Federated Usage

This information is also available in the MySQL manual, but for the sake of completeness, I want to say a few words on the basics.

Federated tables are tables with storage in a remote server. When defining a table with this engine, you refer to another table in a different server, using the same structure. The only thing that resides in your local server is the definition of the table, which must be identical to the remote one, except for the engine specification.

On the remote server, you have:

CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`)
) ENGINE=MyISAM;

To use that table in your local server, enter the description:

CREATE TABLE `City` (
  `ID` int(11) NOT NULL auto_increment,
  `Name` char(35) NOT NULL default '',
  `CountryCode` char(3) NOT NULL default '',
  `District` char(20) NOT NULL default '',
  `Population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`ID`)
) ENGINE = FEDERATED
connection='mysql://user:pass@remote.com:3306/world/City';

Here, user and pass are valid credentials to access the table City in the database world on server remote.com.

With that done, you can query your federated table as if it were in your local server. Issue a query and get a record set.

select * from City where ID = 1;

+----+-------+-------------+----------+------------+
| ID | Name  | CountryCode | District | Population |
+----+-------+-------------+----------+------------+
|  1 | Kabul | AFG         | Kabol    |    1780000 |
+----+-------+-------------+----------+------------+

There are a few limitations concerning federated engine usage, namely:

This is basically what the official docs say, no more and no less. Beyond that, it's all unexplored land.

Testing Federated Features

Because you won't have the reassuring backing from the manual in this matter, you should test these features for yourself. Because you need at least two instances of a MySQL server to use a federated table, here is a quick recipe to get the job done. Download the MySQL Sandbox and, from the command line, execute:

$ ./express_install.pl /opt/mysql/5.0.22/ -P 5001 -d server1 \
    -c log=general.log -c log-slow-queries=slow_queries.log \
    -c long_query_time=1 -c log-queries-not-using-indexes
$ ./express_install.pl /opt/mysql/5.0.22/ -P 5002 -d server2 \
    -c log=general.log -c log-slow-queries=slow_queries.log \
    -c long_query_time=1 -c log-queries-not-using-indexes

If you want to know more about the Sandbox, you can refer to the MySQL Sandbox tutorial. If you don't work in a Unix environment, you have to install it manually, following MySQL Windows server installation official guidelines.

The previous commands will create two directories under your $HOME, server1 and server2, each of which contains a data directory and a few bash scripts to start, stop, and use the instance. Both servers will have the general log and slow query log enabled, so you can examine what happens when the federated engine exchanges data between them.

Now start them with their appropriate commands:

$ ~/server1/start.sh
$ ~/server2/start.sh

To access the first server, use the command:

$ mysql -h 127.0.0.1 -u datacharmer -pdatacharmer -P 5001

Or just:

$ ~/server1/use.sh

For the second server, change the port to 5002:

$ mysql -h 127.0.0.1 -u datacharmer -pdatacharmer -P 5002

To make things easier to detect, create a user on server1, which server2 will use to access the federated tables.

$ echo \
    "grant select,execute on test.* to server2usr identified by 'datacharmer'" \
    | ~/server1/use.sh -u root

Now you're ready to get your hands dirty.

Creating Federated Tables

Your first task is to create two simple tables on server1, one with a key and one without. The reason for these two tables will become apparent later.

create database if not exists test;
use test;

drop table if exists tkey;
create table tkey ( 
    id int not null auto_increment primary key, 
    word varchar(30) not null, 
    key (word)
    );

drop table if exists tnokey;
create table tnokey ( 
    id int not null primary key, 
    word varchar(30) not null
    );

load data local infile '/usr/share/dict/words'
into table tkey (word);

insert into tnokey select * from tkey;

select count(*) from tkey;
select count(*) from tnokey;

Save this script to fed1.mysql and feed it to server1:

$ ~/server1/use.sh -t < fed1.mysql
+----------+
| count(*) |
+----------+
|   518584 |
+----------+
+----------+
| count(*) |
+----------+
|   518584 |
+----------+

(These numbers may be different in your system, depending on the size of your dictionary.)

On server2, create two federated tables that point to the ones just built on server1.

drop database if exists test;
create database if not exists test;
use test;

drop table if exists tkey_fed;
create table tkey_fed ( 
    id int not null auto_increment primary key, 
    word varchar(30) not null, 
    key (word)
)
engine=federated 
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';

drop table if exists tnokey_fed;
create table tnokey_fed ( 
    id int not null primary key, 
    word varchar(30) not null
)
engine=federated 
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';

Save this to fed2.mysql and load it:

$ ~/server2/use.sh -t < fed2.mysql

If you don't get any output, your federated tables were created without problems. If you get any errors, go back to the previous section and make sure that you have done everything, including creating the appropriate user on server1.

Federated Connections

Now it's time to examine the first reaction from server1 by perusing the general log:

$ tail ~/server1/data/general.log
...
060604 18:22:14  7 Connect  server2usr@localhost on test
                 7 Query    SELECT *  FROM `tkey` WHERE  1=0
                 7 Quit
                 8 Connect  server2usr@localhost on test
                 8 Query    SELECT *  FROM `tnokey` WHERE  1=0
                 8 Quit

Rule number 1 of the Federated Missing Manual: The receiving server does not know that it's being federated.

I will show you more evidence for this rule later, but for now it's enough to note that, when creating a table, the second server issues a query to the first server with a WHERE clause that will return an empty set. That's because it needs to be sure that the target table exists.

Notice also that the connection closes immediately. Now proceed with...

Rule number 2: The creation of a federated table is an inexpensive operation.

It's one query on the linking server, plus one quick query on the linked server. That's enough evidence to state that a federated table does not require lots of resources, at least during its creation.

Your first operation on server2 will just check that the table exists.

server2 (test) > show tables;
+----------------+
| Tables_in_test |
+----------------+
| tkey_fed       |
| tnokey_fed     |
+----------------+

This simple operation will be visible on the linked server.

$ tail ~/server1/data/general.log
...
060610 23:07:02  6 Connect  server2usr@localhost on test
                 7 Connect  server2usr@localhost on test

There are two connections, one for each federated table. Is it possible to confirm this by other means?

server1 (test) > show processlist;
+----+-------------+-----------------+------+---------+------+-------+------------------+
| Id | User        | Host            | db   | Command | Time | State | Info             |
+----+-------------+-----------------+------+---------+------+-------+------------------+
|  6 | server2usr  | localhost:46443 | test | Sleep   |  298 | NULL  | NULL             |
|  7 | server2usr  | localhost:46444 | test | Sleep   |  298 | NULL  | NULL             |
|  8 | datacharmer | localhost       | test | Query   |    0 | NULL  | show processlist |
+----+-------------+-----------------+------+---------+------+-------+------------------+

There are indeed two connections from server2usr, which is the user that appears in the federated table connection string. Now, go one step further by adding another federated table on server2.

server2 (test) > create table test1 like tkey_fed;
Query OK, 0 rows affected (0.00 sec)

server2 (test) > show create table test1\G
*************************** 1. row ***************************
       Table: test1
Create Table: CREATE TABLE `test1` (
  `id` int(11) NOT NULL auto_increment,
  `word` varchar(30) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `word` (`word`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1 
    CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey'
1 row in set (0.01 sec)

server2 (test) > show create table tkey_fed\G
*************************** 1. row ***************************
       Table: tkey_fed
Create Table: CREATE TABLE `tkey_fed` (
  `id` int(11) NOT NULL auto_increment,
  `word` varchar(30) NOT NULL,
  PRIMARY KEY  (`id`),
  KEY `word` (`word`)
) ENGINE=FEDERATED DEFAULT CHARSET=latin1 
    CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey'
1 row in set (0.00 sec)

Now there are three federated tables on server2: two pointing to tkey and one pointing to tnokey. The general log will show the usual inexpensive query, but the surprise comes when checking the process list again:

server1 (test) > show processlist;
+----+-------------+-----------------+------+---------+------+-------+------------------+
| Id | User        | Host            | db   | Command | Time | State | Info             |
+----+-------------+-----------------+------+---------+------+-------+------------------+
|  7 | server2usr  | localhost:46444 | test | Sleep   |  869 | NULL  | NULL             |
|  8 | datacharmer | localhost       | test | Query   |    0 | NULL  | show processlist |
| 10 | server2usr  | localhost:56715 | test | Sleep   |  272 | NULL  | NULL             |
| 11 | server2usr  | localhost:60575 | test | Sleep   |  265 | NULL  | NULL             |
+----+-------------+-----------------+------+---------+------+-------+------------------+

There are three connections, for just two tables that are federated by another server, which leads to...

Rule number 3: Each federated table on a remote server will consume one connection, even if there are several remote tables pointing to the same local table.

While the creation is inexpensive, the actual usage starts to look a bit more onerous. If you plan to federate many tables from several servers, prepare to increase your server's connections.

Developer's angle: There will be some improvements. MySQL is already planning to reduce the number of open connections.

Speaking of connections, here's a good chance to try yet another experiment. What happens to server2's tables if you stop server1?

$ ~/server1/stop.sh
server2 (test) > SELECT table_name, engine 
    -> FROM information_schema.tables 
    -> WHERE table_schema=schema();
ERROR 1430 (HY000): There was a problem processing the query on the 
    foreign data source. Data source error: 
    %The foreign data source you are trying to reference does not exist. 
    Data source error:  : 2003 : Can't connect to MySQL server on '127.0.0.1' (111)

server2 (test) > SELECT count(*) 
    -> FROM information_schema.tables;
ERROR 1430 (HY000): There was a problem processing the query on the 
    foreign data source. Data source error: 
    %The foreign data source you are trying to reference does not exist. 
    Data source error:  : 2003 : Can't connect to MySQL server on '127.0.0.1' (111)

What happened? These queries don't use the federated tables (at least, that's what you might have thought while asking this simple question about table metadata). However, something is complaining about not being able to connect.

Restart server1 and repeat the operation.

$ ~/server1/start.sh
server2 (test) > SELECT table_name, engine 
    -> FROM information_schema.tables 
    -> WHERE table_schema=schema();
+------------+-----------+
| table_name | engine    |
+------------+-----------+
| test1      | FEDERATED |
| tkey_fed   | FEDERATED |
| tnokey_fed | FEDERATED |
+------------+-----------+

The general log reveals that there was a request sent to the linked server. Actually, there is one request for each table:

$ tail ~/server1/data/general.log
Time            Id Command  Argument
060610 23:43:33  1 Connect   server2usr@localhost on test
                 1 Query     SET NAMES latin1
                 1 Query     SHOW TABLE STATUS LIKE 'tkey'
                 2 Connect   server2usr@localhost on test
                 2 Query     SET NAMES latin1
                 2 Query     SHOW TABLE STATUS LIKE 'tkey'
                 3 Connect   server2usr@localhost on test
                 3 Query     SET NAMES latin1
                 3 Query     SHOW TABLE STATUS LIKE 'tnokey'

Rule number 4: Each INFORMATION_SCHEMA request related to a federated table involves a connection.

This rule looks like a bug, and perhaps it is. In the meantime, keep it in mind. If you are working with federated tables, be careful with your INFORMATION_SCHEMA requests.

Developer's angle: Unfortunately, it's a feature. It's the only way to get information about the table. For now, there's no way to avoid this problem.

Checking Whether A Federated Table Is Connected

Because federated tables strictly depend on active connections, it's a good idea to check whether the data is reachable before using a table. Here's a function that does just that, by attempting to query the information schema about a federated table and immediately checking whether it caught SQLSTATE 'HY000'.

CREATE FUNCTION fed_table_available( 
    p_table_schema varchar(50), 
    p_table_name varchar(50) )
RETURNS BOOLEAN
BEGIN
    DECLARE connection_dead BOOLEAN DEFAULT FALSE;
    DECLARE how_many int;
    DECLARE curx CURSOR FOR
        SELECT COUNT(*)
        FROM information_schema.TABLES
        WHERE
            TABLE_SCHEMA = p_table_schema
            AND TABLE_NAME = p_table_name;
    DECLARE CONTINUE HANDLER FOR SQLSTATE 'HY000'
        SET connection_dead = TRUE;
    OPEN curx;
    IF connection_dead THEN
        RETURN FALSE;
    ELSE
        FETCH curx into how_many;
        CLOSE curx;
        RETURN (how_many > 0);
    END IF;
END 

Using this method, your application can exit gracefully if a federated table is not available.

$ ~/server1/stop.sh
server2 (test) > select fed_table_available(schema(), 'tkey_fed');
+--------------------------------------------+
| fed_table_available( schema(), 'tkey_fed') |
+--------------------------------------------+
|                                          0 |
+--------------------------------------------+
1 row in set (0.05 sec)
$ ~/server1/start.sh
server2 (test) > select fed_table_available(schema(), 'tkey_fed');
+--------------------------------------------+
| fed_table_available( schema(), 'tkey_fed') |
+--------------------------------------------+
|                                          1 |
+--------------------------------------------+
1 row in set (0.05 sec)

Usage and Performance

Now that you know something more about the table connections, you can consider the interesting things, such as how the system performs when you actually use the data, which is the purpose of the whole exercise.

Honoring the WHERE Clause

Start by issuing a simple query, searching for a word that you may reasonably expect to find in the table.

server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.08 sec)

The target server will dutifully execute exactly what you asked, as the general log shows.

$ tail ~/server1/data/general.log
060612 13:37:18  4 Query  SHOW TABLE STATUS LIKE 'tkey'
                 4 Query  SELECT `id`, `word` FROM `tkey` WHERE  (`word` = 'condor')

Now try something a tad more ambitious:

server2 (test) > select * from tkey_fed where word like 'condor%';
+-------+-----------+
| id    | word      |
+-------+-----------+
| 91569 | condor    |
| 91570 | condorcet |
| 91571 | condores  |
| 91572 | condors   |
+-------+-----------+
4 rows in set (0.09 sec)

...and the general log shows:

060612 13:41:47       4 Query       SHOW TABLE STATUS LIKE 'tkey'
                      4 Query       SELECT `id`, `word` FROM `tkey` WHERE  
  ( (`word` >= 'condor\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0') ) 
  AND ( (`word` <= 'condor') )

Surprise, surprise! What's this? This query looks wrong, even though the result is as you expected. There must be something more than that. In fact, if you capture the previous query through a packet sniffer, you will see the truth:

0000  a6 00 00 00 03 53 45 4c 45 43 54 20 60 69 64 60   .....SELECT `id`
0010  2c 20 60 77 6f 72 64 60 20 46 52 4f 4d 20 60 74   , `word` FROM `t
0020  6b 65 79 60 20 57 48 45 52 45 20 20 28 20 28 60   key` WHERE  ( (`
0030  77 6f 72 64 60 20 3e 3d 20 27 63 6f 6e 64 6f 72   word` >= 'condor
0040  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0050  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0060  5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30 5c 30   \0\0\0\0\0\0\0\0
0070  27 29 20 29 20 41 4e 44 20 28 20 28 60 77 6f 72   ') ) AND ( (`wor
0080  64 60 20 3c 3d 20 27 63 6f 6e 64 6f 72 ff ff ff   d` <= 'condor...
0090  ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff   ................
00a0  ff ff ff ff ff 27 29 20 29 20                     .....') )

That's better, as odd as it may look. The server will find the string condor, followed by any number of characters between 0x00 and 0xff (decimal 255). There might be some issue for Unicode support, but leaving that aside for now, it's a reasonably efficient query.

It's time to examine the ugly part. What happens when you issue the same query against the table without an index?

server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (2.07 sec)

You likely expected the query to be a little slower, but surely not that slow! Inspecting the logs reveals more:

$ tail ~/server1/data/general.log
060612 13:57:33  5 Connect  server2usr@localhost on test
                 5 Query    SET NAMES latin1
                 5 Query    SHOW TABLE STATUS LIKE 'tnokey'
                 5 Query    SELECT `id`, `word` FROM `tnokey`

$ tail ~/server1/data/slow_queries.log
# Time: 060612 13:57:35
# User@Host: server2usr[server2usr] @ localhost [127.0.0.1]
# Query_time: 2  Lock_time: 0  Rows_sent: 518584  Rows_examined: 518584
SELECT `id`, `word` FROM `tnokey`;

That's bad! The WHERE clause was irrelevant! As the slow queries log shows, the federated engine sent half a million rows across the network, leaving the burden of filtering it to the client. It was obvious that this query would trigger a full table scan (because there is no index on that column), and although you have to live with that, sending all the records across the network is a burden that's best avoided. What can you do?

Cheating With Bogus Indexes

The truth is, for reasons I can't figure out, federated tables honor the WHERE clause only if the linking table has an index. This looks like a bug to me, but there is a workaround: rebuild the federated table with a phantom key.

drop table if exists tnokey_fed; # remember: no ALTER TABLE support for federated tables
create table tnokey_fed ( 
    id int not null primary key, 
    word varchar(30) not null,
    KEY (word)
)
engine=federated 
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tnokey';

Now test that assumption:

server2 (test) > select * from tnokey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (0.64 sec)

This is much better. A table scan is slower than a direct index lookup (there are exceptions, but in this case the rule stays unchallenged). A look at the general log confirms this approach.

060612 14:12:10       7 Connect     server2usr@localhost on test
                      7 Query       SHOW TABLE STATUS LIKE 'tnokey'
                      7 Query       SELECT `id`, `word` FROM `tnokey` WHERE  (`word` = 'condor')

Thus, I can confidently state....

Rule number 5: The WHERE clause is honored only when using an index defined on the local server.

Do you want more evidence? Just rebuild tkey_fed without a key for the word column, and the previous query will be as slow as the initial one from tnokey_fed.

Developer's angle: The fix is to push down conditions for non-indexed columns, resulting in fewer table scans.

drop table if exists tkey_fed;
create table tkey_fed (
    id int not null auto_increment primary key,
    word varchar(30) not null
)
engine=federated
connection='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey';

server2 (test) > select * from tkey_fed where word = 'condor';
+-------+--------+
| id    | word   |
+-------+--------+
| 91569 | condor |
+-------+--------+
1 row in set (1.57 sec)
$ tail ~/server1/data/general.log
060612 14:19:07  9 Connect  server2usr@localhost on test
                 9 Query    SHOW TABLE STATUS LIKE 'tkey'
                 9 Query    SELECT `id`, `word` FROM `tkey`

Summing up on performance, the situation looks bad, but it's livable. Cope. Adding an index to a normal table is an expensive operation, but adding an index to a federated table is quite cheap, as stated in rule number two. Therefore, whenever you face a performance problem, you can cheat a bit.

Be aware that the index rule is quite crude. The federated engine will only use (according to my tests) one column in the WHERE clause, even if you create a compound index.

JOINs

The notes so far apply also to joins. The federated engine will grab all the data it needs to perform a join, and this can easily lead to a disaster if you don't plan your queries very carefully.

This next test uses the MySQL world sample database on server1 and three federated tables on server2.

server2 (world) > select City.Name 
    -> from City INNER JOIN Country ON Country.Code=City.CountryCode 
    -> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name           |
+----------------+
| Kabul          |
| Qandahar       |
| Herat          |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.19 sec)

Remember that these are relatively small tables, with just a couple of hundred records in Country and a few thousand in City. If you have seen the world database before, you know that its tables have just a primary key and no secondary indexes. The general log will show some horrors behind the scenes.

$ tail ~/server1/data/general.log
20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VUT')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VAT')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VEN')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'RUS')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VNM')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'EST')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'USA')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'VIR')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'ZWE')
     20 Query  SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Code` = 'PSE')

Here tail will not be much help, besides telling you that the federated engine issued a bunch of queries from Country. You have to dig deeper to find out how many.

$ grep 'FROM `Country`'  ~/server1/data/general.log  | wc -l
232

$ tail -n 236 ~/server1/data/general.log | head -n 6
Time                 Id Command    Argument
     19 Query  SELECT `ID`, `Name`, `CountryCode`, `District`, `Population` FROM `City`
     20 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Code` = 'AFG')
     20 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Code` = 'NLD')

First it asked for all records from City. That's understandable because there is no key. Then, it created a query for each CountryCode found in City: 232 queries. For a simple join, it's quite expensive.

Once more, you can fix the problem by adding appropriate keys to both tables. An index on City.CountryCode and one on Country.Name will solve the problem.

server2 (world) > SELECT City.Name 
    -> FROM City INNER JOIN Country ON Country.Code=City.CountryCode 
    -> WHERE Country.Name = 'Afghanistan';
+----------------+
| Name           |
+----------------+
| Kabul          |
| Qandahar       |
| Herat          |
| Mazar-e-Sharif |
+----------------+
4 rows in set (0.04 sec)
$ tail -n 3 ~/server1/data/general.log
25 Query  SHOW TABLE STATUS LIKE 'Country'
     25 Query  SELECT `Code`, `Name`, ... FROM `Country` WHERE  (`Name` = 'Afghanistan')
     24 Query  SELECT `ID`, `Name`, ... FROM `City` WHERE  (`CountryCode` = 'AFG')

The outcome was just two queries, both using a WHERE clause. Still, the engine is not smart enough to make one query only, but that's better than nothing. Another improvement is to add a real index on server1, although that is beside the point; the main problem in this case is not the speed of record retrieval, but the amount of data passed across the network.

Developer's angle: A bug fix is on its way for join performance.

Using Aggregate Functions

Here comes the really bad news. No matter how much cheating, when using COUNT, SUM, MAX, and all other aggregate functions, the federated engine will read all of the records.

Consider again the first example, with half a million words. Try a seemingly harmless query:

server2 (test) > select count(*) from tkey_fed;
+----------+
| count(*) |
+----------+
|   518585 |
+----------+
1 row in set (2.37 sec)

If you think it's always a good idea to do a SELECT COUNT(*) before attempting a potentially demanding query, think again.

This query took much longer than expected, and as always, the log inspection shows why:

$ tail ~/server1/data/general.log
060612 20:50:24  2 Query  SHOW TABLE STATUS LIKE 'tkey'
                 2 Query  SELECT `id`, `word` FROM `tkey`

Disaster! Five hundred thousand records went across the network just to get a count? There must be good reasons for such behavior, but from the user's point of view, it's really unacceptable. There is a workaround coming, but for now, it's enough to note....

Rule number 6: Aggregate functions with federated tables are extremely expensive.

However, don't lose heart just yet. Some relief is coming your way shortly.

Federating Different Things

So far, I have shown federated tables on a peer-to-peer basis; that is, as a link between two tables in two different MySQL 5.0 servers. That may be the most frequent case, but it does not have to be. There are other objects that you can federate, provided that they look like a MySQL 5.0 table.

Federating Tables On Older MySQL Servers

One thing that looks like a MySQL 5.0 table (in the federated engine's limited view) is a MySQL 4.x or 3.23 table. Because the federated table course of action is currently quite crude, any table on the other side would do, provided that it belongs to a server that can answer to the federated engine basic requests.

If you don't have a 3.23 server hanging around, install one through the MySQL Sandbox:

$ ./express_install /opt/mysql/3.23.58 \
    -d server3 -P 3230 -c log=general.log \
    -c log-slow-queries -c log-long-format

Then, after loading the MySQL world sample database on server3, make three federated tables from server2, as you did for server1. (Don't forget to add indexes for City.CountryCode and Country.Name.)

Now you can enjoy the unique pleasure of issuing a nested query on a couple of MySQL 3.23 tables and getting a meaningful result:

server2 (world) > SELECT * 
    -> FROM City 
    -> WHERE CountryCode = (
    -> SELECT Code FROM Country WHERE Name = 'Afghanistan');
+----+----------------+-------------+----------+------------+
| ID | Name           | CountryCode | District | Population |
+----+----------------+-------------+----------+------------+
|  1 | Kabul          | AFG         | Kabol    |    1780000 |
|  2 | Qandahar       | AFG         | Qandahar |     237500 |
|  3 | Herat          | AFG         | Herat    |     186800 |
|  4 | Mazar-e-Sharif | AFG         | Balkh    |     127800 |
+----+----------------+-------------+----------+------------+
4 rows in set (0.01 sec)

The federated engine did you a favor, turning your nested query into something that the old engine understands:

$ tail ~/server3/data/general.log
 1 Connect  server2usr@localhost on world
 1 Query     SHOW TABLE STATUS LIKE 'City'
 2 Connect  server2usr@localhost on world
 2 Query     SHOW TABLE STATUS LIKE 'Country'
 2 Query    SELECT `Code`, `Name`, ...  FROM `Country` WHERE  (`Name` = 'Afghanistan')
 1 Query    SELECT `ID`, `Name`, ... FROM `City` WHERE  (`CountryCode` = 'AFG')

Rule number 7: You can federate any MySQL table, regardless of the server version.

Think about the possibilities. You can federate any existing MySQL table and use this link to suit several purposes. For example, you can consolidate data from several sources into a data warehouse, or you can just check from a centralized point whether all of your servers are working properly.

Federating Views

Another thing that looks like a table, from the particular angle of the federated engine, is a view. You can cheat the engine into linking to a view instead of a table, which opens a whole new world of opportunities.

Rule number 8: You can federate a view instead of a table.

For instance, you can mitigate the terrible performance of aggregating functions by using views. First, create a view that simply counts tkey records:

server1 (test) > CREATE VIEW tkey_count AS SELECT COUNT(*) AS how_many FROM tkey;
Query OK, 0 rows affected (0.00 sec)

Then, on the second server, federate that view as if it were a table:

create table tkey_count_fed 
(how_many int) 
ENGINE=FEDERATED 
CONNECTION='mysql://server2usr:datacharmer@127.0.0.1:5001/test/tkey_count';

Now try it. This time the count comes in no time at all.

server2 (test) > SELECT * FROM tkey_count_fed;
+----------+
| how_many |
+----------+
|   518584 |
+----------+
1 row in set (0.00 sec)
server2 (test) > INSERT INTO tkey_fed VALUES (NULL, 'this_was_not_there');
Query OK, 1 row affected (0.00 sec)

server2 (test) > SELECT * FROM tkey_count_fed;
+----------+
| how_many |
+----------+
|   518585 |
+----------+
1 row in set (0.00 sec)

Your satisfaction will be almost total when the general log shows that the request did not trigger any unpleasant side effects.

$ tail ~/server1/data/general.log
                 8 Query  SHOW TABLE STATUS LIKE 'tkey_count'
                 8 Query  SELECT `how_many` FROM `tkey_count`
060613 21:18:24  4 Query  INSERT INTO `tkey` (id, word) VALUES  (0, 'this_was_not_there')
060613 21:18:28  8 Query  SHOW TABLE STATUS LIKE 'tkey_count'
                 8 Query  SELECT `how_many` FROM `tkey_count`

Still, there is room for improvement. This solution does not allow a WHERE clause for the COUNT query. All worries should dissipate when you read the next section.

Executing Remote Commands With Federated Tables and VIEWS

It's time to open the Wizardry department and give the federated engine a workout.

You can execute predefined functions on a remote server by implementing a simple framework based on a federated table, a view, and a few support tables.

On server1, create:

On server2, create:

The full implementation of the framework requires some more components to allow different users to execute remote commands concurrently, and some additions to allow the execution of arbitrary commands, but I won't burden you now with such details. I plan to describe the full framework in a separate article.

Rule number 9: You can exploit the federated engine to execute remote commands.

Half Baked? A Glimpse At Future Developments

Finally, I should mention that there are plans for improving the federated engine in future releases of MySQL.

This knowledge comes from attending a presentation at the MySQL Users Conference 2006, where Patrick Galbraith explained that the plan is to implement a new servers table to hold the connection parameters for a remote server. Unfortunately, this implementation won't allow a name for the table being linked, which will have the same name as the local one. This fact will make the usage of federated tables much harder in some situations where you need to have, in the same database, a local table X and a link to table X on one or more remote servers. I appeal to the MySQL developers to implement this feature in the right way. For example, something along the lines of:

CREATE SERVER server_def (
host     : 'my.host.mydomain.org',
port     : 3306,
user     : 'uname',
password : 'mysecret',
database : 'my_db_name'
);

CREATE TABLE xyz ( ... )
ENGINE = FEDERATED CONNECT TO 'table_name' USING server_def;

Such an enhancement would yield the clear syntax combined with the freedom of defining a table name.

Developer's angle: There will also be a foreign tables system table, so this shouldn't be a problem anymore.

Summing Up

Federated tables are one great feature in MySQL. With the right amount of documentation and test units, it can really make a difference, turning MySQL into a distributed DBMS.

Thanks

Thanks to Roland Bouman and Beat Vontobel for reviewing this draft and providing useful advice. In particular, I owe Roland for an improvement on the fed_table_available function.

Thanks to Patrick Galbraith for answering some questions about the future development of this engine.

Giuseppe Maxia is a QA developer in MySQL community team. A system analyst with 20 years of IT experience, he has worked as a database consultant and designer for several years. He is a frequent speaker at open source events and he's the author of many articles. He lives in Sardinia (Italy).


Return to the Databases.

Copyright © 2009 O'Reilly Media, Inc.