AddThis Social Bookmark Button

Print

How to Optimize Rank Data in MySQL
Pages: 1, 2, 3, 4, 5, 6, 7

How Efficient Is This Compared to the Alternatives?

The strategy I just showed you takes one pass through the table. That is, it's an O(n) algorithm in the size of the table (as I showed, the best case is O(1), but the worst case is bounded above by n). This is as good as you can get when you have to calculate a rank for each row.

The alternatives I know of are quadratic, or O(n2) in the size of the table. For example:

-- This query is illegal in MySQL, but even if it weren't,
-- it would take a very long time to run.
update score_ranked as o
   set rank_in_game = (
      select count(distinct score) from score_ranked as i
      where i.score >= o.score and i.game = o.game);

-- This is legal, but creates an un-indexed temp table (even worse!)
-- inside the subquery.  Don't try this at home!
update score_ranked as o
   set rank_in_game = (
      select count(distinct score) from (select * from score_ranked) as i
      where i.score >= o.score and i.game = o.game);

-- This is still a cross join because of the <=
update score_ranked as o
   inner join (
      select l.game, l.score, count(distinct r.score) as rank
      from score_ranked as l
         left outer join score_ranked as r on l.game = r.game
            and l.score <= r.score
      group by l.game, l.score
   ) as i on o.game = i.game and o.score = i.score
   set o.rank_in_game = i.rank;

If that were the best you could do in SQL, what are your other options? One is to simulate cursors in a programming language like PHP. This means selecting the entire table and looping through the rows, calculating each row's rank and sending an UPDATE back to the MySQL server. I advise against this. It's not quadratic, but it's death by 100,000 cuts.

Failing that, you have to abandon rank columns, and settle for all the COUNT() queries and limits with offsets. In my experience as a consultant, this is the path most people take, and it doesn't scale. If it did, they wouldn't have called me for help.

Think about it this way: the leaderboards (reads) for Design #1 are O(n) in the size of the table, and writes are O(1). If I eliminate COUNT and LIMIT with OFFSET queries, the reads are O(1) and writes are O(n). If I postulate that there are many more reads than writes, I've probably made the right decision.

Does the Rank Column Really Help?

It's not enough to just rely on gut feeling, look at query plans, or run queries and see how long they take. Profiling and benchmarking are essential to understanding whether "optimizations" actually help. The first thing I always do is profile the queries on an otherwise quiet machine so I can measure how much work they really do. This is why I wrote the MySQL Query Profiler.

I profiled the three sets of queries to test the leaderboards. Because gamer 1010 is fairly high in the rankings in those leaderboards, I also profiled gamer 755, who is very low in the rankings, to get a sense of the two extremes. Here is a sampling of the most relevant results:

Design Design #1 Design #2 Design #3
Gamer 1010 755 1010 755 1010 755
Time 0.03 0.2 0.05 0.2 0.2 0.01
Queries 4 4 3 3 2 2
Optimizer Cost 21793 21793 5177 5163 2 2
Table locks 5 5 5 5 3 3
Table scans 1 1 2 2 1 0
Index range scans 2 2 1 1 0 1
Rows sorted 320 10000 40 18 20 8
Rows read 30943 59988 20403 20051 27482 39
Fixed reads 320 10000 20 9 20 8
Next-row reads 20002 20002 42 20 8728 9
Bookmark lookup 10010 10010 32 22 8734 14
Full scans initiated 1 1 0 0 0
Next in index 610 19975 20309 20000 10000 8
Temp table inserts 10000 10000 40 18 8727 8

It looks like some of the queries might not be using indexes the same way on the two gamers, but I don't want to get into early optimization. I just want to look for a general improvement with the ranking columns. The profiling data shows why it's not a good idea to rely only on query plans and execution times; the query times aren't that different, but the other numbers show how different the workload is.

Pages: 1, 2, 3, 4, 5, 6, 7

Next Pagearrow




-->