AddThis Social Bookmark Button

Print

How to Optimize Rank Data in MySQL
Pages: 1, 2, 3, 4, 5, 6, 7

How to maintain the rank column

So far I've shown you how to initialize the rank column, but not how to maintain it when scores change. The good news is, the user-variable trick works in UPDATE statements too. You can't normally place user-variable assignments in a SET clause, but you can if they're inside a function:



set @rank := 0, @pos := 0, @game := null, @score := null;

update score_ranked
set rank_in_game = 
   greatest(
      @rank := if(@game = game and @score = score, @rank, if(@game <> game, 1, @rank + 1)),
      least(0, @pos   := if(@game = game, @pos + 1, 1)),
      least(0, @score := score),
      least(0, @game  := game)),
   pos_in_game = @pos
order by game desc, score desc;

This will update the entire table in one pass. If you need to update every row, this is as good as it gets.

This query performs the kind of logic you can do in other database products with cursors. MySQL 5 has server-side cursors, but they're read-only. This query simulates a write cursor.

Update Scenarios

You don't always have to update the whole table. I can think of at least four major scenarios, each of which has a separate optimization.

Scenario 1, the best case, is when the change doesn't affect any existing data. For example, suppose I want to delete gamer 1010's score for game 1. There is another gamer with the same score, so the rankings do not change. Inserting a new record for game 1 with the same score won't change them either.

Scenario 2 is where only a single row needs an update. For example, suppose gamer 1010's score for game 1 decreases to 97084. There's an existing row with that same score, so gamer 1010's score and rank will change, but nothing else will. Another single-row scenario is when a row's score goes up or down, but not enough to reach its nearest "neighbor." For example, look at game 1 scores for ranks 291 through 294. If gamer 2805's score increases to 97070, no other rows are affected; gamer 2805 is still at rank 294:

+-------+------+-------+--------------+
| gamer | game | score | rank_in_game |
+-------+------+-------+--------------+
|  9094 |    1 | 97084 |          291 |
|  7462 |    1 | 97076 |          292 |
|  4839 |    1 | 97075 |          293 |
|  2805 |    1 | 97062 |          294 |
+-------+------+-------+--------------+

Scenario 3 is when an internal range of rows need updates. For example, suppose gamer 4839's score increases to 97080. This moves gamer 4839 into rank 292, and pushes gamer 7462 down one rank to 293, but it affects only those two rows. If gamer 2805's score had increased to 97078, several rows would move down in the ranks, but the change would still be localized.

Finally, an update could affect every row with a lower score for that game "from here to the end of the rank." For example, if gamer 1010's score increased to 97102, she would still be in position 290, but every gamer with a lower score would decrease by 1. This is what I call an "expensive" update. In the worst case, all records for a game would have to be updated.

It's easy to detect each of these scenarios. You just need a few simple queries for each kind of operation: inserts, deletes, and updates. The first query determines if the operation will be expensive, and other queries handle whatever work you decide to do as a result. For example, this query will determine if it will be expensive to insert a new gamer with score 97101 for game 1:

select count(*) as cheap,
   coalesce(
      avg(rank_in_game),
      (
         select rank_in_game
         from score_ranked use index(score)
         where game = 1 and score < 97101
         order by score desc limit 1
      )
   ) as rank
from score_ranked where game = 1 and score = 97101;

That query returns a non-zero value for cheap, so it will be a cheap operation. If there had been zero rows, the subquery would tell where the beginning of the open-ended range is, and it would be up to you to decide (hopefully with some statistics) whether it's an expensive range to update. If the new score were very low, which is likely because presumably a lot of INSERTs come from new gamers, it would probably go near the end of the ranking, so only a few rows need updating.

You can update scores and ranks at the same time with standard SQL queries that have no user-variables. For example, if you're only increasing gamer 2805's score to 97085, and your initial query determined it'll "bump" three other scores out of position, you can write the update like:

update score_ranked
   set score = if(gamer = 2805, 97085, score),
       rank_in_game = if(gamer = 2805, 291, rank_in_game + 1)
where game = 1 and rank_in_game between 291 and 294;

Similar queries are easy to write for other cases.

These optimizations work well if you do single-row updates, but some queries are expensive, and will require you to update more of the table--perhaps the whole table at once. It's probably better to postpone these and do them in a batch. When you do, the user-variable query saves the day. It makes the whole batch about as efficient as an individual "expensive" query. Without user-variables, you will have to do at least some updates with slower methods.

Pages: 1, 2, 3, 4, 5, 6, 7

Next Pagearrow




-->