How to produce random rows from a table

Posted on November 28, 2009, 10:10 pm, by Hazan Ilan, under Cool MySQL Queries.

A while ago, I were searching for a way to produce random rows from a table in MySQL. I found several solutions but none of them satisfied me. Of course, I could use a combined logic of MySQL and a programming language. For example by producing random numbers in PHP and using them in the MySQL query in a IN clouse. However, I was looking for a pure query solution. The easy way that MySQL provides is by using ORDER BY RAND(). This trick works fine for very small tables. However, if you have a big table, this solution is a bad choice. It has a big performance issue as it orders all the table rows prior to selecting a few of them.

My solution to this problem is using User Defined Variables. Let’s say we want X random rows from a table myTable.

SELECT myTable.* FROM (SELECT FLOOR (RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 from (SELECT @num:=0) a , myTable LIMIT X) b , myTable WHERE b.num=myTable.id;

I will try to explain some of the parts:

FLOOR (RAND() * (SELECT count(*) FROM myTable)) num – produce an integer value between 0 to the max rows this table has
(SELECT @num:=0) a – initialize the User Defined Variable a to zero
(SELECT FLOOR (RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 from (SELECT @num:=0) a , myTable LIMIT X) b – produces X random integer numbers

If you need it with range, say X random rows from a table myTable with a range from S to T you can write it like this:

SELECT myTable.* FROM (SELECT FLOOR (S + RAND() * (T – S)) num ,@num:=@num+1 FROM (SELECT @num:=0) a , myTable LIMIT X) b , myTable WHERE b.num=myTable.id;

Some caution: because this query uses User Defined Variable (@num) which are connection specific, you can NOT use this MySQL query with a system that uses connection pooling or persistent connections.

Comments are welcomed,
Hazan Ilan.

Comment (RSS) | Trackback

19 Comments

Eran Binkin says:
November 29, 2009 at 3:03 pm
Thanks!
Avi says:
November 30, 2009 at 1:33 am
Cool!
Keep it up, man!
Jonas Birgander says:
November 30, 2009 at 6:34 am
Nice solution, but it wont work if your id numbers have gaps in them.
It would be nice if MySQL could:
* Let the query analyzer identify ORDER BY RAND() and make it quick
or
* Add an option allowing us to write queries like SELECT SQL_RAND column1, column2… FROM table WHERE a=b LIMIT 5
Here’s some SQL to show that it wont work with gaps:
CREATE TABLE testing_rand (
id integer not null,
primary key (id)
);
INSERT INTO testing_rand(id) VALUES (1), (70), (90), (100), (10000);
SELECT testing_rand.*
FROM (
SELECT FLOOR (RAND() * (SELECT count(*) FROM testing_rand)) num, @num:=@num+1
FROM (SELECT @num:=0) a , testing_rand
LIMIT 3
) b , testing_rand WHERE b.num=testing_rand.id;
Jason says:
November 30, 2009 at 8:49 am
Very interesting idea. I wonder how the performance is compared with the more common implementation of code + query.
Hazan Ilan says:
November 30, 2009 at 1:32 pm
If there are some holes in the table you can fix it by:
SELECT (SELECT myTable2.id from myTable myTable2 where myTable2.id>=b.num LIMIT 1) c FROM (SELECT FLOOR(RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 from (SELECT @num:=0) a , myTable LIMIT X) b ;
To improve performance you can combine this solution with a left join:
SELECT IF(c.id is null, (SELECT c2.id FROM myTable c2 WHERE c2.id > b.num LIMIT 1), c.id) d FROM (SELECT FLOOR(RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 FROM (SELECT @num:=0) a , myTable LIMIT X) b LEFT JOIN myTable c ON( c.id=b.num);
Bill Karwin says:
November 30, 2009 at 3:04 pm
The solution to account for gaps produces results weighted toward id values that follow gaps. In other words, if no rows with id 400-405 exist, then id 406 will be returned for any random number between 400 and 406.
Also you assume that id starts at 0. For example, if you have 1000 rows but the least id value is >1000, no random value between 0 and COUNT(*) will ever match any id in your table.
Also I’m not sure why you are setting a User Variable at all. You make no use of the variable in your query. I tried a test with your query and it still functions if I omit the expressions that initialize and increment @num.
Richard Lynch says:
November 30, 2009 at 9:27 pm
What I find simplest is to add an indexed column:
alter table myTable add static_rand int(11) default null;
create index myTable_rand_index on myTable(static_rand);
Then I initialize the values to random numbers in batches of, say, 10000:
update myTable set static_rand = 1000000 * rand() where static_rand is null limit 10000;
Repeat until there are no NULL values.
I can then do:
select id from myTable order by static_rand limit 10;
//get the id values in $VALUES
update myTable set static_rand = 1000000 * rand() where id in ($VALUES)
So basically, it “shuffles” the records to start with, and then pops the required set off the top of the deck, and moves the “used” part of the deck back to random locations within the deck.
This works well for quite large datasets, as the static_rand index fits into RAM and the record set being altered is generally quite small and fast to update.
Odds are quite good, actually, that you’ll be doing something else with the randomly-selected records anyway, so they’ll be in RAM as a working set usually.
Invariably, this is all wrapped up in application logic with a single entry point anyway, and the random constraint “feels” more like an application business requirement than a DB layer, so I have no problem doing this in the application layer…
But I’m sure a mysql guru could re-work it easily enough into an SPROC though.
I personally find this to fit KISS (Keep it simple, stupid) best of all the solutions I’ve tried over the years.
ymmv
Michael J says:
March 2, 2010 at 12:09 am
Perfect solution. Thank you
Bernarda Ocon says:
April 29, 2010 at 1:30 am
Wow… interesting information.
Amsterdam Escort says:
April 29, 2010 at 3:44 pm
I enjoyed viewing your blog and I will be back to check it more in the future so please keep up your good quality work. I love the colors that you chose, you are quite talented!
bdwankhede says:
May 3, 2010 at 9:28 pm
i have a query on Firebird sql as
SELECT FIRST 1000 i.”id”, i.”slug”, i.”what_they_do”, i.”does_well”, i.”last_improves” FROM “inspections” i LEFT JOIN “services” s on i.”service_id” = s.”id” WHERE s.”active” = 1 AND s.”suspended” = 0 AND i.”report_date” > ‘2005-04-01’ ORDER BY RAND() ASC;
and it executes very slow , Please help…
Hazan Ilan says:
May 10, 2010 at 2:09 am
Hi bdwankhede,
1. As far as I know (correct me if I’m wrong), there is no “FIRST” aggregate function in MySql, so you need to check the performance of it.
2. Please send me the Explain.
3. If “inspections” table is a big one, the ORDER BY RAND() is known to be a performance consumer. Please tell how much rows you have.
4. You can try selecting 1000 random records from “inspections” first and LEFT JOIN only on the results.
Waiting for your response
Ilan.
Bruce says:
May 20, 2010 at 9:26 pm
Wow… interesting information.
Justin says:
November 15, 2010 at 11:51 pm
I’ve another solution, the problem I had with the solutions presented was extremely large and ramdom gaps in my index field, and the need to select not just one random row, but 50, and the need to use a where clause as well (selecting random records matching something else or other in the table). order by rand works but gets very slow when the table is huge and requires a sort.
My solution is to create a temporary table, which is effectively an view of the key field of the target table with a primary key that is auto_indexed. You can create this table in one like “create temporary view_table (id int auto_increment primary key, i) as select (i) from big_table where …”
then I create a list of random numbers between 0 and the number of rows in that temporary table and then do a join
select blah from big_table,view_table where view_table.id IN ($list) and view_table.i = big_table.i
and that’s it. Very fast.
Now and again (daily, hourly, whatever) drop the temporary table and re-create it.
Michael says:
January 14, 2011 at 9:08 am
The workaround for the ORDER BY RAND()
its to add in where clause Rand
Example:
select * from mytbl where RAND()>0.9 ORDER BY RAND()
RL says:
February 5, 2011 at 10:39 pm
I assume you used MyISAM tables for this solution? MyISAM stores the row counts in meta-information for each table… using COUNT(*) on other database engines may be a lot slower.
sagi baron says:
February 15, 2011 at 6:05 am
You can get a very efficient way to produce random rows from MySQL table by using the RAND function in the WHERE clause and not in the ORDER BY clause.
It is returning random rows very fast and there is no need to worry about holes in the table’s ids:
See how to select random rows in mysql
Amit S says:
July 31, 2011 at 10:19 am
Interesting solution. I was wondering my this algorithm is not the native implementation in the ORDER BY RAND() function in MySql.
Christopher Thomas says:
March 10, 2012 at 10:44 am
I just found this solution: http://forums.mysql.com/read.php?132,185266,194715
and it seems to work fine.
SELECT * FROM Table T JOIN (SELECT FLOOR(MAX(ID)*RAND()) AS ID FROM Table) AS x ON T.ID >= x.ID LIMIT 1;
and a PHP version of that might be
function selectRandom($table,$pkName,$count)
{
return mysql_select(“* FROM $table JOIN (SELECT FLOOR(MAX($table.$pkName)*RAND()) AS RID FROM $table) AS x ON $table.$pkName >= x.RID LIMIT $count;”);
}
that will return you a number of rows selected pseudo-randomly from your db. maybe the formatting will suffer a little bit from the website ? but it’s what I use in my code…