User Defined Variables

When I have discovered MySQL User-defined variables at the first time, I didn’t pay much attention to them and didn’t realize their strength. At first, one can think that they are just one of the many unnoticed MySQL features. Well, I was wrong. Here, I will try to summarize their strength, though I think there is much more to explore.

By using User-defined variables, one can add some kind of procedural logic to the MySQL’s relational logic. They are connection specific variables, that means that they persist as long as the connection to the database persists and are specific to a connection (the variable value is not accessible via different connections). User variables can be assigned a value with a SET or SELECT statements and are accessible written as @var_name.

Before we dive into the fascinating uses and examples I would like to point some cautions with MySQL User Defined Variables:

  • Case sensitivity: Currently they are case sensitive (they are case insensitive at older versions >5.0).
  • They prevent query cache.
  • If you refer to a variable that has not been initialized, it has a value of NULL and a type of string.
  • The order of user defined evaluation is undefined.

So lets see some examples:

Example 1:
The first example is the simplest one. We would like to numerate the MySQL query result. Lets say we have a table called users and we would like to numerate some result from it:

SELECT
    @counter:=@counter+1 as counter,
    users.*
FROM
    (SELECT @counter:=0) v,
    users;

Example 2:
How can you produce only the 100th rows? Can you do it in a pure relational logic? I don’t think so. However, this is possible and easy to do with the User defined variables:
Lets say we have a table called users.

SELECT
    users.*
FROM
    (SELECT @counter:=0) v, 
    users
HAVING
    (@counter:=@counter+1)%100=0
ORDER BY
    user_id;

This query will print only the 100th users from that table. The statement (select @counter:=0) initialize the @counter variable to zero. This is not the only way to do it, we could wrote a prior statement to set the variable to zero:

SET @counter:=0;
SELECT
    users.*
FROM
    users
HAVING
    (@counter:=@counter+1)%100=0
ORDER BY
    user_id;

The statement having (@counter:=@counter+1)%2=0 has two operation: the first is to increment the variable and the second is to return true value for the 100th records.

Example 3:

It is quit simple to produce an accumulative totals with User-defined variable:

Say we have a table of bank account transactions by year and month.  To produce a bank account transaction report by months we can use:

SELECT
   t.year,
   t.month,
   @x:=@x+t.c
FROM
   (SELECT @x:=0) a,
   (SELECT
        year,month,sum(amount) AS c
    FROM
        account_transactions
    GROUP BY year,month) t

That is it for now.
I will be glad to receive more interesting examples that uses MySQL’s User Defined variables.

Tip: If you liked this post I am recommending reading also the folowing great post: Advanced MySQL user variable techniques

Bookmark and Share

Limited SELECT count(*)

A SELECT statement with COUNT returns the number of rows retrieved by the SELECT statement. For performance, the desired result is to limit that count.

Including a LIMIT clause in the SELECT statement will not work since it only restricts the number of rows returned, which is always one.

The solution, what I call “Limited-Count”, is done by limiting a non-count SELECT statement and wrapping it in COUNT(*).

For example:
If your count statement looks like:

SELECT COUNT(*) FROM a_table WHERE some_conditions;

You can limit the results by moving the query into:

SELECT COUNT(*) FROM (SELECT 1 FROM a_table WHERE some_conditions LIMIT x) t;

Notes:
1. COUNT(*) is optimized to return very quickly if the SELECT retrieves from one table, no other columns are retrieved, and there is no WHERE clause (this is true only for MyISAM tables. For InnoDB tables, it’s slow – thanks for Roland Bouman’s comment)
2. If you need to know how many rows your non-count SELECT statement would have returned without the LIMIT, you could consider using the information function, FOUND_ROWS(). It will fetch the total rows number without running the statement again.

Bookmark and Share

Conditional Joins in MySQL

One way to do a “Conditional Join” in MySQL is by using a “LEFT JOIN”.  Create a “LEFT JOIN” for each condition and combine the results into one column using an “IF” statement by the “SELECT” expression.  Here’s an example:

Suppose you have three tables:

  1. questions: a table consisting of question ids, timestamps, and whether or not the question has been answered.
  2. answered_questions_title: a table of answered question titles.
  3. unanswered_questions_title: a table of unanswered question titles.

(yeah, I know the example isn’t 100% realistic, but it’s only for demonstration purposes ;-) )

 Now, suppose you need to find the 50 latest arrival questions with their titles.  We will need a conditional join that combines either answered or unanswered titles with the questions table.

 CREATE TABLE questions(
                q_id INT UNSIGNED NOT NULL AUTO_INCREMENT,
                q_is_answered TINYINT(1) UNSIGNED,
                q_timestamp TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

 CREATE TABLE answered_questions_title(
                q_id INT UNSIGNED,
                q_title VARCHAR(255)
);

 CREATE TABLE unanswered_questions_title(
                q_id INT UNSIGNED,
                q_title VARCHAR(255)
);

The query would be:

SELECT
       q.q_id,
       q.q_timestamp,
       IF(q.q_is_answered, a.q_title, u.q_title) as q_title
FROM
      questions q
      LEFT JOIN answered_questions_title a ON (q.q_id = a.q_id AND q.is_answered = 1)
      LEFT JOIN unanswered_questions_title u ON (q.q_id = u.q_id AND q.is_answered = 0);

Explanation:
By inserting the “is_answered” condition to the ON clause, we restrict the “LEFT JOIN” to this condition.  The result contains the desired values with the matching condition, and null values elsewhere.  The title is now spread between two columns: a.q_title, and u.q_title.  We use the IF clause to combine them into one column, q_title.

Note:
Another possible way to do a “Conditional Join” using MySQL is with a “UNION” (a “UNION” is the result of two or more conditions results).  However, using “Union” is bad for performance when you need only top results with an “ORDER BY” and “LIMIT”.

I’m always happy to receive your comments,
Ilan Hazan

Bookmark and Share

Selecting records from one table that dont have specific relationship in a second table

Question: How can you select records from one table that doesn’t have a specific ID in a second, many-to-many table?  

In other words, imagine you have a questions table (A), categories table (B), and a many-to-many relationship link table (L).  This setup allows a question to have several categories.  How can you find questions that don’t have a specific category (say X)? 

What about the following query? 

               SELECT A.* FROM A JOIN L USING (question_id) WHERE L.category_id <> X;  

This query will not meet the requirements because it will still return questions that have several categories in addition to the X category, since it will only filter the X value.  

My way to handle this is to use the MySQL ‘NOT IN’ clause. 

                SELECT A.*, L.* FROM A JOIN L USING (question_id) WHERE A.question_id NOT IN (SELECT question_id FROM L WHERE category_id IN (X));  

This query will retrieve all questions and related categories that don’t belong to category X.  

Comments and/or different solutions are welcome.
Ilan.

Bookmark and Share

The fading of Query Cache limitations

There is no question about the MySQL Query Cache greatness. 

The MySQL Query Cache feature started his way with some limitations.  However, in time, there were a lot of improvements (fixing) for those limitations.

One should test his MySQL version with the Query Cache limitations before using it.

Here are some improved limitations I found (at MySQL 5.0 version and above):

  1. The limitation:  Query comments are used as part of the query cache hash key. This means that different comments will (probably) produce different  query cache hash keys.
    The improvement: The existence of the comment in the query is important but not their contents. This means that even empty comment affects the query cache hash key, but two same queries that different with their comment content will have the same query cache hash key.
  2. The limitation:  Queries that begin with spaces or comments are ignored by the Query Cache mechanism.
    The improvement: The Query Cache mechanism ignores spaces and comments at the beginning and at the end of the query.
  3. The limitation:  Query Cache does not support prepared statements.
    The improvement: MySQL Query Cache does work well with prepared statements.

However, MySQL Query Cache still have the following “textual changes” limitations:

  1. It is case sensitive – changing even one letter case will produce different hash key.
  2. Extra spaces within the query (not at the beginning) will produce different hash key.  

 I would love to hear from you about more improvements to the MySQL Query Cache mechanism.
Ilan Hazan.

Bookmark and Share

How to retain comments in MySQL statements

There is a lot of buzz lately around retaining comments in MySQL’s logs and MySQL Stored procedures (Routines).
When analyzing a MySQL query, it’s nice to have a comment inside the query (or procedure) that can provide us with more info about the query.
For example, When analyzing MySQL’s logs (like Slow Query Log or General Log), seeing the calling function’s name that issued this slow query can help a lot. This way it’s much easier to find the query in the code or if it’s called from a script.
One solution is to write an inline comment (C multi column comment style) in the query’s body.
For example: “SELECT /* calling function name */ column1, column2 … FROM table …”.
However, when I tried to use it within a bash script the comment was stripped out.
I found that the MySQL client program is stripping out in-query (or in-procedure) comments.
Fortunately, there is a flag that controls this behavior whose default is to discard comments (–skip-comments).
 
To retain (preserve) comments in MySQL statements sent to the server (i.e. queries or MySQL Stored procedures) use the -–comments flag (or -c).
For example $ bin/mysql  –comments  -uxxxx  -pyyyy
 
Important notes:
1.     This flag is active from MySQL version 5.0.52. (see http://mirror.leaseweb.com/mysql/doc/refman/5.0/en/mysql-command-options.html)
2.     The comments are retained within the scripting language with the embedded SQL libraries (such as PHP).

Bookmark and Share

How to produce random rows from a table

A while ago, I were searching for a way to produce random rows from a table in MySQL. I found several solutions but none of them satisfied me. Of course, I could use  a combined logic of MySQL and a programming language. For example by producing random numbers in PHP and using them in the MySQL query in a IN clouse.  However, I was looking for a pure query solution. The easy way that MySQL provides is by using ORDER BY RAND(). This trick works fine for very small tables. However, if you have a big table, this solution is a bad choice.  It has a big performance  issue as it orders all the table rows prior to selecting a few of them.

My solution to this problem is using User Defined Variables. Let’s say we want X random rows from a table myTable.  

SELECT myTable.* FROM (SELECT FLOOR (RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 from (SELECT @num:=0) a , myTable LIMIT X) b ,  myTable WHERE b.num=myTable.id;

I will try to explain some of the parts:

  1. FLOOR (RAND() * (SELECT count(*) FROM myTable)) num  - produce an integer value between 0 to the max rows this table has
  2. (SELECT @num:=0) a – initialize the User Defined Variable a to zero
  3. (SELECT FLOOR (RAND() * (SELECTcount(*) FROM myTable)) num ,@num:=@num+1 from (SELECT @num:=0) a , myTable LIMIT X) b – produces X random integer numbers

If you need it with range, say X random rows from a table myTable with a range from S to T you can write it like this:

SELECT myTable.* FROM (SELECT FLOOR (S + RAND() * (T – S)) num ,@num:=@num+1 FROM (SELECT @num:=0) a , myTable LIMIT X) b ,  myTable WHERE b.num=myTable.id;

Some caution: because this query uses User Defined Variable (@num) which are connection specific, you can NOT use this MySQL query with a system that uses connection pooling or persistent connections.

Comments are welcomed,
Hazan Ilan.

Bookmark and Share