Simplifying SQL Statements to Improve MySQL Performance

By: Ronald Bradford


When it is not possible to eliminate an SQL statement to improve performance, it might be possible to simplify the statement. Consider the following questions:

  • Are all columns required?
  • Can a table join be removed?
  • Is a join or WHERE restriction necessary for additional SQL statements in a given function?

Column Improvement

An important requirement of simplification is to capture all SQL statements in order for a given executed function. Using a sampling process will not identify all possible improvements. Here is an example of a query simplification:

mysql> SELECT fid, val, val
    -> FROM table1
    -> WHERE fid = X;

This query returned 350,000 rows of data that was cached by the application server during system startup. For this query, two-thirds of the result set that was passed over the network was redundant. The fid column was passed as a restriction to the SQL statement, and therefore the value was identical for all rows. The val column was also unnecessarily duplicated. The optimal SQL statement is shown here:

mysql> SELECT val
    -> FROM table1
    -> WHERE fid = X;

Join Improvement

The following is an example of a table join simplification that requires knowledge of all SQL statements for a given function. During the process, the following SQL statement is executed to retrieve a number of rows:

mysql> SELECT /* Query 1 */ id FROM table1
    -> WHERE col1 = X
    -> AND col2 = Y;

At a later time in the execution of this function, the following statement was executed using the id value from the previous SQL statement:

mysql> SELECT /* Query 2 */ table2.val1, table2.val2,
table2.val3
    -> FROM table2 INNER JOIN table1 USING (id)
    -> WHERE table2.id = 9
    -> AND   table1.col1 = X
    -> AND   table1.col2 = Y
    -> AND   table2.col1 = Z;

This second SQL statement could be simplified to this:

mysql> SELECT /* Query 2 */ val1, val2, val3
    -> FROM table2
    -> WHERE table2.id = 9
    -> AND   table2.col1 = Z

As the first query performs the necessary restriction for the id column (that is, col1 = X and col2 = Y), the join and restriction clauses for table1 are redundant. Removing this join condition simplifies the SQL statement and removes a potential problem if only one statement is altered in the future.

Rewriting Subqueries

The MySQL database supports subqueries. The performance of subqueries may be significantly slower in certain circumstances than using a normal table join. Here is an example:

SELECT id, label
FROM   code_opts
WHERE  code_id = (SELECT id FROM codes WHERE typ='CATEGORIES')
ORDER BY seq

This SQL statement can simply be rewritten as follows:

SELECT o.id, o.label
FROM   code_opts o INNER JOIN codes c ON o.code_id = c.id
WHERE  c.typ='CATEGORIES'
ORDER BY o.seq

The change might appear to be subtle; however, this approach for more complex queries can result in improved query performance.

Understanding the Impact of Views

Developers should know the true class of the table that is used for SQL statements. If the object is actually a view, the impact of SQL optimizations can be masked by the complexity of join conditions for the view definition. A common problem with data warehouse (DW) and business intelligence (BI) tools is the creation of views on top of views. For any view definition, additional information may be retrieved that is not ultimately required for an SQL statement. In MySQL, complex queries involving views can be easily improved by using the necessary underlying tables.

Leave a Reply