What You Should Know About MySQL Multi Column Indexes

By Ronald Bradford on September 23, 2015


It is possible for an index to have two or more columns. Multi column indexes are also known as compound or concatenated indexes.

Let us look at a query that could use two different indexes on the table based on the WHERE clause restrictions. We first create these indexes.

mysql> ALTER TABLE album
    -> ADD INDEX (country_id),
    -> ADD INDEX (album_type_id);
Query OK, 553875 rows affected (18.89 sec)

It is more efficient to combine DML statements for a given table when possible. If you chose to run these ALTER statements as two individual statements, the following would occur:

mysql> ALTER TABLE album DROP index country_id, drop index album_type_id;
Query OK, 553875 rows affected (15.72 sec)
mysql> ALTER TABLE album ADD INDEX (country_id);
Query OK, 553875 rows affected (16.76 sec)
mysql> ALTER TABLE album ADD INDEX (album_type_id);
Query OK, 553875 rows affected (25.23 sec)

If this was a production size table, and an ALTER statement took 60 minutes or 6 hours, this is a significant saving.

TIP

Creating an index is a time intensive operation and can block other operations. You can combine creating multiple indexes on a given table with a single ALTER statement.

mysql> EXPLAIN SELECT al.name, al.first_released, al.album_type_id
    -> FROM album al
    -> WHERE al.country_id=221
    -> AND album_type_id=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: al
         type: ref
possible_keys: album_type_id,country_id
          key: country_id
      key_len: 3
          ref: const
         rows: 154638
        Extra: Using where

NOTE

Depending on which version of MySQL you use, optimizer improvements can provide a different QEP in this example. You can generally guarantee the same output of the following examples in all MySQL 5.x versions with the following MySQL backward compatibility system variable setting:

mysql> SET @@session.optimizer_switch=‘index_merge_intersection=off’;

However, if we run the same query for a different value for the album type, we now use a different index:

mysql> EXPLAIN SELECT al.name, al.first_released, al.album_type_id
    -> FROM album al
    -> WHERE al.country_id=221
    -> AND album_type_id=4\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: al
         type: ref
possible_keys: album_type_id,country_id
          key: album_type_id
      key_len: 4
          ref: const
         rows: 58044
        Extra: Using where

Why did MySQL make this decision? From the EXPLAIN rows column we draw a conclusion that the cost based optimizer determines a significantly less cost—that is, 58K rows to read compared with 154K rows to read.

mysql> SHOW INDEXES FROM album\G
...
*************************** 4. row ***************************
        Table: album
   Non_unique: 1
     Key_name: album_type_id
 Seq_in_index: 1
  Column_name: album_type_id
    Collation: A
  Cardinality: 12
...
*************************** 6. row ***************************
        Table: album
   Non_unique: 1
     Key_name: country_id
 Seq_in_index: 1
  Column_name: country_id
    Collation: A
  Cardinality: 499
...

If MySQL only used the index cardinality, then you would expect the QEP always to use the country_id column as this is generally more unique and would therefore retrieve less rows. Although the index cardinality is an indicator of uniqueness, MySQL also holds additional statistics regarding the range and volume of unique values. We can confirm some of these numbers by looking at the actual table distribution.

mysql> SELECT COUNT(*) FROM album where country_id=221;

f0060-03

 

 

f0061-01

 

 

 

 

 

 

In the first query, the country_id index was selected. The actual results show 92K rows compared with 289K rows if the album_type_id was selected.

For the second query, the actual results show 92K rows compared with 111K rows, with the later actually being selected. If you compare the actual numbers with the QEP estimated rows values, you also find a reasonable discrepancy—for example the second query estimates 58K rows when there are actually 111K or almost two times more actual rows.

Related Posts

Leave a Reply