Understanding MySQL Index Cardinality

By Ronald Bradford on September 23, 2015


When you have multiple different indexes that can be used for a query, MySQL tries to identify the most effective index for the query. It does so by analyzing statistics about the data distribution within each index. In our example, we are looking for all bands that were founded in 1980. Given these requirements, we create an index on the artist’s type because that is what we will be searching on.

mysql> ALTER TABLE artist ADD INDEX (type);

To demonstrate this correctly with all MySQL 5.x versions, we disable an optimizer setting for the purposes of this example:

mysql> SET @@session.optimizer_switch=‘index_merge_intersection=off’;
mysql> EXPLAIN SELECT artist_id, name, country_id
    -> FROM artist
    -> WHERE type='Band'
    -> AND founded = 1980\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: artist
         type: ref
possible_keys: founded,founded_2,type
          key: founded
      key_len: 2
          ref: const
         rows: 1216
        Extra: Using where

In this example, MySQL has to make a choice between the possible indexes as listed in possible_keys. The optimizer chooses an index based on the estimated cost to do the least amount of work, not what a human considers the right order. We can use the index cardinality to confirm the likely reason for this decision. Here is an example:

mysql> SHOW INDEXES FROM artist\G
...
*************************** 3. row ***************************
       Table: artist
  Non_unique: 1
    Key_name: founded
Seq_in_index: 1
 Column_name: founded
   Collation: A
Cardinality: 846
...
*************************** 5. row ***************************
       Table: artist
  Non_unique: 1
    Key_name: type
Seq_in_index: 1
 Column_name: type
   Collation: A
Cardinality: 10
...

This information shows that the founded column has a higher cardinality—that is, a higher number of unique values, and therefore there is a higher likelihood of finding the needed records in fewer reads from the index. The statistics information is only an estimate. We know from data analysis that there are only four unique values for type, yet statistics indicate otherwise.

A discussion of cardinality is not complete without discussing selectivity. Knowing the number of unique values in an index is not as useful as comparing that number to the total number of rows in the index. Selectivity is defined as the number of distinct values in relation to the number of records in the table. The ideal selectivity is a value of 1. This is a non null unique value for every value. Having an index with good selectivity means that fewer rows have the same value. Poor selectivity is when there are few distinct values—for example gender or a status. This determination can not only be used to determine when an index might not be effective, but also how to order columns in a multi column index when all columns are used for your queries.

The presented index cardinality provides a simple insight. The following two queries look for bands and combinations for the 1980s.

mysql> EXPLAIN SELECT artist_id, name, country_id
    -> FROM artist
    -> WHERE founded BETWEEN 1980 AND 1989 AND type='Band'\G
*************************** 1. row ***************************
...
possible_keys: founded,founded_2,type
          key: founded
      key_len: 2
          ref: NULL
         rows: 18690
        Extra: Using where

mysql> EXPLAIN SELECT artist_id, name, country_id
    -> FROM artist
    -> WHERE founded BETWEEN 1980 AND 1989 AND type='Combination'\G
*************************** 1. row ***************************
..
possible_keys: founded,founded_2,type
          key: type
      key_len: 1
          ref: const
         rows: 19414
        Extra: Using where

Although these queries appear similar, a different index path was chosen based on more detailed statistics of the distribution of information for the columns.

Related Posts

Leave a Reply