When you have multiple different indexes that can be used for a query, MySQL tries to identify the most effective index for the query. It does so by analyzing statistics about the data distribution within each index. In our example, we are looking for all bands that were founded in 1980. Given these requirements, we create an index on the artist’s type because that is what we will be searching on.
mysql> ALTER TABLE artist ADD INDEX (type);
To demonstrate this correctly with all MySQL 5.x versions, we disable an optimizer setting for the purposes of this example:
mysql> SET @@session.optimizer_switch=‘index_merge_intersection=off’; mysql> EXPLAIN SELECT artist_id, name, country_id -> FROM artist -> WHERE type='Band' -> AND founded = 1980\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: artist type: ref possible_keys: founded,founded_2,type key: founded key_len: 2 ref: const rows: 1216 Extra: Using where
In this example, MySQL has to make a choice between the possible indexes as listed in possible_keys. The optimizer chooses an index based on the estimated cost to do the least amount of work, not what a human considers the right order. We can use the index cardinality to confirm the likely reason for this decision. Here is an example:
mysql> SHOW INDEXES FROM artist\G ... *************************** 3. row *************************** Table: artist Non_unique: 1 Key_name: founded Seq_in_index: 1 Column_name: founded Collation: A Cardinality: 846 ... *************************** 5. row *************************** Table: artist Non_unique: 1 Key_name: type Seq_in_index: 1 Column_name: type Collation: A Cardinality: 10 ...
This information shows that the founded column has a higher cardinality—that is, a higher number of unique values, and therefore there is a higher likelihood of finding the needed records in fewer reads from the index. The statistics information is only an estimate. We know from data analysis that there are only four unique values for type, yet statistics indicate otherwise.
A discussion of cardinality is not complete without discussing selectivity. Knowing the number of unique values in an index is not as useful as comparing that number to the total number of rows in the index. Selectivity is defined as the number of distinct values in relation to the number of records in the table. The ideal selectivity is a value of 1. This is a non null unique value for every value. Having an index with good selectivity means that fewer rows have the same value. Poor selectivity is when there are few distinct values—for example gender or a status. This determination can not only be used to determine when an index might not be effective, but also how to order columns in a multi column index when all columns are used for your queries.
The presented index cardinality provides a simple insight. The following two queries look for bands and combinations for the 1980s.
mysql> EXPLAIN SELECT artist_id, name, country_id -> FROM artist -> WHERE founded BETWEEN 1980 AND 1989 AND type='Band'\G *************************** 1. row *************************** ... possible_keys: founded,founded_2,type key: founded key_len: 2 ref: NULL rows: 18690 Extra: Using where mysql> EXPLAIN SELECT artist_id, name, country_id -> FROM artist -> WHERE founded BETWEEN 1980 AND 1989 AND type='Combination'\G *************************** 1. row *************************** .. possible_keys: founded,founded_2,type key: type key_len: 1 ref: const rows: 19414 Extra: Using where
Although these queries appear similar, a different index path was chosen based on more detailed statistics of the distribution of information for the columns.
Leave a Reply