MySQL can use indexes to support a range of different functions; indexes are not just for optimizing performance when reading data. These functions include the following:
• Maintaining data integrity
• Optimizing data access
• Improving table joins
• Sorting results
• Aggregating data
Maintaining Data Integrity
MySQL uses both primary and unique keys to enforce the uniqueness of the data stored in each table. The differences between primary and unique keys are as follows:
• Only one primary key may exist per table.
• A primary key cannot contain a NULL value.
• A primary key provides a means of retrieving any specific row in the table.
• If an AUTO_INCREMENT column is defined, it must be part of a key; in practice this is usually the primary key.
• More than one unique key per table is supported.
• A unique key can contain NULL values. Multiple NULLs are allowed because one NULL is never considered equal to another.
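The rules above can be sketched with a small table. This is a minimal illustration only: the table key_demo and its columns are hypothetical names and are not part of the schema used in this chapter.

```sql
-- Hypothetical table used only to illustrate the rules above.
CREATE TABLE key_demo (
  id    INT UNSIGNED NOT NULL AUTO_INCREMENT,
  email VARCHAR(100) NULL,
  code  CHAR(5) NULL,
  PRIMARY KEY (id),     -- only one primary key per table; NULL is not allowed
  UNIQUE KEY (email),   -- more than one unique key is allowed
  UNIQUE KEY (code)
) ENGINE=InnoDB;

-- Both rows are accepted: NULLs in a unique key do not conflict with each other.
INSERT INTO key_demo (email, code) VALUES (NULL, 'A0001');
INSERT INTO key_demo (email, code) VALUES (NULL, 'A0002');

-- This row is rejected with a duplicate-entry error on the code key.
INSERT INTO key_demo (email, code) VALUES ('a@example.com', 'A0001');
```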
The million_words table contains a primary key on the id column. This constraint ensures no duplicate values. Here is an example:
mysql> INSERT INTO million_words(id, word) VALUES(1, 'xxxxxxxxx');
ERROR 1062 (23000): Duplicate entry '1' for key 'PRIMARY'
Likewise, the million_words table contains a unique key on the word column. This constraint ensures a duplicate word cannot be added.
mysql> INSERT INTO million_words(word) VALUES('oracle');
ERROR 1062 (23000): Duplicate entry 'oracle' for key 'word'
In addition, some MySQL storage engines support foreign keys for data integrity. A foreign key is not itself an index; it is a constraint. However, foreign key implementations commonly require an index on the relevant columns in both the child and parent tables so that the constraint can be checked efficiently. Of the default MySQL storage engines, currently only InnoDB supports foreign key constraints. You do not have to create the index on the child table yourself, because InnoDB creates one automatically if none exists; the referenced columns in the parent table must already be indexed. Defining suitable indexes explicitly is still highly recommended for performance.
CAUTION Although MyISAM does not support foreign key constraints, the CREATE TABLE (…) ENGINE=MyISAM syntax accepts foreign key definitions via the REFERENCES syntax. The definitions are parsed but not enforced.
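As a minimal sketch of a foreign key constraint in InnoDB, consider the following pair of tables. The table, column, index, and constraint names are all hypothetical and chosen only for illustration.

```sql
-- Hypothetical parent and child tables; names are illustrative only.
CREATE TABLE parent (
  id INT UNSIGNED NOT NULL,
  PRIMARY KEY (id)
) ENGINE=InnoDB;

CREATE TABLE child (
  id        INT UNSIGNED NOT NULL,
  parent_id INT UNSIGNED NOT NULL,
  PRIMARY KEY (id),
  KEY idx_parent_id (parent_id),   -- explicit index on the foreign key column
  CONSTRAINT fk_child_parent
    FOREIGN KEY (parent_id) REFERENCES parent (id)
) ENGINE=InnoDB;

INSERT INTO parent (id) VALUES (1);
INSERT INTO child (id, parent_id) VALUES (10, 1);  -- accepted: parent row 1 exists
INSERT INTO child (id, parent_id) VALUES (11, 2);  -- rejected: no parent row with id 2
```

Declaring both tables with ENGINE=InnoDB matters here: with an engine that does not support foreign keys, the last INSERT would not be rejected.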
Optimizing Data Access
Indexes allow the optimizer to eliminate the need to examine all data from your table during query execution. By restricting the number of rows accessed, query speeds can be significantly improved. This is the most common use for an index.
For example, if the word column in our table of one million words is not indexed, every SELECT must scan all one million rows sequentially, in whatever order they were stored, to find the zero or more matching rows. Even if the data were originally loaded in sorted order, MySQL does not know this and must process every row to find a possible match.
To demonstrate, we will create a copy of the table without the index on the word column:
mysql> CREATE TABLE no_index_words LIKE million_words;
mysql> ALTER TABLE no_index_words DROP INDEX word;
mysql> INSERT INTO no_index_words SELECT * FROM million_words;
mysql> SELECT * FROM no_index_words WHERE word='oracle';
1 row in set (0.25 sec)
When the table has an index on the word column, each SELECT first searches the index, which is ordered and well optimized for lookups, to find references to the zero or more rows that contain matching information. When the index is defined as unique, MySQL also knows that at most one row can match. Here is the same query against our million_words table:
mysql> SELECT * FROM million_words WHERE word='oracle';
1 row in set (0.00 sec)
According to the MySQL client output, the indexed lookup retrieves the row in less than 10 milliseconds (the client reports 0.00 sec). Without the index, the same lookup takes 250 milliseconds.
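One way to confirm why the two timings differ is to ask the optimizer for its plan with EXPLAIN. The sketch below assumes the two tables exist exactly as created above; the plan details described in the comments are what is typically reported, and the exact output varies by MySQL version.

```sql
-- No index on word: expect a full table scan (type = ALL, key = NULL).
EXPLAIN SELECT * FROM no_index_words WHERE word = 'oracle';

-- Unique key on word: expect the key column to name the word index
-- and the rows estimate to be 1.
EXPLAIN SELECT * FROM million_words WHERE word = 'oracle';
```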
Adding an index does not automatically improve performance for all types of SQL queries. Depending on the number of rows required, it might be more efficient to perform a full table scan. The trade-off is between the random I/O of retrieving individual rows via index lookups and the sequential I/O of reading all of the data.
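A hedged way to explore this trade-off is to run the same broad query with and without the index by using index hints, and then compare the timings. The range predicate below is an arbitrary assumption chosen to match a large share of the rows. Which variant is faster depends on your data, the table's other columns, and your MySQL version.

```sql
-- Prevent the optimizer from using the unique key on word (full table scan).
SELECT * FROM million_words IGNORE INDEX (word) WHERE word >= 'b';

-- Force the optimizer to use the unique key on word for the same query.
SELECT * FROM million_words FORCE INDEX (word) WHERE word >= 'b';

-- See which plan the optimizer would choose on its own.
EXPLAIN SELECT * FROM million_words WHERE word >= 'b';
```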
Throughout the remainder of the book, we will be providing more detailed examples of how indexes are used for query restriction.
Improving Table Joins
In addition to restricting data on a given table, the other primary purpose of an index is to join relational tables efficiently. An index on a join column provides the same performance benefit described in the previous section, this time when matching a value in a different table. Mastering the creation of correct indexes for efficient table joins is fundamental to SQL performance in all relational databases.
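The following sketch illustrates an index on a join column. The word_usage table, its columns, and the index name are hypothetical, and the example assumes that million_words.id is an unsigned integer.

```sql
-- Hypothetical table that refers to million_words.id; not part of the original schema.
CREATE TABLE word_usage (
  id      INT UNSIGNED NOT NULL AUTO_INCREMENT,
  word_id INT UNSIGNED NOT NULL,
  used_on DATE NOT NULL,
  PRIMARY KEY (id),
  KEY idx_word_id (word_id)   -- index on the join column
) ENGINE=InnoDB;

-- The unique key on word locates the single million_words row;
-- idx_word_id can then locate the matching word_usage rows
-- without scanning the whole word_usage table.
EXPLAIN
SELECT w.word, u.used_on
FROM million_words w
JOIN word_usage u ON u.word_id = w.id
WHERE w.word = 'oracle';
```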
Sorting Results
MySQL indexes store data in sorted order. This makes an index very useful when you want the result of a SELECT statement in a given order. You can sort the results of any SELECT query with the ORDER BY clause. Without an index on the ORDER BY column, MySQL will typically perform an internal filesort of the retrieved table rows. Using an existing index can significantly improve performance on a high-concurrency system that must sort the results of hundreds or thousands of individual queries per second, since the rows are naturally ordered in the index. However, simply having an index that matches the order you want for your results does not automatically mean that MySQL will choose to use it.
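Whether a sort is satisfied from an index can be checked in the Extra column of EXPLAIN. This sketch assumes the million_words and no_index_words tables created earlier; the comments describe the typical result, which the optimizer is free to vary.

```sql
-- The index on word is already ordered, so Extra should not show "Using filesort".
EXPLAIN SELECT word FROM million_words ORDER BY word LIMIT 10;

-- There is no index on word here, so Extra is expected to show "Using filesort".
EXPLAIN SELECT word FROM no_index_words ORDER BY word LIMIT 10;
```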
Aggregating Data
Indexes can also be used to calculate aggregated results more efficiently. For example, summing the totals of all invoices for a given period might be performed more efficiently with an appropriate index on the invoice date and invoice amount.
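The invoice example might look like the following sketch. The invoices table, its columns, the index name, and the date range are all hypothetical and serve only to illustrate the idea.

```sql
-- Hypothetical invoices table; names and types are illustrative only.
CREATE TABLE invoices (
  id           INT UNSIGNED NOT NULL AUTO_INCREMENT,
  customer_id  INT UNSIGNED NOT NULL,
  invoice_date DATE NOT NULL,
  amount       DECIMAL(10,2) NOT NULL,
  PRIMARY KEY (id),
  KEY idx_date_amount (invoice_date, amount)
) ENGINE=InnoDB;

-- The date range is resolved from the index, and amount is read from the
-- same index entries, so the table rows may not need to be read at all.
SELECT SUM(amount) AS total
FROM invoices
WHERE invoice_date >= '2011-01-01'
  AND invoice_date <  '2011-02-01';
```

Placing invoice_date first in the index lets the range condition narrow the entries to scan; including amount as the second column lets the sum be computed from the index alone.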