Although adding indexes can help optimize the performance of SQL statements, indexes come at a significant cost.
Adding indexes to a table affects the performance of writes. This can be easily shown with the album table used in this chapter. Looking at the current definition shows a large number of indexes:
mysql> SHOW CREATE TABLE album\G
*************************** 1. row ***************************
       Table: album
Create Table: CREATE TABLE `album` (
  `album_id` int(10) unsigned NOT NULL,
  `artist_id` int(10) unsigned NOT NULL,
  `album_type_id` int(10) unsigned NOT NULL,
  `name` varchar(255) NOT NULL,
  `first_released` year(4) NOT NULL,
  `country_id` smallint(5) unsigned DEFAULT NULL,
  PRIMARY KEY (`album_id`),
  KEY `artist_id` (`artist_id`),
  KEY `country_id` (`country_id`),
  KEY `album_type_id` (`album_type_id`),
  KEY `m1` (`country_id`,`album_type_id`),
  KEY `m2` (`album_type_id`,`country_id`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
By running a simple benchmark, we can compare the insert rate of the current album table with that of the original definition, which included fewer indexes:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 LIKE album;
INSERT INTO t1 SELECT * FROM album;
DROP TABLE t1;
CREATE TABLE t1 LIKE album;
-- NOTE: Due to indexes created during this chapter, this may fail.
-- Adjust dropped indexes appropriately
ALTER TABLE t1
  DROP INDEX first_released,
  DROP INDEX album_type_id,
  DROP INDEX name,
  DROP INDEX country_id,
  DROP INDEX m1,
  DROP INDEX m2;
INSERT INTO t1 SELECT * FROM album;
DROP TABLE t1;
Here are the timed results:
# Insert with indexes
Query OK, 553875 rows affected (24.77 sec)

# Insert without indexes
Query OK, 553875 rows affected (7.14 sec)
Inserting data into the table with the additional indexes was more than three times slower (24.77 versus 7.14 seconds). This is a simple bulk test, and other factors can contribute to the slower speed; however, it demonstrates that adding indexes to a table has a direct effect on write performance.
One of the easiest techniques for index optimization is to remove duplicate indexes. Although an identical index is easy to spot, other common occurrences are an index matching the primary key, or indexes that are subsets of other indexes. Any index that is contained within the leftmost portion of another index is a duplicate index that will not be used. Here is an example:
CREATE TABLE `album` (
...
  PRIMARY KEY (`album_id`),
  KEY `artist_id` (`artist_id`),
  KEY `country_id` (`country_id`),
  KEY `m1` (`album_type_id`,`country_id`),
  KEY `m2` (`country_id`,`album_type_id`)
...
The country_id index is actually a duplicate, because country_id is the leftmost column of the m2 index.
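A duplicate of this kind can simply be dropped; for example:

```sql
ALTER TABLE album DROP INDEX country_id;
```

The m2 index continues to serve any query that previously used the country_id index alone.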
The Maatkit mk-duplicate-key-checker is an open source tool that can identify duplicate indexes. A manual desk check of your schema also works.
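As a sketch, the tool can be pointed at a single schema; the schema name here is hypothetical, and the exact options depend on the version installed:

```shell
mk-duplicate-key-checker --host=localhost --user=appuser \
  --ask-pass --databases=book
```

The tool reports each duplicate index it finds, along with the ALTER TABLE statement to remove it.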
One of the deficiencies of MySQL instrumentation is the inability to determine whether an index is used. By analyzing all your SQL statements you could deduce which indexes are not used, but it is important to be able to determine this directly. Indexes have a performance impact for writes and a disk space impact that can affect your backup and recovery strategy. Less effective indexes can also consume valuable memory resources.
First released by Google in 2008, the SHOW INDEX_STATISTICS command enables you to obtain this information more precisely. Various MySQL forks and variants now include this feature, but the official MySQL product does not.
For more information see http://code.google.com/p/google-mysql-tools/wiki/UserTableMonitoring.
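In forks that include this feature (for example, builds carrying the Google patches, or Percona Server with user statistics enabled), per-index usage can also be queried from an INFORMATION_SCHEMA table. A sketch, assuming your build provides INDEX_STATISTICS and uses a schema named book:

```sql
SELECT index_name, rows_read
FROM   INFORMATION_SCHEMA.INDEX_STATISTICS
WHERE  table_schema = 'book'
AND    table_name   = 'album';
```

An index that never appears in this output over a representative workload period is a candidate for removal.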
Regardless of the tools used to determine whether an index is used, it is also important to analyze the effectiveness of the columns defined in an index, to find portions of indexes that are ineffective.
As the size of your tables grows, the performance impact of maintenance operations generally increases. For example, adding indexes to the primary table used in this chapter took on average 20 to 30 seconds:
mysql> ALTER TABLE album ADD INDEX m1 (album_type_id, country_id);
Query OK, 553875 rows affected (21.05 sec)
Traditionally, any ALTER statement was a blocking operation, as a new version of the table was created. It was possible to SELECT data, but any DML operation would then cause blocking of all statements due to standard lock escalation policies. When your table size is 1GB or 100GB, this blocking time will be significantly longer. In more recent versions there have been a number of advances, both within the MySQL product and with creative external solutions.
There are some exceptions to the impact of adding indexes. With InnoDB, fast index creation is available with the InnoDB plugin in MySQL 5.1 and by default in MySQL 5.5 and later. More information is at http://dev.mysql.com/doc/innodb/1.1/en/innodb-create-index.html.
Other storage engines also implement different ways of creating fast indexes that perform little to no locking. TokuDB from Tokutek is one such engine. Read more at http://tokutek.com/2011/03/hot-column-addition-and-deletion-part-i-performance/.
The impact on disk space is also an important consideration, especially if you are using the default common tablespace configuration for InnoDB. MySQL creates a copy of your table, so if your table is 200GB in size, you need at least 200GB of additional disk space to perform an ALTER TABLE. With InnoDB, this additional disk space is added to the common tablespace during the operation and is not reclaimed on the filesystem at the completion of the command; it is reused internally when InnoDB requires additional disk space. Although you can switch to a per-table tablespace, this has an impact on write-intensive systems.
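A minimal configuration sketch for enabling per-table tablespaces follows; note that this setting affects only tables created (or rebuilt) after it is enabled:

```
[mysqld]
innodb_file_per_table
```

With this setting, each new InnoDB table is stored in its own .ibd file, so space from a dropped or rebuilt table is returned to the filesystem rather than to the common tablespace.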
TIP There are also various techniques to minimize this blocking operation. You can elect to use a high availability master/fail-over master replication topology to support online alters. More recently Shlomi Noach introduced the oak-online-alter-table utility. See information at http://code.openark.org/blog/mysql/online-alter-table-now-available-in-openark-kit. Facebook also released its online schema change (OSC) tool that performs in a similar fashion. More information is at http://www.facebook.com/notes/mysql-at-facebook/online-schema-change-for-mysql/430801045932.
Disk Space Impact
Using the INFORMATION_SCHEMA.TABLES query from Chapter 2, we can obtain the size of the album table used in this chapter.
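The Chapter 2 query is not repeated here; the following sketch produces equivalent output, with column aliases chosen to match the results shown:

```sql
SELECT table_name, engine, row_format AS format, table_rows,
       avg_row_length AS avg_row,
       ROUND((data_length + index_length)/1024/1024, 2) AS total_mb,
       ROUND(data_length/1024/1024, 2)  AS data_mb,
       ROUND(index_length/1024/1024, 2) AS index_mb
FROM   INFORMATION_SCHEMA.TABLES
WHERE  table_schema = DATABASE()
AND    table_name   = 'album'\G
```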
Before adding indexes:
*************************** 1. row ***************************
table_name: album
    engine: InnoDB
    format: Compact
table_rows: 539927
   avg_row: 65
  total_mb: 47.08
   data_mb: 33.56
  index_mb: 13.52
After adding indexes:
*************************** 1. row ***************************
table_name: album
    engine: InnoDB
    format: Compact
table_rows: 548882
   avg_row: 64
  total_mb: 129.27
   data_mb: 33.56
  index_mb: 95.70
You can see a sevenfold increase in the amount of index space used for this table. Depending on your backup and recovery procedures, this directly increases the time for both processes. Adding indexes has a direct impact in other areas as well. What is important is that you understand and consider the impact before adding indexes.
The use of InnoDB can also have a direct effect on disk space through the choice of primary key and how that primary key is used. Secondary indexes always have the primary key value appended to each secondary index record. It is therefore important to use as small a primary key data type as possible for InnoDB tables.
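A sketch of the effect with two hypothetical tables: each record in the code index of t_small carries a 4-byte primary key value, while each record in the created index of t_large carries the full 36-byte key:

```sql
-- 4-byte surrogate key: each secondary index record
-- stores 4 bytes of primary key
CREATE TABLE t_small (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
  code CHAR(36) NOT NULL,
  PRIMARY KEY (id),
  KEY (code)
) ENGINE=InnoDB;

-- 36-byte natural key: each secondary index record
-- stores all 36 bytes of the primary key
CREATE TABLE t_large (
  code    CHAR(36) NOT NULL,
  created DATE NOT NULL,
  PRIMARY KEY (code),
  KEY (created)
) ENGINE=InnoDB;
```

The difference compounds with every secondary index added to the table.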
There is an exception, where a greater disk footprint can be a longer term performance benefit. For extreme table sizes (such as hundreds of gigabytes), an ordered primary key that is not a sequential key might produce more sequential disk activity if all queries use the primary key order. Although the fill factor causes a greater data size, in a highly concurrent system that retrieves large numbers of rows in primary key order, the result can be more even disk performance and better overall query performance. This is a very rare case, and it highlights that detailed monitoring and suitable production-volume testing are necessary to evaluate long term benefits in overall performance.
Page Fill Factor
Your choice of a natural primary key over a surrogate primary key has a direct influence on your default page fill factor. For a surrogate primary key, because the key order is naturally increasing, InnoDB fills data pages to 15/16 of their volume when inserting new data. When the primary key is a natural key, InnoDB tries to minimize the reorganization of data caused by page splitting when inserting new data, and will generally fill data pages only to 50 percent initially. This results in a naturally larger disk footprint, and when the data volume exceeds the memory allocated to the InnoDB buffer pool, packing more data into the 16K data pages might provide performance improvements. Chapter 3 provides an example of the disk size difference that occurs due to the sequential and natural fill factors.
The internal implementation of the B-tree secondary index in InnoDB differs significantly from a MyISAM B-tree secondary index. InnoDB stores the primary key value within each secondary index record, not a pointer to the row: a copy of the applicable primary key is appended to each index record. When your table has a primary key length of 40 bytes and you have 15 secondary indexes, each row contributes 600 bytes (40 bytes x 15) of primary key copies to the indexes; a 4-byte integer key would reduce that to 60 bytes per row. This primary key value implementation also has performance benefits when combined with the use of the internal InnoDB primary key hash.