Optimizing MySQL Indexes

By Ronald Bradford on September 28, 2015

The management of indexes—how they are created and maintained—can impact the performance of SQL statements.

Combining Your DDL

An important management requirement when adding indexes to MySQL is the blocking nature of a DDL statement. Historically, the impact of an ALTER statement required that a new copy of the table be created. This could be a significant operation for time and disk volume when altering large tables. With the InnoDB plugin, first available in MySQL 5.1, and with other third party storage engines, various ALTER statements are now very fast, as they do not perform a full table copy. You should refer to the system documentation for the specific storage engine and MySQL version to confirm the full impact of your ALTER statement.

Combining multiple ALTER statements into one SQL statement is an easy optimization improvement. For example, if you needed to add a new index, modify an index, and add a new column, you could perform the following individual SQL commands:

ALTER TABLE test ADD INDEX (username);
ALTER TABLE test DROP INDEX name, ADD INDEX name (last_name, first_name);
ALTER TABLE test ADD COLUMN last_visit DATE NULL;

You can optimize this SQL by combining all statements for a single table into one SQL statement:

ALTER TABLE test
ADD INDEX (username),
DROP INDEX name,
ADD INDEX name (last_name, first_name),
ADD COLUMN last_visit DATE NULL;

This optimization can result in significant performance improvement of administration tasks.

Removing Duplicate Indexes

A duplicate index has two significant impacts: All DML statements will be slower as additional work is performed to maintain the data and index consistency. Additionally, the disk footprint of the database is now larger and can lead to increased backup and recovery time.

Several simple conditions can cause duplicate indexes. MySQL does not require a primary key column also to be indexed. Here is an example:

CREATE TABLE test(
  id INT UNSIGNED NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  last_name  VARCHAR(30) NOT NULL,
  joined     DATE NOT NULL,
  PRIMARY KEY(id),
  INDEX (id)
);

In this DDL the defined index on id is a duplicate and should be removed.

A duplicate index also exists when the leftmost portion of a given index is contained within another index. Here is an example:

CREATE TABLE test(
  id INT UNSIGNED NOT NULL,
  first_name VARCHAR(30) NOT NULL,
  last_name  VARCHAR(30) NOT NULL,
  joined     DATE NOT NULL,
  PRIMARY KEY(id),
  INDEX name1 (last_name),
  INDEX name2 (last_name, first_name)
);

The name1 index is redundant as the index columns are contained within the leftmost portion of the name2 index.

The Maatkit mk-duplicate-index-checker command is one open source tool that can quickly review your database schema for duplicate indexes. Refer to http://maatkit.org/mk-duplicate-key-checker for more information.

Removing Unused Indexes

In addition to duplicate indexes that are unused, other defined indexes might be unused. These indexes have the same performance impact as duplicate indexes. The official MySQL product provides no means to identify what indexes are unused; however, several MySQL variants do provide this feature.

The Google MySQL patch (http://code.google.com/p/google-mysql-tools/wiki/Mysql5Patches) first introduced the SHOW INDEX_STATISTICS functionality. This feature is part of a number of new commands that measure per user monitoring.

For the official MySQL product, to determine unused indexes, you first need to collect all SQL statements executed. By using these SQL statements to capture and aggregate the Query Execution Plan (QEP) for all SQL statements, per table analysis will provide information about unused indexes. The process of SQL capture and QEP generation is a good practice for all applications.

Monitoring Ineffective Indexes

When defining multi column indexes, it is important that you determine the true effectiveness of all columns specified. This instrumentation is also not part of the official MySQL product.

Analysis of the key_len column for all SQL statements on a given table can identify any indexes that might contain unused columns.