SQL Server Query Optimization–Understanding Statistics

By: Christian Bolton, Justin Langford, Glenn Berry, Gavin Payne, Rob Farley


Statistics provide critical information needed by SQL Server when performing query optimization. SQL Server statistics contain details about the data, and what the data looks like in each table within the database.
The query optimization process uses statistics to determine how many rows a query might need to access for a given query plan. It uses this information to develop its cost estimate for each step in the plan. If statistics are missing or invalid, the Query Optimizer can arrive at an incorrect cost for a step, and thus choose what ends up being a bad plan.
You can examine the statistics for any table in the database by using SQL Server Management Studio, expanding the Object Explorer to show the table you are interested in. For example, Figure 1 below shows the person.Address table in the AdventureWorks2012 database. Expand the table node, under which you will see a Statistics node. Expand this, and you will see a statistic listed for each index that has been created, and in many cases you will see additional statistics listed, often with cryptic names starting with _WA. These are statistics that SQL Server has created automatically for you, based upon queries that have been run against the database. SQL Server creates these statistics when the AUTO_CREATE_STATISTICS option is set to ON.
C 05f 002
Figure 1.
To see the actual statistic values, you can select an individual statistic, right-click it, and select the Properties option from the menu options. This will show you the Properties dialog for the statistic you selected. The first page, General, displays the columns in the statistic and when it was last updated. The Details page contains the real guts of the statistic, and shows the data distribution. For the PK_AddressAddressID statistic on the person.Address table in AdventureWorks2012, you should see something similar to Figure 2.
C 05f 003
Figure 2.
This figure shows just part of the multi-column output, which is the same output that you get when running the following DBCC command:
DBCC SHOW_STATISTICS (”Person.Address”, PK_Address_AddressID)
The following SQL Server configuration options control how statistics are created.

Auto_create_statistics

When this is on (default), SQL Server automatically creates statistics when it thinks they would result in a better plan. That usually means when it is optimizing a query that references a column without statistics.

Auto_update_statistics

When this is on (default), SQL Server automatically updates statistics when a sufficient amount of the data in the relevant columns has changed. By default, this is done synchronously, which means that a query has to wait for the statistics to be updated before the optimization process can be completed.

Auto_update_statistics_asynchronously

When this option is on, SQL Server updates statistics asynchronously. This means that when it’s trying to optimize a query and the statistics are outdated, it will continue optimizing the current query using the old stats, and queue the stats to be updated asynchronously. As a result, the current query doesn’t benefit from the new stats, but it does not have to wait while stats are being updated before getting a plan and running. Any future queries can then benefit from the new stats.

Leave a Reply