About SQL Server 2014’s New Cardinality Estimator

By: SolarWinds

SQL Server 2014 includes a new cardinality estimator, and in this version, the old cardinality estimator is still available. This section explains what a cardinality estimator is, why a new cardinality estimator was built, and how to enable the new and the old cardinality estimators.

The cardinality estimator is the component of the query processor whose job it is to estimate the number of rows returned by relational operations in a query. This information, along with some other data, is used by the query optimizer to select an efficient execution plan. Cardinality estimation is inherently inexact because it is a mathematical model that relies on statistical information. It is also based on several assumptions that, although not documented, have been known over the years—some of them include the uniformity, independence, containment, and inclusion assumptions. A brief description of these assumptions follows:

  • Uniformity Used when the distribution for an attribute is unknown—for example, inside of range rows in a histogram step or when a histogram is not available.
  • Independence Used when the attributes in a relation are independent, unless a correlation between them is known.
  • Containment Used when two attributes might be the same; in this case, they are assumed to be the same.
  • Inclusion Used when comparing an attribute with a constant; it is assumed there is always a match.

The current cardinality estimator was written along with the entire query processor for SQL Server 7.0, which was released back in December 1998. Obviously this component has faced multiple changes during several years and multiple releases of SQL Server, including fixes, adjustments, and extensions to accommodate cardinality estimation for new T-SQL features. So you may be thinking, why replace a component that has been successfully used for about the last 15 years?

In the paper “Testing Cardinality Estimation Models in SQL Server” by Campbell Fraser et al., the authors explain some of the reasons for the redesign of the cardinality estimator, including the following:

  • To accommodate the cardinality estimator to new workload patterns.
  • Changes made to the cardinality estimator over the years made the component difficult to “debug, predict, and understand.”
  • Trying to improve on the current model was difficult using the current architecture, so a new design was created, focused on the separation of tasks of (a) deciding how to compute a particular estimate, and (b) actually performing the computation.

I was also surprised to read in the paper that the authors admit that, according to their experience in practice, the previously listed assumptions are “frequently incorrect.”

A major concern that comes to mind with such a huge change inside the query optimizer is plan regressions. The fear of plan regressions has been considered the biggest obstacle to query optimizer improvements. Regressions are problems introduced after a fix has been applied to the query optimizer and are sometimes referred to as the classic “two wrongs make a right.” This can happen when two bad estimations—for example, one overestimating a value and the second one underestimating it—cancel each other out, luckily giving a good estimate. Correcting only one of these values may now lead to a bad estimation, which may negatively impact the choice of plan selection, thus causing a regression.

To help avoid regressions related to the new cardinality estimator, SQL Server provides a way to enable or disable it, depending on the database compatibility level. This can be changed using the ALTER DATABASE statement, as indicated earlier. Setting a database to the compatibility level 120 will use the new cardinality estimator, whereas a compatibility level less than 120 will use the old cardinality estimator. In addition, once you are using a specific cardinality estimator, there are two trace flags you can use to change to the other. Trace flag 2312 can be used to enable the new cardinality estimator, whereas trace flag 9481 can be used to disable it. You can even use the trace flags for a specific query using the QUERYTRACEON hint. Both trace flags and their use with the QUERYTRACEON hint are documented and supported.

Finally, SQL Server includes several new extended events we can use to troubleshoot problems with cardinality estimation, or just to explore how it works. These events include query_optimizer_estimate_cardinality, inaccurate_cardinality_estimate, query_optimizer_force_both_cardinality_estimation_behaviors, and query_rpc_set_cardinality.

Leave a Reply