The SQL Server query optimizer may not be able to produce a good plan for a query with a large number of joins. The same is true for complex queries with both joins and aggregations. However, because it is rarely necessary to request all the data in a single query, a good solution for those cases could be to just break down a large and complex query into two or more simpler queries while storing the intermediate results in temporary tables. Breaking down complex queries this way offers several advantages:
- Better plans – Query performance is improved because the query optimizer is able to create efficient plans for simpler queries.
- Better statistics – Because one of the problems of some complex plans is the degradation of intermediate statistics, breaking down these queries and storing the aggregated or intermediate results in temporary tables allows SQL Server to create new statistics, greatly improving the cardinality estimation of the remaining queries. It is worth noticing that temporary tables should be used and not table variables, as the latter does not have statistics support.
It should be noted that a new trace flag has been just introduced with SQL Server 2012 Service Pack 2 which provides better cardinality estimation while using table variables. This trace flag is also expected to be implemented in SQL Server 2014. For more details, see http://support.microsoft.com/kb/2952444.
- No hints required – Because using hints is a common practice to fix problems with complex plans, breaking down the query allows the query optimizer to create an efficient plan without requiring hints. This has the additional benefit that the query optimizer can automatically react to future data or schema changes. On the other hand, a query using hints would require future maintenance because the hint used may no longer be helpful, or may even impact its performance in a negative way after such changes.
Now let’s look at the following query patterns:
- OR logic in the WHERE clause
- Joins on aggregated data sets
- Queries with a large number of very complex joins
OR Logic in the WHERE Clause
The SQL Server query optimizer is able to identify and create efficient plans when using OR logic in the WHERE clause for the following cases:
- WHERE a.col1 = @val1 OR a.col1 = @val2, which is the same as WHERE a.col1 IN (@val1, @val2)
- WHERE a.col1 = @val1 OR a.col2 = @val2
- WHERE a.col1 = @val1 OR a.col2 IN (SELECT col2 FROM tab2)
As you’ll notice, all the listed cases use the same or different columns, but in the same table. Here is an example of a query showing the first case:
In this query, SQL Server is able to use index seek operations on two indexes, IX_SalesOrderHeader_CustomerID and IX_SalesOrderHeader_SalesPersonID, and use an index union to solve both predicates, showing the efficient plan in Figure 1.
Figure 1 Plan using WHERE a.col1 = @val1 OR a.col2 = @val2
However, poor plans may be created when filters on the OR operator evaluate different tables, basically following the pattern WHERE a.col1 = @val1 OR b.col2 = @val2. Running the following two queries using a very selective predicate will create two plans using efficient Index Seek operators on the IX_SalesOrderDetail_ProductID and IX_SalesOrderHeader_CustomerID indexes, returning two and three records, respectively:
However, if we join both tables using the same selective predicates (and just for demonstration purposes), SQL Server will now return a very expensive plan scanning the mentioned indexes, as shown in Figure 2, instead of using the more efficient seek operations shown previously:
Figure 2 Plan using WHERE a.col1 = @val1 OR b.col2 = @val2
This kind of query pattern can be fixed using the UNION clause instead of an OR condition, as shown next, which produces the same results and now allows for performing seeks on all indexes, resulting in a more efficient plan:
Although the query looks more complicated and redundant, it produces the very effective plan shown in Figure 3.
Figure 3 Plan using UNION
Joins on Aggregated Data Sets
The second query pattern happens when the results of aggregations are joined in large and complex queries. Statistics can provide a good cardinality estimation for operations performed on the table owning the statistics (for example, estimating the number of records returned by a filter operation). However, the query optimizer has to use this estimate in an operation earlier in the plan, apply it to the next operation to get a new estimation, and so on. In some complex plans, these estimates can quickly degrade inaccuracy.
By looking at any such complex query plan, you might notice that near the beginning of the data flow, the estimated and actual number of rows are very similar, but after the aggregated intermediate result sets are joined to other aggregated intermediate result sets, the quality of the cardinality estimation may have been degraded. Obviously, when the query optimizer is not able to get a good estimate of the size of the data set, it could make suboptimal decisions for joins, join orders, or other operations in the plan.
Solutions to these complex queries involve breaking down the queries as mentioned at the beginning of this section. You can partition your queries by looking at the plan to find places where there are large differences in the number of estimated versus the actual number of rows. In addition, each aggregation in the complex query could be saved into a temporary table, which again would create a better quality of statistics that can be used for the remaining parts of the query. Finally, a query with a large number of very complex joins is another query pattern that can benefit from being broken down into two or more simpler queries, with the intermediate results being stored in temporary tables.