SQL Server Merge Join

By: Steven Wort, Ross LoForte, Brian Knight on January 24, 2013

The merge join, exemplified in the following code, relies on sorted input and is an efficient algorithm if both inputs are available sorted.

With a one-to-many merge join, a merge join operator scans each input only once, which is why it is superior to other operators if the predicate is not selective. For sorted input, the optimizer can use a clustered index. If a non-clustered index covers the join and select columns, the optimizer can choose that option because it has fewer pages to fetch.

A many-to-many merge join is a little more complicated. A many-to-many merge join uses a temporary table to store rows. If duplicate values exist from each input, one of the inputs must rewind to the start of the duplicates as each duplicate from the other input is processed.

In this query, both tables have a clustered index on the SalesOrderID column, so the optimizer chooses a merge join. Sometimes the optimizer chooses the merge join, even if one of the inputs is not presorted by an index, by adding a sort to the plan. The optimizer would do that if the input were small. If the optimizer chooses to sort before the merge, check whether the input has many rows and is not presorted by an index. To prevent the sort, you must add the required indexes to avoid a costly operation.

SELECT oh.SalesOrderID, oh.OrderDate,od.ProductID 
FROM Sales.SalesOrderDetail od 
JOIN Sales.SalesOrderHeader oh 
ON oh.SalesOrderID = od.SalesOrderID

 Wiley Admin 13_28

Figure 1. Sample Execution Plan for Merge Join

Leave a Reply