The forgotten MERGE statement: is it too underutilized?

By Mike Byrd on December 5, 2012


With the release of SQL Server 2008, we finally now have a single statement that allows an INSERT, UPDATE, and even DELETE within one TSQL statement. However, it seems to be rarely used. Let’s look at it once again and see if there are any cons to using it.

MERGE Syntax

The MERGE syntax is discussed in BOL and many other articles, but let’s look at how it was used in this test scenario. See Script 3 at the bottom of this article. Note that I used source and target as the aliases for the query. This helps me keep track of the structure of the query.

The source can be a table or a query (in this case a query). The ON clause is what “joins” the source and target and is actually the definition of the MATCH condition. When “Matched” and other conditions may apply (CUD in this specific instance) you normally perform an update. When “NOT MATCHED” (By TARGET is default), you normally would specify an INSERT statement as shown. The “NOT MATCHED BY SOURCE” is the normal condition when you want to do a hard delete on the SOURCE table.

One of the nice attributes of the MERGE statement is the $action parameter when combined with the inserted and deleted actions allows easy tracking of what rows were actually DELETED, INSERTED, or UPDATED.

Test Setup

The Sales.SalesOrderDetail table in the AdventureWorks2012 OLAP database was selected for its universal availability and for its size. To reduce the complexity of the test, all Foreign Keys, non-clustered indexes and the applicable trigger were removed to provide a relatively simple query plan.

Script 1 at the bottom of this paper shows the details for the setup. Since the table has roughly 120,000 rows I decided to look at the middle 40,000 rows dividing out 16,000 for update, 16,000 for insert and 8,000 for the delete. Script 2 shows the applicable update, insert, and delete statements used to modify the table. Script 3 shows the same logic only using the MERGE statement to achieve the same results.

Test Conduct

Before each test run, the database was restored back to original condition and Script 1 run to initialize the data needed for the test. Profiler was used to capture Reads, Writes, CPU, RowCounts, and Duration for each test. Scripts 2 and 3 were run (after DB restore and Script 1 execution) to gain comparison data. Tests were repeated several times and the values were consistent each time.

Test Results

The test results were not as expected. Many earlier articles on the MERGE statement have touted its reduced IO performance, but this was not the result as shown below:

Reads Writes CPU Row Count Duration
UPDATE 51915 24 109 16175 118
INSERT 55798 299 109 16176 263
DELETE 27995 80 62 8089 102
135708 403 280 40440 483
MERGE 181367 442 1484 40440 2471

 

For all four performance measures, the combined individual UPDATE, INSERT, and DELETE statements outperformed the MERGE statement. I studied the individual query plans for each test case and saw no surprises – in fact, the query plans for the individual statements called for suggested indexes for all three statements while the MERGE query plan didn’t specify any suggested indexes.

I’m not sure what to make of these results. While logically it would seem that the MERGE statement would have less IO; for this specific instance (I think this test case is a common scenario), the MERGE statement performed worse than the 3 individual statements.

I commented out the OUTPUT part of the MERGE statement and that resulted in minimal change in the performance data. I think the only logical conclusion is that the query optimizer is more refined for the three individual operations than for the more recent MERGE operation.

While I like the concept of the MERGE statement I have to conclude to be very careful in its application. Every scenario is different. As always, test, test, and test again!

Script 1

/*
	Script to setup tables before running merge vs update/insert/delete comparison
	Restore AdventureWorks2012 back to original state before running this script
*/
USE AdventureWorks2012
GO
IF OBJECT_ID (N'tempdb..#Temp') IS NOT NULL DROP TABLE #Temp
GO

--get data for comparison (middle of adventureworks, 40,000+ rows; 16175 to update, 16176 to insert, 8089 to delete
select *, '' CUD, Row_Number() OVER(ORDER BY SalesOrderDetailID) RowNumber
	INTO #Temp
  FROM Sales.SalesOrderDetail
  WHERE SalesOrderDetailID BETWEEN 40439 AND 80878

--delete last 20% rows in temp to force delete in Merge
DELETE #Temp
	WHERE SalesOrderDetailID BETWEEN 72790 AND 80878

--alternate rows for insert/update
UPDATE #Temp
	SET CUD = CASE WHEN RowNumber % 2 = 1 THEN 'I' ELSE 'U' END
 GO
--Drop existing nonclustered indexes, Foreign Keys, and trigger to simplify example
DROP INDEX [AK_SalesOrderDetail_rowguid] ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )
DROP INDEX [IX_SalesOrderDetail_ProductID] ON [Sales].[SalesOrderDetail] WITH ( ONLINE = OFF )
ALTER TABLE [Sales].[SalesOrderDetail] DROP CONSTRAINT [FK_SalesOrderDetail_SalesOrderHeader_SalesOrderID]
ALTER TABLE [Sales].[SalesOrderDetail] DROP CONSTRAINT [FK_SalesOrderDetail_SpecialOfferProduct_SpecialOfferIDProductID]
DROP TRIGGER [Sales].[iduSalesOrderDetail]
GO

--DELETE rows out of Sales.SalesOrderDetail needed for insert
DELETE sod
	FROM Sales.SalesOrderDetail sod
	JOIN #Temp t
	  ON t.SalesOrderDetailID = sod.SalesOrderDetailID
	WHERE t.CUD = 'I'
GO


--modify data for update/insert
UPDATE #Temp
	SET UnitPrice = UnitPrice * 1.05,
		ModifiedDate = GETDATE()
GO

Script 2

/*
	Script to test Update/Insert/Delete TSQL statements for comparison against Merge statement
*/
SET STATISTICS IO ON
GO

UPDATE sod
	SET UnitPrice = t.UnitPrice,
		ModifiedDate = GETDATE()
	FROM Sales.SalesOrderDetail sod
	JOIN #Temp t
	  on t.SalesOrderDetailID = sod.SalesOrderDetailID
	WHERE t.CUD = 'U'

SET IDENTITY_INSERT Sales.SalesOrderDetail ON
GO
INSERT Sales.SalesOrderDetail
	(SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,rowguid,ModifiedDate)
	SELECT SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,rowguid,ModifiedDate
		FROM #Temp t
		WHERE t.CUD = 'I'
GO
SET IDENTITY_INSERT Sales.SalesOrderDetail OFF
GO

DELETE sod
	FROM Sales.SalesOrderDetail sod
	WHERE NOT EXISTS (SELECT 1 FROM #Temp t WHERE t.SalesOrderDetailID = sod.SalesOrderDetailID)
	  AND sod.SalesOrderDetailID BETWEEN 40439 AND 80878
GO
SET STATISTICS IO OFF
GO

Script 3

/*
	Script to test Update/Insert/Delete TSQL statements for comparison against Merge statement
*/

SET STATISTICS IO ON;
SET IDENTITY_INSERT Sales.SalesOrderDetail ON;
MERGE Sales.SalesOrderDetail as target
	USING (SELECT SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,
					OrderQty,ProductID,SpecialOfferID,UnitPrice,
					UnitPriceDiscount,rowguid,ModifiedDate,CUD
				FROM #Temp) as source
	ON source.SalesOrderDetailID = target.SalesOrderDetailID
	WHEN MATCHED AND source.CUD = 'U' THEN
		UPDATE
			SET target.UnitPrice = source.UnitPrice,
				target.ModifiedDate = GETDATE()
	WHEN NOT MATCHED THEN
		INSERT 
			(SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,rowguid,ModifiedDate)
		VALUES (SalesOrderID,SalesOrderDetailID,CarrierTrackingNumber,OrderQty,ProductID,SpecialOfferID,UnitPrice,UnitPriceDiscount,rowguid,ModifiedDate)
	WHEN NOT MATCHED BY SOURCE AND target.SalesOrderDetailID BETWEEN 40439 AND 80878 THEN
		DELETE
	OUTPUT $action, 
	DELETED.SalesOrderDetailID AS TargetSalesOrderDetailID, 
	INSERTED.SalesOrderDetailID AS SourceSalesOrderDetailID
	INTO #Output; 
	SELECT @@ROWCOUNT;
	SELECT [Action], Count(*)
		FROM #Output
		GROUP By [Action]

GO
SET IDENTITY_INSERT Sales.SalesOrderDetail OFF
GO
SET STATISTICS IO OFF
GO

Related Posts

Leave a Reply