About the PIVOT Operator
The release of SQL Server 2005 gave us the PIVOT operator. I had always thought of this as a data presentation operator, but recently ran into an instance where it was necessary to analyze a stream of data and select various attributes of the data. The PIVOT operator made the code much easier to implement and understand. While I cannot show this specific code for that operation (company Confidential), let’s review the PIVOT operator syntax and do some benchmarking relative to the “old” way of pivoting prior to SQL Server 2005.
PIVOT Syntax
The syntax from BOL is:
SELECT <non-pivoted column>, [first pivoted column] AS <column name>, [second pivoted column] AS <column name>, ... [last pivoted column] AS <column name> FROM (<SELECT query that produces the data>) AS <alias for the source query> PIVOT ( <aggregation function>(<column being aggregated>) FOR [<column that contains the values that will become column headers>] IN ( [first pivoted column], [second pivoted column], ... [last pivoted column]) ) AS <alias for the pivot table> <optional ORDER BY clause>;
Normally I would not repeat the syntax from BOL, but I have found some “features” that are not necessarily spelled out in this BOL definition.
The first column above is labeled the <non-pivoted column>; however, I have found this not absolutely necessary (see benchmark example below (CASE 2)). But, it does make the query easier to read if the first column is the non-pivoted column.
Secondly, many examples (both BOL and other articles) show a direct SELECT statement being pivoted. I’ve found that in most cases, a sub-select is necessary (CASE 2 below) to organize the data in a form readily available for the PIVOT operator (example of this in benchmark below also) — especially if there is a WHERE clause in the query to produce the data.
So, with these caveats, let’s look at an example to illustrate these points as well as benchmark the results.
Benchmark Example
Consider the Adventeworks2012 database, and you have been given the problem of presenting the count of Products sold for the week of 7/22/2007 through 7/28/2007 (Sunday through Saturday) by Territory.
The query to generate this data is
selectst.NameTerritory,p.nameProduct, sod.SalesOrderDetailID FROMSales.SalesOrderHeadersoh JOINSales.SalesOrderDetailsod ONsod.SalesOrderID=soh.SalesOrderID JOINSales.SalesTerritoryst onst.TerritoryID=soh.TerritoryID JOINProduction.Productp onp.ProductID=sod.ProductID WHEREOrderDateBETWEEN'7/22/2007'and'7/28/2007' This produces a result set of the form Territory Product SalesOrderDetailID Australia Road-250 Black, 44 38686 Australia Road Tire Tube 38687 Australia HL Road Tire 38688 France Mountain-200 Silver, 38 38689 France Mountain Bottle Cage 38690 France Water Bottle - 30 oz. 38691 …
There are 10 different Territories, but we only want to present the territories that had sales that particular week. The old way (prior to SS 2005) to pivot the data was to get the applicable Territory names first and use CASE statements to generate the count columns for each Territory. The PIVOT operator also requires the specification of the column names and will be shown below both manually and using dynamic SQL.
The following query was used to get the applicable territory names:
SELECTDISTINCTst.NameTerritory FROM[Sales].[SalesOrderHeader]soh JOIN[Sales].[SalesTerritory]st onst.TerritoryID=soh.TerritoryID WHERE[OrderDate]BETWEEN '7/22/2007' and '7/28/2007' Yielding: Northwest Southwest Canada France Germany Australia United Kingdom So, before SS 2005, we used the CASE operator to generate the pivot as shown: SELECTp.NameProduct, SUM(CASEWHENst.Name='Northwest'THEN 1 ELSE 0 END)[Northwest], SUM(CASEWHENst.Name='Southwest'THEN 1 ELSE 0 END)Southwest, SUM(CASEWHENst.Name='Canada'THEN 1 ELSE 0 END)[Canada], SUM(CASEWHENst.Name='France'THEN 1 ELSE 0 END)[France], SUM(CASEWHENst.Name='Germany'THEN 1 ELSE 0 END)[Germany], SUM(CASEWHENst.Name='Australia'THEN 1 ELSE 0 END)[Australia], SUM(CASEWHENst.Name='United Kingdom'THEN 1 ELSE 0 END)[United Kingdom] FROM[Sales].[SalesOrderHeader]soh JOIN[Sales].[SalesOrderDetail]sod ONsod.SalesOrderID=soh.SalesOrderID JOIN[Sales].[SalesTerritory]st onst.TerritoryID=soh.TerritoryID JOIN[Production].[Product]p onp.ProductID=sod.ProductID WHERE[OrderDate]BETWEEN'7/22/2007'and'7/28/2007' GROUPBYp.Name
The COUNT operator could have been used also, but the THEN parameter would have had to been sod.SalesOrderDetailID and the ELSE parameter NULL (both give same result). For purposes of labeling in the benchmark discussion below this will be CASE 1.
Now applying the PIVOT operator (CASE 2) to the same problem we get:
SELECT*FROM ( SELECT st.NameTerritory,p.nameProduct,SalesOrderDetailID FROM[Sales].[SalesOrderHeader]soh JOIN[Sales].[SalesOrderDetail]sod ONsod.SalesOrderID=soh.SalesOrderID JOIN[Sales].[SalesTerritory]st ONst.TerritoryID=soh.TerritoryID JOIN[Production].[Product]p ONp.ProductID=sod.ProductID WHERE[OrderDate]BETWEEN'7/22/2007'AND'7/28/2007')t PIVOT (Count(SalesOrderDetailID)FORTerritoryIN ([Northwest],[Southwest],[Canada], [France],[Germany],[Australia],[United Kingdom]))ASProductCount
And this query yields identical results to CASE 1.
Note that in both CASE 1 and Case 2 we had to first identify the columns for the PIVOT. If we want to dynamically identify the applicable columns we can do so as shown below(Case 3):
DECLARE@Territory VARCHAR(2000)=''; DECLARE@SQLNVARCHAR(4000); SELECT@Territory=@Territory+COALESCE('['+Territory+'],','') FROM (SELECTDISTINCTst.NameTerritory FROM[Sales].[SalesOrderHeader]soh JOIN[Sales].[SalesTerritory]st ONst.TerritoryID=soh.TerritoryID WHERE[OrderDate]BETWEEN'7/22/2007'and'7/28/2007')t SET@Territory=SUBSTRING(@Territory,1,LEN(@Territory)-1) SET@SQL=N'select * FROM ( select p.name Product,st.Name Territory,SalesOrderDetailID FROM [Sales].[SalesOrderHeader] soh JOIN [Sales].[SalesOrderDetail] sod ON sod.SalesOrderID = soh.SalesOrderID JOIN [Sales].[SalesTerritory] st ON st.TerritoryID = soh.TerritoryID JOIN [Production].[Product] p ON p.ProductID = sod.ProductID WHERE [OrderDate] BETWEEN ''7/22/2007'' AND ''7/28/2007'') t PIVOT (Count(SalesOrderDetailID) FOR Territory IN ('+@Territory+N')) AS ProductCount'; EXECsp_executeSQL @SQL;
The only difference between CASE 2 and CASE 3 queries is that the PIVOT columns are derived “on the fly”. In any scenario the pivot columns would have to be identified with a similar query as shown in CASE 3. There are many ways to comma delimit a string, and the COALESCE function above is one method. Another method is to use XML schema, but that is a topic for another article.
The query cost was identical for all three cases (excluding the column string derivation for Case 3). Of course there was a cost for the PIVOT column derivation in CASEs 1 and 2, but that has to be derived one way or another in all 3 instances. The profiler traces are interesting; the numbers below were fairly consistent for several trials.
READS | DURATION | |
CASE 1 | 1398 | 64 |
CASE 2 | 1398 | 66 |
CASE 3 | 1398 | 38 |
It is interesting that CASE 3 is almost 50% less in duration (without the query getting the column rows). If the query to get the column rows is added back into the duration, the duration is roughly same as CASE 1 and CASE 2. In all cases tested, CASE 1 was a couple microseconds less than CASE 2.
Conclusion
The PIVOT operator doesn’t seem to help performance, but does help code readability and most likely shortens code implementation time (less CASE statements to write up) and code maintenance. This is a nice operator to keep in your development bag of tricks.
Leave a Reply