The Pivot Operator Is More than Just a Report Option

on January 10, 2013


About the PIVOT Operator

The release of SQL Server 2005 gave us the PIVOT operator. I had always thought of this as a data presentation operator, but recently ran into an instance where it was necessary to analyze a stream of data and select various attributes of the data. The PIVOT operator made the code much easier to implement and understand. While I cannot show this specific code for that operation (company Confidential), let’s review the PIVOT operator syntax and do some benchmarking relative to the “old” way of pivoting prior to SQL Server 2005.

PIVOT Syntax

The syntax from BOL is:

SELECT <non-pivoted column>,

    [first pivoted column] AS <column name>,

    [second pivoted column] AS <column name>,

    ...

    [last pivoted column] AS <column name>

FROM

    (<SELECT query that produces the data>)

    AS <alias for the source query>

PIVOT

(

    <aggregation function>(<column being aggregated>)

FOR

[<column that contains the values that will become column headers>]

    IN ( [first pivoted column], [second pivoted column],

    ... [last pivoted column])

) AS <alias for the pivot table>

<optional ORDER BY clause>;

Normally I would not repeat the syntax from BOL, but I have found some “features” that are not necessarily spelled out in this BOL definition.

The first column above is labeled the <non-pivoted column>; however, I have found this not absolutely necessary (see benchmark example below (CASE 2)). But, it does make the query easier to read if the first column is the non-pivoted column.

Secondly, many examples (both BOL and other articles) show a direct SELECT statement being pivoted. I’ve found that in most cases, a sub-select is necessary (CASE 2 below) to organize the data in a form readily available for the PIVOT operator (example of this in benchmark below also) — especially if there is a WHERE clause in the query to produce the data.

So, with these caveats, let’s look at an example to illustrate these points as well as benchmark the results.

Benchmark Example

Consider the Adventeworks2012 database, and you have been given the problem of presenting the count of Products sold for the week of 7/22/2007 through 7/28/2007 (Sunday through Saturday) by Territory.

The query to generate this data is

selectst.NameTerritory,p.nameProduct, sod.SalesOrderDetailID

FROMSales.SalesOrderHeadersoh

JOINSales.SalesOrderDetailsod

ONsod.SalesOrderID=soh.SalesOrderID

JOINSales.SalesTerritoryst

onst.TerritoryID=soh.TerritoryID

JOINProduction.Productp

onp.ProductID=sod.ProductID

WHEREOrderDateBETWEEN'7/22/2007'and'7/28/2007'

This produces a result set of the form

Territory Product SalesOrderDetailID

Australia Road-250 Black, 44 38686

Australia Road Tire Tube 38687

Australia HL Road Tire 38688

France Mountain-200 Silver, 38 38689

France Mountain Bottle Cage 38690

France Water Bottle - 30 oz. 38691

…

There are 10 different Territories, but we only want to present the territories that had sales that particular week. The old way (prior to SS 2005) to pivot the data was to get the applicable Territory names first and use CASE statements to generate the count columns for each Territory. The PIVOT operator also requires the specification of the column names and will be shown below both manually and using dynamic SQL.

The following query was used to get the applicable territory names:

SELECTDISTINCTst.NameTerritory

FROM[Sales].[SalesOrderHeader]soh

JOIN[Sales].[SalesTerritory]st

onst.TerritoryID=soh.TerritoryID

WHERE[OrderDate]BETWEEN '7/22/2007' and '7/28/2007'

Yielding:

Northwest

Southwest

Canada

France

Germany

Australia

United Kingdom

So, before SS 2005, we used the CASE operator to generate the pivot as shown:

SELECTp.NameProduct,

SUM(CASEWHENst.Name='Northwest'THEN 1 ELSE 0 END)[Northwest],

SUM(CASEWHENst.Name='Southwest'THEN 1 ELSE 0 END)Southwest,

SUM(CASEWHENst.Name='Canada'THEN 1 ELSE 0 END)[Canada],

SUM(CASEWHENst.Name='France'THEN 1 ELSE 0 END)[France],

SUM(CASEWHENst.Name='Germany'THEN 1 ELSE 0 END)[Germany],

SUM(CASEWHENst.Name='Australia'THEN 1 ELSE 0 END)[Australia],

SUM(CASEWHENst.Name='United Kingdom'THEN 1 ELSE 0 END)[United Kingdom]

FROM[Sales].[SalesOrderHeader]soh

JOIN[Sales].[SalesOrderDetail]sod

ONsod.SalesOrderID=soh.SalesOrderID

JOIN[Sales].[SalesTerritory]st

onst.TerritoryID=soh.TerritoryID

JOIN[Production].[Product]p

onp.ProductID=sod.ProductID

WHERE[OrderDate]BETWEEN'7/22/2007'and'7/28/2007'

GROUPBYp.Name

The COUNT operator could have been used also, but the THEN parameter would have had to been sod.SalesOrderDetailID and the ELSE parameter NULL (both give same result). For purposes of labeling in the benchmark discussion below this will be CASE 1.

Now applying the PIVOT operator (CASE 2) to the same problem we get:

SELECT*FROM (

SELECT st.NameTerritory,p.nameProduct,SalesOrderDetailID

FROM[Sales].[SalesOrderHeader]soh

JOIN[Sales].[SalesOrderDetail]sod

ONsod.SalesOrderID=soh.SalesOrderID

JOIN[Sales].[SalesTerritory]st

ONst.TerritoryID=soh.TerritoryID

JOIN[Production].[Product]p

ONp.ProductID=sod.ProductID

WHERE[OrderDate]BETWEEN'7/22/2007'AND'7/28/2007')t

PIVOT (Count(SalesOrderDetailID)FORTerritoryIN ([Northwest],[Southwest],[Canada],

[France],[Germany],[Australia],[United Kingdom]))ASProductCount

And this query yields identical results to CASE 1.

Note that in both CASE 1 and Case 2 we had to first identify the columns for the PIVOT. If we want to dynamically identify the applicable columns we can do so as shown below(Case 3):

DECLARE@Territory VARCHAR(2000)='';

DECLARE@SQLNVARCHAR(4000);

SELECT@Territory=@Territory+COALESCE('['+Territory+'],','')

FROM (SELECTDISTINCTst.NameTerritory

FROM[Sales].[SalesOrderHeader]soh

JOIN[Sales].[SalesTerritory]st

ONst.TerritoryID=soh.TerritoryID

WHERE[OrderDate]BETWEEN'7/22/2007'and'7/28/2007')t

SET@Territory=SUBSTRING(@Territory,1,LEN(@Territory)-1)

SET@SQL=N'select * FROM (

select p.name Product,st.Name Territory,SalesOrderDetailID

FROM [Sales].[SalesOrderHeader] soh

JOIN [Sales].[SalesOrderDetail] sod

ON sod.SalesOrderID = soh.SalesOrderID

JOIN [Sales].[SalesTerritory] st

ON st.TerritoryID = soh.TerritoryID

JOIN [Production].[Product] p

ON p.ProductID = sod.ProductID

WHERE [OrderDate] BETWEEN ''7/22/2007'' AND ''7/28/2007'') t

PIVOT (Count(SalesOrderDetailID) FOR Territory IN ('+@Territory+N')) AS ProductCount';

EXECsp_executeSQL @SQL;

The only difference between CASE 2 and CASE 3 queries is that the PIVOT columns are derived “on the fly”. In any scenario the pivot columns would have to be identified with a similar query as shown in CASE 3. There are many ways to comma delimit a string, and the COALESCE function above is one method. Another method is to use XML schema, but that is a topic for another article.

The query cost was identical for all three cases (excluding the column string derivation for Case 3). Of course there was a cost for the PIVOT column derivation in CASEs 1 and 2, but that has to be derived one way or another in all 3 instances. The profiler traces are interesting; the numbers below were fairly consistent for several trials.

READS DURATION
CASE 1 1398 64
CASE 2 1398 66
CASE 3 1398 38

It is interesting that CASE 3 is almost 50% less in duration (without the query getting the column rows). If the query to get the column rows is added back into the duration, the duration is roughly same as CASE 1 and CASE 2. In all cases tested, CASE 1 was a couple microseconds less than CASE 2.

Conclusion

The PIVOT operator doesn’t seem to help performance, but does help code readability and most likely shortens code implementation time (less CASE statements to write up) and code maintenance. This is a nice operator to keep in your development bag of tricks.

Related Posts

Leave a Reply