Efficiently Calculate Running Sums in SQL Server 2012

on November 8, 2013


Generating a Running or Cumulative Sum Report, usually in a Year-To-Date (YTD), Quarterly, or Monthly format, is a fairly common requirement in most reporting systems or ad-hoc report requests. There are several ways of Calculating a Running Sum in SQL Server, depending on the version of SQL Server. In this article, we will explore the various options available in SQL Server 2012 to help us Calculate Running Sums, and evaluate the performance metrics of each.

While T-SQL is best at SET based operations, cursors are available in the language. I have come across cursor based solution for calculating running sums, as well as several versions of SET based solutions. SQL Server 2012 added several enhancements to windowing functions, including a more complete implementation of the OVER Clause, which helps us efficiently calculate Running Sums.

Code Listing 1: Create Schema objects and Data for Demonstration

CREATE TABLE demo_order
( cust_id BIGINT NOT NULL,
	order_nbr BIGINT NOT NULL,
	order_date	DATETIME NOT NULL,
	order_amount DECIMAL(18,4) NOT NULL) ;

ALTER TABLE demo_order ADD CONSTRAINT pk_demo_order PRIMARY KEY CLUSTERED (order_nbr) ;

INSERT INTO demo_order (cust_id,order_nbr,order_date,order_amount)
VALUES (21,1,'2013-11-01',22.2),
		(21,3,'2013-10-11',22.8),
		(21,7,'2013-09-01',23.2),
		(22,11,'2013-11-02',42.3),
		(22,6,'2013-08-01',252.2),
		(22,12,'2013-08-21',52.2),
		(31,15,'2013-06-01',62.3),
		(31,31,'2013-11-03',72.2),
		(31,61,'2013-08-01',21.2),
		(521,71,'2013-06-01',26.2),
		(521,77,'2013-10-18',29.2),
		(521,45,'2013-09-01',1.2) ;
GO

Our Goal is to generate a report that calculates the running sum of each Customer’s order amount, calculated and displayed in sequence of the Customer’s order date. The end result should look like the table below.

Table 1. The final result of Running Sum Calculation on Orders Data

Table 1 Running Totals

While Loop Based Row-By-Row Solution

The Procedural way of going about this task would be using a cursor or loop based (row-by-row) solution. This code will work in any version of SQL Server starting 2005 or later. I am using the ROW_NUMBER OVER function to generate sequence a number for every unique combination for customer and order IDs.

Code Listing 2: Cursor/Loop based solution to achieve the Running Sum Result listed in Table 1 

 /****************************************************************
			CURSOR/LOOP	(Row-by-row) Based Soution 
*****************************************************************/

BEGIN TRY DROP TABLE #tmp_demo_order END TRY BEGIN CATCH END CATCH ; 
-- Create a temporary table
CREATE TABLE #tmp_demo_order
(	cust_id BIGINT NOT NULL,
    order_nbr BIGINT NOT NULL,
    order_date   DATETIME NOT NULL,
    order_amount DECIMAL(18,4) NOT NULL,
    runtot DECIMAL(18,4) NULL,
	CHRONO_ORDER INT NULL)
-- generate row id 
INSERT INTO #tmp_demo_order (cust_id,order_nbr,order_date,order_amount,runtot,CHRONO_ORDER)
SELECT cust_id,order_nbr,order_date,order_amount,NULL,ROW_NUMBER()OVER(PARTITION BY cust_id ORDER BY order_date) FROM dbo.demo_order ;

--    LOOP THROUGH TO GET A RUNNING SUM 
        DECLARE
                @LOOP_INT INT =1,
                @LOOP_STOP INT = (SELECT MAX(CHRONO_ORDER) FROM #tmp_demo_order) 
 
        WHILE @LOOP_INT <= @LOOP_STOP
        BEGIN
        
        UPDATE GS
        SET   GS.runtot = Q.runtot
			FROM #tmp_demo_order AS GS  
            INNER JOIN 
			  (     SELECT 
						  G.cust_id,
						  SUM(G.order_amount )AS runtot
					FROM #tmp_demo_order G
				WHERE G.CHRONO_ORDER <= @LOOP_INT
				GROUP BY G.cust_id
			 )      Q
					ON Q.cust_id = GS.cust_id
				  AND GS.CHRONO_ORDER = @LOOP_INT
                  
            SELECT
            @LOOP_INT = (@LOOP_INT + 1)
            
            END

SELECT cust_id, order_nbr, order_date, order_amount, runtot
FROM  #tmp_demo_order ORDER BY cust_id, order_date ;

BEGIN TRY DROP TABLE #tmp_demo_order END TRY BEGIN CATCH END CATCH

GO

We achieved the desired Result set in about 45 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 1 lists a part of the set of Execution plan for one of the loop runs. Note that the Actual Number of rows processed by the “Table Scan” Operator is 8. Buy summing up the Actual row count for each Table Scan operator in the entire execution plan, we arrive at the number 100.

Fig 1 Exec Plan

?Figure 1. Execution Plan for one loop run, showing properties of Table Scan operator

There are two popular set-based solutions for Calculating Running Sum in SQL Server 2005 and higher editions. I am using CTE and ROW_NUMBER OVER function to generate a sequence number for every unique combination for customer and order IDs, in all of my Set Based Solutions.

?

Set-based Inner Join Solution

The first solution involves simply Inner Joining the table to itself on customer id and comparing the sequence numbers we generated on the fly.

Code Listing 3. Set-based solution using Inner Join to achieve the Running Sum Result listed in Table 1

/*******************************************************
		SET BASED SOLUTION USING INNER JOIN	 
*******************************************************/
;
WITH CTE 
AS
(SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order )
SELECT a.cust_id, a.order_nbr, a.order_date,a.order_amount, SUM(b.order_amount) AS YTD_Running_Sum
FROM
	CTE AS a INNER JOIN CTE AS b
	ON a.cust_id = b.cust_id AND
	b.seq_nbr <= a.seq_nbr
GROUP BY
	a.cust_id, a.order_nbr, a.order_date, a.order_amount
ORDER BY a.cust_id, a.order_date ; 

We achieved the desired Result set in about 12 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 2 lists the Execution plan. Note that the Actual Number of rows processed by the “Table spool” Operator is 144. While you screenshot doesn’t show the number of reads for the “clustered index scan” operators in both branches, its 12 rows reach. So the total actual number of rows read is 156.

Fig 2 Exec Plan Spooling Op

Figure 2. Execution Plan showing properties of Table Spool operator

Set-based Correlated Sub-Query Solution

The second set-based solution involves using a Correlated sub-query:

Code Listing 4. Set based solution using Correlated sub-query to achieve the Running Sum Result listed in Table 1

/*******************************************************
	  SET BASED SOLUTION USING CORRELATED SUB-QUERY	 
*******************************************************/
;
WITH CTE 
AS
(SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order )
SELECT a.cust_id, a.order_nbr, a.order_date,a.order_amount, (SELECT SUM(b.order_amount) 
		FROM CTE AS b
		WHERE b.cust_id = a.cust_id
		AND b.seq_nbr <= a.seq_nbr) AS YTD_Running_Sum
FROM CTE AS a
ORDER BY a.cust_id, a.order_date; 

We achieved the desired Result set in about 10 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 3 lists the Execution plan. Note that the Actual Number of rows processed by the “Clustered Index Scan” Operator in the lower branch is 144 and the upper branch is 12 rows, adding to a total of 156.

Fig 3 Exec Plan Index

Figure 3. Execution Plan showing properties of Clustered Index Scan operator

 

SQL Server 2012 – OVER Clause With SUM Solution

A complete implementation of the OVER clause, was one the numerous enhancements to windowing functions introduced in SQL Server 2012.  The OVER clause can now be used with functions to compute aggregated values such as moving averages, cumulative aggregates and running totals.

Code Listing 5. Set based solution using Over Clause with SUM to achieve the Running Sum Result listed in Table 1

/*******************************************************************
		SQL 2012 Solution - OVER Clause with SUM
********************************************************************/
;
WITH CTE 
AS
(SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order )
SELECT cust_id,order_nbr,order_date,order_amount, SUM(order_amount) OVER (PARTITION BY cust_id ORDER BY seq_nbr) AS YTD_Running_Sum
FROM CTE
ORDER BY cust_id, order_date;

 

We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 4 lists the Execution plan. Note that the Actual Number of rows processed by the “Clustered Index Scan” Operator is 12

Fig 4 Exec Plan Clustered Index Scan

Figure 4. Execution Plan showing properties of Clustered Index Scan operator

Table 2. SQL Server IO and Time Statistics from running each script.

Calculation Method Total Scan Count Total Logical Reads Total Physical   Reads Total Elapsed Time   (ms) Actual Number of   Rows Read Number of Lines of   Code
While Loop Based Row-By-Row 9 34 0 162 100 45
Set Based Inner Join 3 51 0 61 156 12
Set Based Correlated Sub-Query 13 26 0 54 156 10
SQL Server 2012 – OVER Clause With SUM 17 75 0 36 12 8

 

Table 2 summarizes the various metrics we have captured during the execution of 4 different solutions to solve the Running Sum problem (DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS was run between each test method to get a clean start. SET STATISTICS IO ON and SET STATISTICS TIME ON was run to view TIME and IO statistics). The “Set Based Correlated Sub-Query” solution seems to be the winner if you look at “Total Logical Reads” .The “While Loop Based Row-By-Row” solution looks better when looking at the “Total Scan Count”. But the “SQL Server 2012” based solution is the winner as far as “Actual Number of Rows Read” is concerned, as well as the most important metric that end users care out “Total Elapsed Time”.

The SQL Server 2012 – OVER clause with SUM based solution achieves the result using a simple 8 line Query with an elegant execution plan. Remember our Order table had ONLY 12 rows? While both the Inner Join and Correlated Sub-Query based solution read total of 144 rows (12 times 12), the SQL Server 2012 – OVER clause with SUM based solution read ONLY 12 rows. As you try to solve the Running Sum problem in a real life scenario, these are the metrics you would want to capture and analyze, to determine the best performing solution for your environment.

In Conclusion, the SQL Server 2012 enhancements to the OVER clause offers a clean and efficient alternative solution to the problem of calculating a Running Sum.

 

References:

 

 

Related Posts

Comments

  1. Hi, these windowing function enhancements and new features available in SQL Server 2012, can be though of as a more complete implementation of Windowing functions made available with the T-SQL Language. While Oracle did implement these features in their product (SQL,PL/SQL language) earlier than Microsoft did , no database product vendor (in my knowledge) can claim credit for “invention” of this feature in ANSI Standard SQL language. The ANSI Standard SQL Langauge has several “features” that various database vendors implement in their product as per the product’s planned roadmap. There are some features in the ANSI Standard SQL Language that no database product vendor may have implemented so far at all !
    I hope this help answer your question. I will be glad to explain with more details if necessary.

Leave a Reply