Generating a Running or Cumulative Sum Report, usually in a Year-To-Date (YTD), Quarterly, or Monthly format, is a fairly common requirement in most reporting systems or ad-hoc report requests. There are several ways of Calculating a Running Sum in SQL Server, depending on the version of SQL Server. In this article, we will explore the various options available in SQL Server 2012 to help us Calculate Running Sums, and evaluate the performance metrics of each.
While T-SQL is best at SET based operations, cursors are available in the language. I have come across cursor based solution for calculating running sums, as well as several versions of SET based solutions. SQL Server 2012 added several enhancements to windowing functions, including a more complete implementation of the OVER Clause, which helps us efficiently calculate Running Sums.
Code Listing 1: Create Schema objects and Data for Demonstration
CREATE TABLE demo_order ( cust_id BIGINT NOT NULL, order_nbr BIGINT NOT NULL, order_date DATETIME NOT NULL, order_amount DECIMAL(18,4) NOT NULL) ; ALTER TABLE demo_order ADD CONSTRAINT pk_demo_order PRIMARY KEY CLUSTERED (order_nbr) ; INSERT INTO demo_order (cust_id,order_nbr,order_date,order_amount) VALUES (21,1,'2013-11-01',22.2), (21,3,'2013-10-11',22.8), (21,7,'2013-09-01',23.2), (22,11,'2013-11-02',42.3), (22,6,'2013-08-01',252.2), (22,12,'2013-08-21',52.2), (31,15,'2013-06-01',62.3), (31,31,'2013-11-03',72.2), (31,61,'2013-08-01',21.2), (521,71,'2013-06-01',26.2), (521,77,'2013-10-18',29.2), (521,45,'2013-09-01',1.2) ; GO
Our Goal is to generate a report that calculates the running sum of each Customer’s order amount, calculated and displayed in sequence of the Customer’s order date. The end result should look like the table below.
Table 1. The final result of Running Sum Calculation on Orders Data
While Loop Based Row-By-Row Solution
The Procedural way of going about this task would be using a cursor or loop based (row-by-row) solution. This code will work in any version of SQL Server starting 2005 or later. I am using the ROW_NUMBER OVER function to generate sequence a number for every unique combination for customer and order IDs.
Code Listing 2: Cursor/Loop based solution to achieve the Running Sum Result listed in Table 1
/**************************************************************** CURSOR/LOOP (Row-by-row) Based Soution *****************************************************************/ BEGIN TRY DROP TABLE #tmp_demo_order END TRY BEGIN CATCH END CATCH ; -- Create a temporary table CREATE TABLE #tmp_demo_order ( cust_id BIGINT NOT NULL, order_nbr BIGINT NOT NULL, order_date DATETIME NOT NULL, order_amount DECIMAL(18,4) NOT NULL, runtot DECIMAL(18,4) NULL, CHRONO_ORDER INT NULL) -- generate row id INSERT INTO #tmp_demo_order (cust_id,order_nbr,order_date,order_amount,runtot,CHRONO_ORDER) SELECT cust_id,order_nbr,order_date,order_amount,NULL,ROW_NUMBER()OVER(PARTITION BY cust_id ORDER BY order_date) FROM dbo.demo_order ; -- LOOP THROUGH TO GET A RUNNING SUM DECLARE @LOOP_INT INT =1, @LOOP_STOP INT = (SELECT MAX(CHRONO_ORDER) FROM #tmp_demo_order) WHILE @LOOP_INT <= @LOOP_STOP BEGIN UPDATE GS SET GS.runtot = Q.runtot FROM #tmp_demo_order AS GS INNER JOIN ( SELECT G.cust_id, SUM(G.order_amount )AS runtot FROM #tmp_demo_order G WHERE G.CHRONO_ORDER <= @LOOP_INT GROUP BY G.cust_id ) Q ON Q.cust_id = GS.cust_id AND GS.CHRONO_ORDER = @LOOP_INT SELECT @LOOP_INT = (@LOOP_INT + 1) END SELECT cust_id, order_nbr, order_date, order_amount, runtot FROM #tmp_demo_order ORDER BY cust_id, order_date ; BEGIN TRY DROP TABLE #tmp_demo_order END TRY BEGIN CATCH END CATCH GO
We achieved the desired Result set in about 45 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 1 lists a part of the set of Execution plan for one of the loop runs. Note that the Actual Number of rows processed by the “Table Scan” Operator is 8. Buy summing up the Actual row count for each Table Scan operator in the entire execution plan, we arrive at the number 100.
?Figure 1. Execution Plan for one loop run, showing properties of Table Scan operator
There are two popular set-based solutions for Calculating Running Sum in SQL Server 2005 and higher editions. I am using CTE and ROW_NUMBER OVER function to generate a sequence number for every unique combination for customer and order IDs, in all of my Set Based Solutions.
?
Set-based Inner Join Solution
The first solution involves simply Inner Joining the table to itself on customer id and comparing the sequence numbers we generated on the fly.
Code Listing 3. Set-based solution using Inner Join to achieve the Running Sum Result listed in Table 1
/******************************************************* SET BASED SOLUTION USING INNER JOIN *******************************************************/ ; WITH CTE AS (SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order ) SELECT a.cust_id, a.order_nbr, a.order_date,a.order_amount, SUM(b.order_amount) AS YTD_Running_Sum FROM CTE AS a INNER JOIN CTE AS b ON a.cust_id = b.cust_id AND b.seq_nbr <= a.seq_nbr GROUP BY a.cust_id, a.order_nbr, a.order_date, a.order_amount ORDER BY a.cust_id, a.order_date ;
We achieved the desired Result set in about 12 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 2 lists the Execution plan. Note that the Actual Number of rows processed by the “Table spool” Operator is 144. While you screenshot doesn’t show the number of reads for the “clustered index scan” operators in both branches, its 12 rows reach. So the total actual number of rows read is 156.
Figure 2. Execution Plan showing properties of Table Spool operator
Set-based Correlated Sub-Query Solution
The second set-based solution involves using a Correlated sub-query:
Code Listing 4. Set based solution using Correlated sub-query to achieve the Running Sum Result listed in Table 1
/******************************************************* SET BASED SOLUTION USING CORRELATED SUB-QUERY *******************************************************/ ; WITH CTE AS (SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order ) SELECT a.cust_id, a.order_nbr, a.order_date,a.order_amount, (SELECT SUM(b.order_amount) FROM CTE AS b WHERE b.cust_id = a.cust_id AND b.seq_nbr <= a.seq_nbr) AS YTD_Running_Sum FROM CTE AS a ORDER BY a.cust_id, a.order_date;
We achieved the desired Result set in about 10 lines of code. We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 3 lists the Execution plan. Note that the Actual Number of rows processed by the “Clustered Index Scan” Operator in the lower branch is 144 and the upper branch is 12 rows, adding to a total of 156.
Figure 3. Execution Plan showing properties of Clustered Index Scan operator
SQL Server 2012 – OVER Clause With SUM Solution
A complete implementation of the OVER clause, was one the numerous enhancements to windowing functions introduced in SQL Server 2012. The OVER clause can now be used with functions to compute aggregated values such as moving averages, cumulative aggregates and running totals.
Code Listing 5. Set based solution using Over Clause with SUM to achieve the Running Sum Result listed in Table 1
/******************************************************************* SQL 2012 Solution - OVER Clause with SUM ********************************************************************/ ; WITH CTE AS (SELECT ROW_NUMBER() OVER (PARTITION BY cust_id ORDER BY cust_id , order_date) AS seq_nbr, cust_id, order_nbr, order_date, order_amount FROM dbo.demo_order ) SELECT cust_id,order_nbr,order_date,order_amount, SUM(order_amount) OVER (PARTITION BY cust_id ORDER BY seq_nbr) AS YTD_Running_Sum FROM CTE ORDER BY cust_id, order_date;
We have captured the SQL Server IO and Time statistics coming out this code run, for review later. Figure 4 lists the Execution plan. Note that the Actual Number of rows processed by the “Clustered Index Scan” Operator is 12
Figure 4. Execution Plan showing properties of Clustered Index Scan operator
Table 2. SQL Server IO and Time Statistics from running each script.
Calculation Method | Total Scan Count | Total Logical Reads | Total Physical Reads | Total Elapsed Time (ms) | Actual Number of Rows Read | Number of Lines of Code |
While Loop Based Row-By-Row | 9 | 34 | 0 | 162 | 100 | 45 |
Set Based Inner Join | 3 | 51 | 0 | 61 | 156 | 12 |
Set Based Correlated Sub-Query | 13 | 26 | 0 | 54 | 156 | 10 |
SQL Server 2012 – OVER Clause With SUM | 17 | 75 | 0 | 36 | 12 | 8 |
Table 2 summarizes the various metrics we have captured during the execution of 4 different solutions to solve the Running Sum problem (DBCC FREEPROCCACHE and DBCC DROPCLEANBUFFERS was run between each test method to get a clean start. SET STATISTICS IO ON and SET STATISTICS TIME ON was run to view TIME and IO statistics). The “Set Based Correlated Sub-Query” solution seems to be the winner if you look at “Total Logical Reads” .The “While Loop Based Row-By-Row” solution looks better when looking at the “Total Scan Count”. But the “SQL Server 2012” based solution is the winner as far as “Actual Number of Rows Read” is concerned, as well as the most important metric that end users care out “Total Elapsed Time”.
The SQL Server 2012 – OVER clause with SUM based solution achieves the result using a simple 8 line Query with an elegant execution plan. Remember our Order table had ONLY 12 rows? While both the Inner Join and Correlated Sub-Query based solution read total of 144 rows (12 times 12), the SQL Server 2012 – OVER clause with SUM based solution read ONLY 12 rows. As you try to solve the Running Sum problem in a real life scenario, these are the metrics you would want to capture and analyze, to determine the best performing solution for your environment.
In Conclusion, the SQL Server 2012 enhancements to the OVER clause offers a clean and efficient alternative solution to the problem of calculating a Running Sum.
References:
- OVER Clause – http://technet.microsoft.com/en-us/library/ms189461.aspx
- Microsoft Connect Item for full implementation of OVER Clause – http://connect.microsoft.com/SQLServer/feedback/details/254392/over-clause-enhancement-request-rows-and-range-window-sub-clauses
- SUM function – http://msdn.microsoft.com/en-us/library/ms187810(v=sql.110).aspx
mdsaghi says
Hi , in Oracle there was sum over queries.
is this a copy of that? or have some new features?
sanil.mhatre says
Hi, these windowing function enhancements and new features available in SQL Server 2012, can be though of as a more complete implementation of Windowing functions made available with the T-SQL Language. While Oracle did implement these features in their product (SQL,PL/SQL language) earlier than Microsoft did , no database product vendor (in my knowledge) can claim credit for “invention” of this feature in ANSI Standard SQL language. The ANSI Standard SQL Langauge has several “features” that various database vendors implement in their product as per the product’s planned roadmap. There are some features in the ANSI Standard SQL Language that no database product vendor may have implemented so far at all !
I hope this help answer your question. I will be glad to explain with more details if necessary.