Choosing Between Rowstore Indexes and Columnstore Indexes

By Michael Otey on April 5, 2018

Since the columnstore index was first introduced back with SQL Server^® 2012, there have been a lot of questions about when you should use a columnstore and when you should use a traditional rowstore index. This is especially true following Microsoft’s latest enhancements to the columnstore. Microsoft touts some impressive performance gains using the columnstore indexes. For example, on their website, Microsoft states that the columnstore index can achieve up to 10x query performance gains over traditional row-oriented storage, as well as a 10x data compression rate over the uncompressed data size. The original implementation of the columnstore had some significant limitations. For example, you needed to drop the index in order to modify the data in the base table. In addition, the original implementation did not use columnar format for physical data storage. However, these old limitations have been removed and many other improvements have been made through subsequent SQL Server releases, including the ability to combine the columnstore index with In-Memory OLTP in SQL Server 2016.

Rowstore vs. Columnstore

First, for a bit of level setting, let’s look at the basic differences between rowstore and columnstore indexes.

Rowstore — The rowstore index is the traditional style that has been around since the initial release of SQL Server. Rowstore indexes are designed to speed the retrieval of data by enabling queries to quickly locate data by index rather than scanning an entire table. Rowstore data is logically organized by rows and columns, and is physically stored in row-oriented data pages. SQL Server internally organizes rowstore indexes using a B-Tree structure over the data pages. SQL Server supports clustered and nonclustered indexes. With clustered indexes, the data in the base table is organized according to the clustered index. Rowstore indexes perform best on queries that seek data by searching for a particular value or retrieving a small range of values. Rowstore indexes are a good fit for transactional workloads since these workloads tend to require table seeks instead of large-range table scans, and they often require frequent data updates.
Columnstore — The columnstore index is also logically organized as a table with rows and columns, but the data is physically stored in a column-wise data format. Columnstore indexes work well for mostly read-only queries with large data sets, like data warehousing workloads. Columnstore indexes are not well-suited for queries that seek specific individual values. Columns often contain similar data which enables the data to be highly compressed, improving memory utilization and significantly reducing disk usage. A columnstore index can be clustered or nonclustered. Clustered and nonclustered columnstore indexes function the same. The difference is that a clustered columnstore index provides the primary storage for the entire table, while a nonclustered index is a secondary index that contains a copy of some of the columns in the underlying table. A clustered columnstore index can have one or more nonclustered B-tree indexes.

Overall, rowstore indexes tend to be better for online transaction processing (OLTP) workloads, which use more update and seek operations, while columnstore indexes tend to be better for online analytical processing (OLAP) workloads, which use more read operations. Rowstore indexes tend to be better at performing random reads and writes. Columnstore indexes tend to be better for performing sequential reads and writes.

Using the Columnstore for Real-time Analytics and In-Memory OLTP

One of the most compelling attributes of SQL Server columnstore indexes is their ability to enable real-time analytics. Traditional BI and data warehousing applications are not typically built using real-time data. Instead, they are usually built via an ETL process that periodically takes OLTP data and loads it into a back-end data warehouse. This means that the decision-making data is not current, but instead is a few hours or perhaps days old. The data is only as current as the last ETL data load. The columnstore index can address this issue by accelerating query performance over real-time OLTP data, making it possible to perform BI and analytic analysis over live data with no ETL-induced delays.

Beginning with SQL Server 2016, you can also create an updatable nonclustered columnstore over In-Memory OLTP tables, potentially providing the performance required to view real-time analytics. The nonclustered columnstore index stores a copy of the chosen columns, so you do need extra space for it, but the data will be highly compressed. By combining the columnstore and In-Memory OLTP, you can run BI queries on the columnstore index and support high performance OLTP transactions at the same time. SQL Server also gives you the flexibility to have one or more nonclustered rowstore indexes on a columnstore index, enabling you to perform efficient table seeks on the underlying columnstore.

Testing out the Columnstore

Let’s take a closer look at a performance comparison of the columnstore index and row. The following Microsoft^® sample queries found on GitHub^® use the AdventureworksDW2016_EXT database to illustrate the difference in performance that you can get using the columnstore index. The first query in Listing 1 uses the rowstore index named FactResellerSalesXL_PageCompressed. The SET STATISTICS IO and TIME operations are used to record the storage usage and elapsed time required by the query.

USE [AdventureworksDW2016_EXT] GO

DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON SET STATISTICS TIME ON

SELECT s.SalesTerritoryRegion,d.[CalendarYear],FirstName + ' ' + lastName as 'Employee',FORMAT(SUM(f.SalesAmount),'C') AS 'Total Sales', SUM(f.OrderQuantity) as 'Order Quantity', COUNT(distinct f.SalesOrdernumber) as 'Number of Orders', count(distinct f.Resellerkey) as 'Num of Resellers' FROM FactResellerSalesXL_PageCompressed f INNER JOIN [dbo].[DimDate] d ON f.OrderDateKey= d.Datekey INNER JOIN [dbo].[DimSalesTerritory] s on s.SalesTerritoryKey=f.SalesTerritoryKey INNER JOIN [dbo].[DimEmployee] e on e.EmployeeKey=f.EmployeeKey WHERE FullDateAlternateKey between '1/1/2005' and '1/1/2007' GROUP BY d.[CalendarYear],s.SalesTerritoryRegion,FirstName + ' ' + lastName ORDER BY SalesTerritoryRegion,CalendarYear,[Total Sales] desc

SET STATISTICS IO OFF SET STATISTICS TIME OFF

Listing 1 – Microsoft Sample Query Using a Rowstore Index

The next query is essentially identical except that it uses the FactResellerSalesXL_CCI columnstore index rather than a rowstore. The data in both tables is the same. Again, statistics are used to record the query performance.

USE [AdventureworksDW2016_EXT] GO

DBCC DROPCLEANBUFFERS

SET STATISTICS IO ON SET STATISTICS TIME ON

SELECT s.SalesTerritoryRegion,d.[CalendarYear],FirstName + ' ' + lastName as 'Employee',FORMAT(SUM(f.SalesAmount),'C') AS 'Total Sales', SUM(f.OrderQuantity) as 'Order Quantity', COUNT(distinct f.SalesOrdernumber) as 'Number of Orders', count(distinct f.Resellerkey) as 'Num of Resellers' FROM FactResellerSalesXL_CCI f INNER JOIN [dbo].[DimDate] d ON f.OrderDateKey= d.Datekey INNER JOIN [dbo].[DimSalesTerritory] s on s.SalesTerritoryKey=f.SalesTerritoryKey INNER JOIN [dbo].[DimEmployee] e on e.EmployeeKey=f.EmployeeKey WHERE FullDateAlternateKey between '1/1/2005' and '1/1/2007' GROUP BY d.[CalendarYear],s.SalesTerritoryRegion,FirstName + ' ' + lastName ORDER BY SalesTerritoryRegion,CalendarYear,[Total Sales] desc

SET STATISTICS IO OFF SET STATISTICS TIME OFF

Listing 2 – Microsoft Sample Query Using a Columnstore Index

You can see a sample of the difference in performance times and I/O shown in Listing 3.

Rowstore query 1 partial results:

DBCC execution completed. If DBCC printed error messages, contact your system administrator.

(42 rows affected) Table 'Worktable'. Scan count 3, logical reads 7099656, physical reads 0. Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0. Table 'FactResellerSalesXL_PageCompressed'. Scan count 1, logical reads 87246, physical reads 1. Table 'DimDate'. Scan count 1, logical reads 59, physical reads 0. Table 'DimEmployee'. Scan count 1, logical reads 31, physical reads 1. Table 'DimSalesTerritory'. Scan count 1, logical reads 3, physical reads 1.

SQL Server Execution Times: CPU time = 19515 ms, elapsed time = 20831 ms.

Columnstore query 2 partial results:

(42 rows affected) Table 'FactResellerSalesXL_CCI'. Scan count 1, logical reads 0, physical reads 0. Table 'FactResellerSalesXL_CCI'. Segment reads 7, segment skipped 5. Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0. Table 'DimSalesTerritory'. Scan count 1, logical reads 3, physical reads 1. Table 'DimEmployee'. Scan count 1, logical reads 31, physical reads 1. Table 'DimDate'. Scan count 1, logical reads 59, physical reads 0.

SQL Server Execution Times: CPU time = 2297 ms, elapsed time = 2803 ms.

Listing 3 – Performance and Difference of the Rowstore and Columnstore Queries Shown in Listings 1 and 2

For this example, you can see that the columnstore required less I/O and provided significantly better performance than the rowstore. More detailed information is available in the video that accompanies this article.

What if you have some slow queries that you think might benefit from using a columnstore index? It’s pretty easy to try out the columnstore index and see if it will work for you. If you create a columnstore index and it doesn’t provide the performance benefit that you expect, you can rollback by simply dropping the columnstore index. Your mileage can vary with the columnstore index and the performance you see will depend on your situation, but it can have benefits over the rowstore for certain types of queries.

Choosing Between Rowstore Indexes and Columnstore Indexes

By Michael Otey on April 5, 2018

Related Posts

Leave a Reply Cancel reply