Before we dig into the lesser known but important facts, I want to provide little background about the SQL Server Columnstore index. This feature was introduced in SQL Server 2012, but remains largely unknown. As the name suggests, a data page will store values only for the given column to which that page is allocated. Generally, rows are storage in a page, and are called as RowStore. The ColumnStore index feature is designed to optimize access to large data warehouses workloads.
Here is a typical RowStore pages. Let us say we have 2 columns and each page would fit 5 rows; in this case we would have data for all columns and 5 rows in each page.
Columnstore indexes store data column-wise, and each page stores data from a single column. If we have ColumnStore, then the page would look like below.
Each column would be accessed independently and SQL Server would fetch only the needed columns based on the query. You can imagine that this can dramatically decrease I/O performance.
If we compare the two images above and perform Select sum(Column1) from Table then we won’t unnecessarily bring Column2 data in memory.
Read/Write Operations in SQL Server 2014
When the feature was released in SQL Server 2012, there were many limitations. One of the major limitations was that, once we created a ColumnStore index on a table, it became read-only. Also, SQL Server 2012 introduced only non-clustered ColumnStore indexes. In SQL Server 2014, Microsoft supported both clustered and non-clustered ColumnStore indexes. In SQL Server 2014, the clustered ColumnStore index has no key columns, which means that all columns are included columns. This also means that the clustered ColumnStore index is updatable. We can add data to an existing clustered ColumnStore index by using bcp, SSIS, and INSERT … SELECT — they can all load data into a clustered ColumnStore index. Clustered ColumnStore indexes introduced something called deltastore in order to prevent fragmentation of column segments in the columnstore.
Here is the code to understand this.
SET NOCOUNT ON GO CREATE DATABASE CCI GO USE CCI GO CREATE TABLE MyTable( ProductKey [int] NOT NULL, OrderDateKey [int] NOT NULL, DueDateKey [int] NOT NULL, ShipDateKey [int] NOT NULL); GO CREATE CLUSTERED COLUMNSTORE INDEX cci_MyTable ON MyTable GO INSERT INTO MyTable VALUES (1, 20150701,20150710, 20150709) GO SELECT * FROM MyTable
Notice that I run the above code block only in SQL Server 2014 and above. There are many enhancements in SQL Server 2014, and they are documented very well in books online.
Hints to force use of ColumnStore Index
There are a few situations where a developer wants to override the decision made by the optimizer. There are hints available in SQL Server which can cause the optimizer to forcefully use the ColumnStore index. Before we take this approach, we need to make sure that we use the hint sparingly and understand the implications of using it. Here is a demo to show the hint IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX. If you want to use the script, you need to download Sample Database from codeplex http://msftdbprodsamples.codeplex.com/releases
CREATE TABLE DemoSalesOrderDetail ( SalesOrderID int ,SalesOrderDetailID int ,CarrierTrackingNumber nvarchar(25) NULL ,OrderQty smallint ,ProductID int ,SpecialOfferID int ,UnitPrice money ,UnitPriceDiscount money ,LineTotal numeric(38, 6) ,rowguid uniqueidentifier ,ModifiedDate datetime ) GO CREATE CLUSTERED INDEX CL_DemoSalesOrderDetail ON DemoSalesOrderDetail (SalesOrderDetailID) GO
Now we would populate table with a lot of records. This is where you need the sample database. In AdventureWorks database there is a table called Sales.SalesOrderDetail and we will use that.
INSERT INTO DemoSalesOrderDetail SELECT * FROM Sales.SalesOrderDetail GO 100
The code shown would take little time to run based on machine’s hardware configuration. Now we need to create a nonclustered ColumnStore index. Remember, the Read/Write is only for clustered ColumnStore index.
CREATE NONCLUSTERED COLUMNSTORE INDEX IX_CS_DemoSalesOrderDetail ON DemoSalesOrderDetail (UnitPrice, OrderQty, ProductID)
Next, execute the queries shown below. Note that second one is having hint to ignore the ColumnStore index.
SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM DemoSalesOrderDetail GROUP BY ProductID ORDER BY ProductID GO SELECT ProductID, SUM(UnitPrice) SumUnitPrice, AVG(UnitPrice) AvgUnitPrice, SUM(OrderQty) SumOrderQty, AVG(OrderQty) AvgOrderQty FROM DemoSalesOrderDetail GROUP BY ProductID ORDER BY ProductID OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX) GO
We have shown example where forcing to non-use of ColumnStore index is causing sub-optimal query performance (the cost is 98% vs 2%). This is done purposefully to demonstrate that we need to make sure that we test the query with and without the hint, and then select the best option after review.