Note: This is Part 3 of a three-part article on SQL Server primary keys and clustered indexes.
In Part 1 of this series we learned how to break apart the Clustered Index from the Primary Key. In Part 2, we saw there was little performance benefit in doing so. However, when table partitioning (especially for existing databases), separate Clustered Index and Primary Key may just be the answer.
In the real world, most large tables, for existing databases, join through foreign keys based only on an identity property in the parent table (the Primary Key). This works great from a performance aspect, as queries only need an integer column (unique) to join parent and child tables.
There are many white papers and articles on SQL Server table partitioning with academic examples (see references at end of this article), and most suggest partitioning on a natural key, like date. This is a good recommendation, but then how do you implement it when there are child tables? This article will show how to do so.
Partioning on a natural key when there are child tables: an example
In this case, we will use the container table as described in Part 2 with child tables having Foreign Key relations back to ContainerID.
The customer data retention plan called for data to be retained 6 months (since last update and/or create) and the data to be partitioned on a month-to-month basis. Data archiving and purging will be monthly just prior to the end of the month. Therefore, for this scenario, we defined a computed column LastDateKey as shown below:
ALTER TABLE dbo.Container ADD LastDateKey AS (CONVERT(INT,CONVERT(varchar(8),ISNULL(UpdateDate,CreateDate),112))) PERSISTED;
Note the PERSISTED key word. This is required so that the LastDateKey can be part of the clustered index. Also, note this is an integer column in the form of yyyymmdd. Now we can create the Partition Function and Partition Scheme:
CREATE PARTITION FUNCTION DB_PF1 (INT) AS RANGE RIGHT FOR VALUES(20120301,20120401,20120501,20120601,20120701,20120801) GO CREATE PARTITION SCHEME DB_PS1 AS PARTITION DB_PF1 ALL TO ([FileGroup01]) GO
The partition function defines 7 partitions covering the last 6 months and the partition scheme puts all the partitions in the same filegroup (FileGroup01). The RANGE RIGHT option defines the lower range boundary of each partition starting with partition 2 (the first partition has no lower boundary).
Before we can separate the Primary Key from the Clustered Index, the dependent Foreign Keys (from the child tables) must be dropped. Otherwise, you will encounter an error when trying to drop the Primary Key. Assuming this has been done, you can then run the following statements
IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'dbo.Container') AND name = N'PK_Container') ALTER TABLE dbo.Container DROP CONSTRAINT PK_Container; GO CREATE UNIQUE CLUSTERED INDEX CIDX_Container ON dbo.Container(ContainerID, LastDateKey) WITH (ONLINE = ON, DATA_COMPRESSION = ROW) on DB_PS1(LastDateKey); GO --Rebuild PK ALTER TABLE dbo.Container WITH CHECK ADD CONSTRAINT PK_Container PRIMARY KEY NONCLUSTERED (ContainerID) ON [PRIMARY]
The first TSQL statement drops the Primary Key (and its respective clustered index). The second TSQL statement creates a unique clustered index based on ContainerID and LastDateKey with row compression. The reason for this order is from the performance degradation we saw in Part 2 of this series if the LastDateKey was first (for this particular database, most table joins are on the ContainerID). The third TSQL statement rebuilds the primary key (non-clustered). It is defined on the Primary file group and is not partitioned. This is because all of the child tables have a single column Foreign Key constraint referring back to ContainerID. If you wanted a partitioned Primary Key, all of the child tables would have to be modified with the addition of the LastDateKey column. This is very undesirable for a variety of reasons – mainly that the LastDateKey would be a property (and defined) on the parent table (Container) and have no relation to the data in the child table. There is no need to have 2 columns for join criteria (from a performance viewpoint) back to the container (parent) table when the ContainerID is already unique.
Now that the Container table partitioned by month (as defined above), it is very easy to add a new month:
ALTER PARTITION SCHEME DB_PS1 NEXT USED [FileGroup01] GO ALTER PARTITION FUNCTION DB_PF1() SPLIT RANGE (20120901); GO
The NEXT USED statement is necessary to alert SQL Server to the filegroup to use for the next partition range. The SPLIT statement actually creates the new boundary. It can be run before or after that date – just be careful if the SPLIT is after the date there may be data movement within the filegroup with a resulting schema lock and possible application timeouts. The MERGE statement (not shown) can remove an old partition boundary – but, again, be careful – there also may be data movement from orphaned data or data retained because of other archiving constraints. This MERGE operation generates a schema lock and may cause application timeouts. The more data to move, the longer the locks are retained. Data movement during splits or merges can be very disruptive during your monthly partition maintenance operations — especially in a 24×7 environment.
Do not forget to add back in the foreign keys that were dropped earlier. Another tip to consider is to drop all the existing non-clustered indexes on the parent table before dropping the Primary Key and creating the new Clustered Index and new Primary Key. The non-clustered indexes can then be rebuilt (I would suggest using the ONLINE = ON option and also partitioned on the partition scheme, e.g.,
CREATE NONCLUSTERED INDEX [IDX_Container_ParentContainerID] ON dbo.Container (ParentContainerID ASC,DelFlag ASC) INCLUDE (CreateDate,ContainerTypeID,UpdateDate) WITH (ONLINE = ON, DATA_COMPRESSION = ROW) ON [NGSCoreContainerPS1](ContainerID);
Data compression is another tool for consideration, but that is a topic for another article.
Consider an index on LastDateKey; it could be useful in queries using dates or a date range.
This article should have given you some insight into SQL Server table partitioning selecting a partition key. In any case picking a partition key needs a fair amount of thought – once you do so and implement it the cost of change may be prohibitive.
- Partitioned Tables and Indexes in SQL Server 2005, Kimberly Tripp, http://msdn.microsoft.com/en-us/library/ms345146(v=sql.90).aspx
- Partitioning in SQL Server 2008, Muhammad Shujaat Siddiqi, http://www.sqlservercentral.com/articles/partition/64740/
- Easy Table Partitions with SQL Server 2008, Ravi Subramaniam, http://blogs.msdn.com/b/manisblog/archive/2009/01/18/easy-table-partitions-with-sql-server-2008.aspx
- SQL SERVER – 2005 – Database Table Partitioning Tutorial – How to Horizontal Partition Database Table, Pinal Dave, http://blog.sqlauthority.com/2008/01/25/sql-server-2005-database-table-partitioning-tutorial-how-to-horizontal-partition-database-table/
- Creating a table with horizontal partitioning in SQL Server, Atif Shehzad, http://www.mssqltips.com/sqlservertip/1796/creating-a-table-with-horizontal-partitioning-in-sql-server/