SQL Server Foreign Keys: Some of the Mystery Resolved! (Part 1)
Note: This is Part 1 of a two-part article on SQL Server foreign keys You may read Part 2 here.
Foreign keys have a bad reputation; they cause performance issues, they inhibit some inserts, updates, deletes, etc. This may be true, but let’s further explore why we even have them and how they can benefit us and maybe even improve query performance.
Foreign keys are all about data consistency. They exist to help the DBA maintain data consistency from a database level. They do not ensure data accuracy, but do ensure that data is consistent between tables. First, let’s define Foreign Keys. A Foreign Key is a constraint between 2 tables (for naming purposes let’s consider a Parent table and a Child table). The Parent table contains the actual constraint. The constraint then is a relationship between the values in the applicable column(s) in the parent table with what already exists in the child table. Frequently the child tables are lookup values for an attribute within the parent table. For example, consider the following (very simplified) tables:
CREATE TABLE dbo.Person ( PersonID INT NOT NULL IDENTITY(1,1) PRIMARY KEY, LastName VARCHAR(50) NOT NULL, FirstName VARCHAR(50), AddressID INT) CREATE TABLE dbo.[Address] ( ADDRESSID INT NOT NULL IDENTITY(1,1) PRIMARY KEY, Address1 VARCHAR(50), City VARCHAR(50), [State] VARCHAR(10), ZIP VARCHAR(10))
You could just create a single table Person also containing all the address fields, but then you have a much wider table and possibly duplicate data when you have two persons living at the same address (example below). From a pure data issue these two tables would be all you need, but there are no checks currently to ensure that the AddressID in the dbo.Person table is valid – this is what a Foreign Key does – it ensures that any AddressID values in the dbo.Person table has a valid value in the associated dbo.Address table, i.e., data consistency. And while a developer will always tell you that they can maintain data consistency in the application, how many times have you encountered orphaned data and/or non-related data because of the corresponding application? As a DBA, Foreign Keys are your protection. Be very careful on disabling or dropping them – especially when the developer says they are causing performance issues.
Foreign Keys generally refer to the Primary Key in the child table, but also can refer to a unique column set in the child table. The only difference would be that the unique column set could have a null value – and we know Primary Keys cannot have a null value.
Sometime data in the Parent table has no corresponding table in the child table; in this example a Person may not have an address. If the AddressID is nullable in dbo.Person (which it is in the above example) then when address data is not available for a Person, the AddressID could be left null. However, a better practice would be to make the AddressID column in dbo.Person non-nullable and to assign a value in dbo.Address that has a value of “Not Assigned” and/or “Unknown”. This alleviates the complexity of dealing with nulls in subsequent queries and reports. Frequently these special cases have negative Primary Keys as shown below:
SET IDENTITY_INSERT dbo.ADDRESS ON INSERT dbo.Address (AddressID,Address1,City,[State],Zip) SELECT -2,'Unassigned','Unassigned','NA','NA' UNION SELECT -1,'Unknown','Unknown','NA','NA' SET IDENTITY_INSERT dbo.ADDRESS OFF DBCC CHECKIDENT ("dbo.Address");
Since AddressID is an IDENTITY column, you need to use the IDENTITY_INSERT expression to insert a special value. And most importantly you need to turn it off after the INSERT — since only one IDENTITY_INSERT is allowed on for a database. As a standard practice, also consider doing a DBCC CHECKIDENT after using an IDENTITY_INSERT to ensure that the IDENTITY property has the next correct identity value.
As a means to further examine how to enable Foreign Keys, let’s use the code examples above plus the following code:
ALTER TABLE dbo.Person ADD CONSTRAINT FK_Person_AddressID FOREIGN KEY (AddressID) REFERENCES dbo.Address(AddressID) GO CREATE INDEX IDX_Person_Address ON dbo.Person (AddressID); GO
The first statement above creates the Foreign Key with name FK_Person_AddressID for the parent table dbo.Person and referencing the child table dbo.Address and the column AddressID (Primary Key for the dbo.Address table). The second statement creates a non-clustered index on the dbo.Person table referencing the Address column in that table. Later on we’ll find this really helps query performance if a certain attribute is set for the Foreign Key.
INSERT dbo.Address (Address1,City,[State],Zip) SELECT '123 Easy St','Austin','TX','78701' UNION SELECT '456 Lakeview','Sunrise Beach','TX','78643' INSERT dbo.Person (LastName,FirstName,AddressID) SELECT 'Smith','John',1 UNION SELECT 'Smith','Mary',1 UNION SELECT 'Jones','Max',2
Note that we have two rows in dbo.Person that refer back to a single row in dbo.Address – we have removed duplicate data!
There is a system view sys.Foreign_Keys that shows us the attributes of the database foreign keys:
SELECT name,object_id, parent_object_id,type,referenced_object_id,is_disabled, is_not_trusted,is_not_for_replication FROM [sys].[foreign_keys] WHERE name = 'FK_Person_AddressID';
This gives us
There are other columns, but these are the ones I want to discuss. Name is obvious. Object_ID refers to the ID in the sys.objects view for this Foreign Key. Parent_Object_ID is the ID for the dbo.Person table in sys.objects and referenced_object_id is the ID for dbo.Address. is_disabled and is_not_trusted are two very important Foreign Key attributes. The is_not_for_replication is another attribute, but will not be discussed in this article (it only applies to the replication environment). When the is_disabled attribute has a value of 1 (true), SQL Server does no Foreign Key checking when new rows are inserted into the parent table. The converse is true when the is_disabled attribute has a value of 0, i.e., any row inserted in the parent table has to have a valid value in the Foreign Key referenced column(s) from the child table. The is_not_trusted attribute (only Microsoft would use a double negative) is very important to the query optimizer and when is 0 (false) can have significant impact on improved query performance. But, let’s look at examples to see their use.
Note in the above result from sys.foreign_key that is_disabled = 0 (meaning the FK is enabled) and that is_not_trusted = 0 (meaning the FK is trusted).
To disable (but not drop) a Foreign Key we use the statement
ALTER TABLE dbo.Person NOCHECK CONSTRAINT FK_Person_AddressID
SELECT is_disabled, is_not_trusted FROM [sys].[foreign_keys] WHERE name = 'FK_Person_AddressID'; -- note is_disabled = 1 -- note is_not_trusted = 1
We get is_disabled = 1 and is_not_trusted = 1. Now the Foreign Key is disabled and is not trusted. So if we run following
ALTER TABLE dbo.Person CHECK CONSTRAINT FK_Person_AddressID
And then rerun above sys.foreign_keys query we get is_disabled = 0 but is_not_trusted = 1. Whoa, what happened here? We will discuss this behavior (and how to fix it) in Part 2 of this series. Hopefully this has given you some insight into how to set up Foreign Keys. Part 2 will get into more of the how and care of maintenance of Foreign Keys. They are not necessarily create and forget!