SQL Server Foreign Keys: Some of the Mystery Explained! (Part 1)

By: Mike Byrd


SQL Server Foreign Keys: Some of the Mystery Resolved! (Part 1)

Note: This is Part 1 of a two-part article on SQL Server foreign keys You may read Part 2 here.

Foreign keys have a bad reputation; they cause performance issues, they inhibit some inserts, updates, deletes, etc. This may be true, but let’s further explore why we even have them and how they can benefit us and maybe even improve query performance.

Foreign keys are all about data consistency. They exist to help the DBA maintain data consistency from a database level. They do not ensure data accuracy, but do ensure that data is consistent between tables. First, let’s define Foreign Keys. A Foreign Key is a constraint between 2 tables (for naming purposes let’s consider a Parent table and a Child table). The Parent table contains the actual constraint. The constraint then is a relationship between the values in the applicable column(s) in the parent table with what already exists in the child table. Frequently the child tables are lookup values for an attribute within the parent table. For example, consider the following (very simplified) tables:

CREATE TABLE dbo.Person (
       PersonID    INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
       LastName    VARCHAR(50) NOT NULL,
     FirstName   VARCHAR(50),
      AddressID   INT)

CREATE TABLE dbo.[Address] (
      ADDRESSID   INT NOT NULL IDENTITY(1,1) PRIMARY KEY,
       Address1    VARCHAR(50),
      City        VARCHAR(50),
      [State]     VARCHAR(10),
      ZIP     VARCHAR(10))

You could just create a single table Person also containing all the address fields, but then you have a much wider table and possibly duplicate data when you have two persons living at the same address (example below). From a pure data issue these two tables would be all you need, but there are no checks currently to ensure that the AddressID in the dbo.Person table is valid – this is what a Foreign Key does – it ensures that any AddressID values in the dbo.Person table has a valid value in the associated dbo.Address table, i.e., data consistency. And while a developer will always tell you that they can maintain data consistency in the application, how many times have you encountered orphaned data and/or non-related data because of the corresponding application? As a DBA, Foreign Keys are your protection. Be very careful on disabling or dropping them – especially when the developer says they are causing performance issues.

Foreign Keys generally refer to the Primary Key in the child table, but also can refer to a unique column set in the child table. The only difference would be that the unique column set could have a null value – and we know Primary Keys cannot have a null value.

Sometime data in the Parent table has no corresponding table in the child table; in this example a Person may not have an address. If the AddressID is nullable in dbo.Person (which it is in the above example) then when address data is not available for a Person, the AddressID could be left null. However, a better practice would be to make the AddressID column in dbo.Person non-nullable and to assign a value in dbo.Address that has a value of “Not Assigned” and/or “Unknown”. This alleviates the complexity of dealing with nulls in subsequent queries and reports. Frequently these special cases have negative Primary Keys as shown below:

SET IDENTITY_INSERT dbo.ADDRESS ON
INSERT dbo.Address (AddressID,Address1,City,[State],Zip)
        SELECT -2,'Unassigned','Unassigned','NA','NA'
     UNION
        SELECT -1,'Unknown','Unknown','NA','NA'
SET IDENTITY_INSERT dbo.ADDRESS OFF
DBCC CHECKIDENT ("dbo.Address");

Since AddressID is an IDENTITY column, you need to use the IDENTITY_INSERT expression to insert a special value. And most importantly you need to turn it off after the INSERT — since only one IDENTITY_INSERT is allowed on for a database. As a standard practice, also consider doing a DBCC CHECKIDENT after using an IDENTITY_INSERT to ensure that the IDENTITY property has the next correct identity value.

As a means to further examine how to enable Foreign Keys, let’s use the code examples above plus the following code:

ALTER TABLE dbo.Person ADD CONSTRAINT FK_Person_AddressID FOREIGN KEY (AddressID)
REFERENCES dbo.Address(AddressID)
GO
CREATE INDEX IDX_Person_Address ON dbo.Person (AddressID);
GO

The first statement above creates the Foreign Key with name FK_Person_AddressID for the parent table dbo.Person and referencing the child table dbo.Address and the column AddressID (Primary Key for the dbo.Address table). The second statement creates a non-clustered index on the dbo.Person table referencing the Address column in that table. Later on we’ll find this really helps query performance if a certain attribute is set for the Foreign Key.

INSERT dbo.Address (Address1,City,[State],Zip)
  SELECT '123 Easy St','Austin','TX','78701'
    UNION
 SELECT '456 Lakeview','Sunrise Beach','TX','78643'

INSERT dbo.Person (LastName,FirstName,AddressID)
    SELECT 'Smith','John',1
   UNION
 SELECT 'Smith','Mary',1
   UNION
 SELECT 'Jones','Max',2

Note that we have two rows in dbo.Person that refer back to a single row in dbo.Address – we have removed duplicate data!

There is a system view sys.Foreign_Keys that shows us the attributes of the database foreign keys:

SELECT name,object_id, parent_object_id,type,referenced_object_id,is_disabled,
     is_not_trusted,is_not_for_replication
    FROM [sys].[foreign_keys] WHERE name = 'FK_Person_AddressID';

This gives us

name object_id parent_object_id type referenced_object_id is_disabled is_not_trusted is_not_for_replication
FK_Person_AddressID 949578421 245575913 F 277576027 0 0 0

There are other columns, but these are the ones I want to discuss. Name is obvious. Object_ID refers to the ID in the sys.objects view for this Foreign Key. Parent_Object_ID is the ID for the dbo.Person table in sys.objects and referenced_object_id is the ID for dbo.Address. is_disabled and is_not_trusted are two very important Foreign Key attributes. The is_not_for_replication is another attribute, but will not be discussed in this article (it only applies to the replication environment). When the is_disabled attribute has a value of 1 (true), SQL Server does no Foreign Key checking when new rows are inserted into the parent table. The converse is true when the is_disabled attribute has a value of 0, i.e., any row inserted in the parent table has to have a valid value in the Foreign Key referenced column(s) from the child table. The is_not_trusted attribute (only Microsoft would use a double negative) is very important to the query optimizer and when is 0 (false) can have significant impact on improved query performance. But, let’s look at examples to see their use.

Note in the above result from sys.foreign_key that is_disabled = 0 (meaning the FK is enabled) and that is_not_trusted = 0 (meaning the FK is trusted).

To disable (but not drop) a Foreign Key we use the statement

ALTER TABLE dbo.Person NOCHECK CONSTRAINT FK_Person_AddressID

Using

SELECT is_disabled, is_not_trusted
FROM [sys].[foreign_keys] WHERE name = 'FK_Person_AddressID';
-- note is_disabled = 1
-- note is_not_trusted = 1

We get is_disabled = 1 and is_not_trusted = 1. Now the Foreign Key is disabled and is not trusted. So if we run following

ALTER TABLE dbo.Person CHECK CONSTRAINT FK_Person_AddressID

And then rerun above sys.foreign_keys query we get is_disabled = 0 but is_not_trusted = 1. Whoa, what happened here? We will discuss this behavior (and how to fix it) in Part 2 of this series. Hopefully this has given you some insight into how to set up Foreign Keys. Part 2 will get into more of the how and care of maintenance of Foreign Keys. They are not necessarily create and forget!

Trackbacks

Leave a Reply