SQL Server Internals: Tempdb Part 1

By: Kalen Delaney


SQL Server has been around for over 30 years, and I’ve been working with SQL Server for almost as long. I’ve seen a lot of changes over the years (and decades!) and across many versions of this incredible product. In these posts I’ll share with you how I look at some of the features and aspects of SQL Server, sometimes along with a bit of historical perspective.

Some people want to call tempdb the ‘temporary database’ and I usually cringe when I hear that. You might be able to make an argument for the database being temporary, but the ‘temp’ actually refers to the tables that are created within the database. You can create local or global temporary tables, as well as local or global temporary procedures. When and why you might want to create these objects is not the topic of this post.  In this post and the next, I’m going to answer four questions about tempdb.

  1. What is tempdb used for?
  2. What’s different between tempdb and other databases?
  3. How big should tempdb be?
  4. How many data files should the tempdb database use?

I’ll cover the first two questions in this post, and the last two next time.

What is tempdb used for?

Three types of objects are stored in tempdb: user objects, internal objects, and the version store, which is used primarily for snapshot isolation with disk-based tables (as opposed to snapshot isolation for memory-optimized tables).

  • Temporary user tables and stored procedures
  • Work tables, sort space
  • Version store

User Objects

I’m assuming you’re already aware of the difference between local and global temporary tables, and local and global temporary stored procedures. You can read the documentation if you’re not, but this post won’t be covering those details. There are a couple of issues with user objects that you might not be familiar with.

Although you can create a temp table from any database by prefacing its name with # or ##, you can also create a table explicitly in the tempdb database that acts (mostly) like a standard temp table. In fact, the demo code that I’ll show you shortly does just that. So what’s the difference between a temp table prefaced with # and one created in the tempdb database? The only difference is that the table you create explicitly in tempdb will not automatically be dropped when the connection that created it is closed. It will disappear when SQL Server is shut down, because the entire tempdb database is dropped and recreated. So if you want a table that is available to all connections, even after the creating connection is gone, but that won’t survive a SQL Server restart, you can consider this option.
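As a quick sketch of that option (the table and column names here are just placeholders), a table created explicitly in tempdb is visible to every connection and lasts until it is dropped or the instance restarts:

```sql
-- A table created explicitly in tempdb (no # prefix) survives the
-- creating connection; it disappears only when SQL Server restarts,
-- because tempdb is rebuilt, or when it is dropped explicitly.
USE tempdb;
GO

CREATE TABLE dbo.SharedScratch
(id int, payload varchar(100));
GO

-- Any connection can now reference tempdb.dbo.SharedScratch.
-- Clean up when finished:
DROP TABLE dbo.SharedScratch;
GO
```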

Another misunderstood temp table feature is table variables. For most intents and purposes, table variables behave just like temporary tables. They are not a memory-only feature (unless, of course, you’re using memory-optimized tables). Table variables are stored in the tempdb database just like temp tables. The biggest difference is that the lifetime of a table variable is even shorter than that of a temp table. Like any variable in SQL Server, a table variable must be declared, populated and used all in a single batch. And because the batch must be compiled and optimized before it’s executed, and thus before the table variable is even created, there are other differences. Parallel plans will not be generated, because the optimizer can’t know how many rows there will be in the table variable. Statistics will not be maintained or updated, again because the optimizer will never see the data in the table.
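A minimal sketch of that single-batch lifetime (names here are made up for illustration):

```sql
-- A table variable must be declared, populated, and queried
-- all within one batch; it is gone once the batch ends.
DECLARE @t TABLE (id int, name varchar(50));

INSERT INTO @t VALUES (1, 'first'), (2, 'second');

SELECT id, name FROM @t;
GO

-- In a new batch, @t no longer exists. The following line, if
-- uncommented, fails with "Must declare the table variable @t":
-- SELECT id, name FROM @t;
```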

Internal Objects

Internal objects in tempdb are worktables that SQL Server creates while it’s processing your queries. This can be for spooling operations, for sort space, or for hash tables. You don’t have a lot of direct control over when and how these internal objects are created, but you may have indirect control. For example, if you find a query is running slowly and its plan shows the use of spooling operators, you might be able to avoid the spooling by explicitly creating temp tables. If a slow-running query is performing hash joins or hash aggregation, you can either build appropriate indexes to encourage the optimizer to choose a non-hash operator, or use hints to force the optimizer to choose a different plan.
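As one hedged example of the hint approach (the tables and columns below are hypothetical), a join hint can steer the optimizer away from a hash join and the tempdb worktables that come with it:

```sql
-- OPTION (LOOP JOIN) asks the optimizer to use nested loops
-- instead of a hash join for every join in this query, avoiding
-- the hash table that would otherwise be built in tempdb.
SELECT o.OrderID, c.CustomerName
FROM dbo.Orders AS o
JOIN dbo.Customers AS c
    ON c.CustomerID = o.CustomerID
OPTION (LOOP JOIN);
```

As with any hint, this trades tempdb pressure for a possibly worse join strategy, so it’s worth comparing the plans before and after.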

Version Store

The third use of tempdb is for storing row versions when using READ COMMITTED SNAPSHOT or SNAPSHOT isolation with disk-based tables. Either of these flavors of SNAPSHOT saves older ‘versions’ of rows as updates are made, so you can see the prior committed values even if the data is locked. SNAPSHOT isolation and the version store are a huge topic, and I’ll be writing blog posts dedicated to them in the future, so I’m not going to go into any detail now.
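For reference, row versioning is enabled per database; these are the two database options involved (SalesDB is a placeholder name for your own database):

```sql
-- Versioned READ COMMITTED: readers see the last committed
-- version of a row instead of blocking on writers' locks.
ALTER DATABASE SalesDB SET READ_COMMITTED_SNAPSHOT ON;

-- Allows sessions to request SNAPSHOT isolation explicitly
-- with SET TRANSACTION ISOLATION LEVEL SNAPSHOT.
ALTER DATABASE SalesDB SET ALLOW_SNAPSHOT_ISOLATION ON;
```

Once either option is on, the row versions generated by updates and deletes are kept in tempdb’s version store.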

What’s different between tempdb and other databases?

One difference already mentioned is that tempdb is recreated every time SQL Server starts up. Some people, myself included, use this fact to quickly determine the last time a SQL Server instance was restarted: just check the create_date for tempdb. In fact, I have a little stored procedure that returns this date and time. I created it in the master database, and because its name starts with sp_, I can call it from anywhere to quickly find my SQL Server start time.

USE master;
GO

CREATE PROC sp_sqlstart AS
SELECT create_date AS StartTime
FROM sys.databases
WHERE name = 'tempdb';
GO

EXEC sp_sqlstart;
GO
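If you’d rather not rely on tempdb’s create_date, recent versions also expose the instance start time directly through a DMV:

```sql
-- sys.dm_os_sys_info reports the time the SQL Server
-- service last started.
SELECT sqlserver_start_time
FROM sys.dm_os_sys_info;
```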

Another difference between tempdb and other databases is a caching behavior introduced in SQL Server 2005. It is frequently the case that the same temporary objects are created over and over, as with a stored procedure that creates a temp table. If a temp table has five columns, every time it is created SQL Server would need to insert one row into the system table holding object information, five rows into the system table holding column information, and perhaps rows into tables for indexes and/or constraints. Having to allocate the space for these rows repeatedly can be quite resource intensive. So, starting in SQL Server 2005, when a temporary user object is created, SQL Server caches one page for the IAM and one data page, and associates these pages with the query plan. When the same query is executed again, some space is already available and no allocation may be needed. For work tables in tempdb, SQL Server caches an IAM page and a full extent of eight pages.
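You can get a feel for this by peeking at tempdb’s catalog; in my experience, temp tables (including cached ones) show up in tempdb.sys.objects with system-generated suffixes appended to their names, though the exact naming is an internal detail:

```sql
-- List temp table entries currently in tempdb's catalog.
-- Names look like '#mytable___...hex' for session-scoped tables,
-- or just a short '#hex' name for cached objects.
SELECT name, object_id, create_date
FROM tempdb.sys.objects
WHERE name LIKE '#%';
```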

You can see how much space is being used for each of tempdb’s uses by looking at the DMV sys.dm_db_file_space_usage. Prior to SQL Server 2014, this view would always show you space used in tempdb, but as of 2014, it will show information for your current database, unless you use a three-part name:

SELECT file_id,
user_object_reserved_page_count,
internal_object_reserved_page_count,
version_store_reserved_page_count
FROM tempdb.sys.dm_db_file_space_usage;
GO

The above query can even show you the space used by tempdb for an Azure SQL Database, even though you cannot directly access tempdb itself there. Temp tables, worktables and versions can still be created, and you may need to know how much tempdb space is being used!

Finally, I want to mention probably the biggest difference, and that is the logging that is done. Because tempdb is recreated on every startup, there is never any need to run recovery and redo transactions from the log. This doesn’t mean no logging, of course, but it means potentially a lot less logging. In a user database, SQL Server has to log enough information to undo a transaction in case of a rollback, but it also needs to log enough information to redo the work when needed, during a recovery operation. Since tempdb is never recovered, the REDO information does not need to be logged.

The following example illustrates the difference in logging. I am running SQL Server 2017, and my results are shown using Microsoft SQL Server Management Studio 17.5.

First I’ll create a table in tempdb. After inserting 10 rows, I look at the contents of the log. (For more details about reading the log contents, take a look at this post: https://logicalread.com/sql-server-dbcc-log-command-tl01/ )

USE tempdb;
SET NOCOUNT ON;
GO

DROP TABLE IF EXISTS tempdblogging;
GO

CREATE TABLE tempdblogging
(c1 int, c2 char(1000));
GO

-- load 10 rows
DECLARE @i int;
SELECT @i = 0;
WHILE @i < 10 BEGIN
    INSERT INTO tempdblogging
    SELECT @i, REPLICATE('a', 1000);
    SELECT @i = @i + 1;
END;
GO

SELECT TOP 10 operation, context,
[log record length]
FROM fn_dblog(null, null)
WHERE allocunitname = 'dbo.tempdblogging'
ORDER BY 3 DESC;
GO

Notice in the output below that the log records are very small, even though the rows inserted are over 1000 bytes long. The log really only needs to keep track of which rows were inserted, so they can be removed in case of a rollback. It doesn’t need the contents of the rows.

Now let’s see how an UPDATE is different:

UPDATE tempdblogging
SET c2 = REPLICATE('b', 1000);
GO

SELECT TOP 10 operation, context,
[log record length]
FROM fn_dblog(null, null)
WHERE allocunitname = 'dbo.tempdblogging'
ORDER BY 3 DESC;
GO

In the case of an UPDATE, rolling back the transaction would need to revert to the previous contents of the row, so the old data must be stored in the log. The log records are then just a bit over 1000 bytes.

Now, we’ll do the exact same operations in a user database. Feel free to use any test database you have available, or create a new one.

USE testdb;
GO
SET NOCOUNT ON;
GO
DROP TABLE IF EXISTS logging;
GO
CREATE TABLE logging
(c1 int, c2 char(1000));
GO
-- load 10 rows
DECLARE @i int;
SELECT @i = 0;
WHILE @i < 10 BEGIN
    INSERT INTO logging
    SELECT @i, REPLICATE('a', 1000);
    SELECT @i = @i + 1;
END;
GO

SELECT TOP 10 operation, context,
[log record length]
FROM fn_dblog(null, null)
WHERE allocunitname = 'dbo.logging'
ORDER BY 3 DESC;
GO

Note that after an INSERT in a user database, the full row is stored in the log. If the database needs to be recovered, the data is available.

UPDATE logging
SET c2 = REPLICATE('b', 1000);
GO

SELECT TOP 10 operation, context,
[log record length]
FROM fn_dblog(null, null)
WHERE allocunitname = 'dbo.logging'
ORDER BY 3 DESC;
GO

After an UPDATE, the log records are twice as big, because the log keeps track of both the old version and the new version of the rows.

Next time, I’ll continue looking at tempdb details, and talk about sizing information, details about the tempdb files and controlling allocations.
