SQL Server Internals: Peeking Inside the Transaction Log Structure

on February 22, 2018


SQL Server® has been around for over 30 years, and I’ve been working with it for almost as long. I’ve seen a lot of changes over the years (and decades!) to this incredible product. In these posts, I’ll share with you how I look at some of the features or aspects of SQL Server, sometimes with a bit of historical perspective.

In this post, I’m going to give you some of the juicy tidbits about working with the transaction log that I share with my SQL Server Internals classes. In part one, I’ll tell you about the VLFs, and in part two, I’ll tell you about why shrinking is such a problematic issue.

SQL Server divides your physical log into multiple management chunks called Virtual Log Files, or VLFs. (The issue of how SQL Server determines how many of them exist, and how big they are, is not part of this post’s topics.) The graphic below shows six VLFs in a physical log file. This would be the .ldf file that you create for your database. SQL Server keeps track of the beginning of the oldest active transaction, which is what the min LSN, or Log Sequence Number, is referring to. Everything between the min LSN and the current position SQL Server is writing to (the end of the log) is considered the ‘active’ log. Active transactions include more than just open transactions. The earliest active transaction may be a transaction marked for replication that has not yet been processed, the beginning of a log backup operation, or the beginning of an internal diagnostic scan that SQL Server performs periodically.

For 20 years, since the days of SQL Server 7.0, the main tool we used to look at our VLFs was an undocumented DBCC command called DBCC LOGINFO, which returns one row for every VLF. Although it is technically undocumented, many people blogged about it over the years. Most of these writers included a picture something like the one below, which is the one I use in my classes and workshops.

As of SQL Server 2017, there is a new documented tool that provides the same information as DBCC LOGINFO, and a bit more.  It’s a function called sys.dm_db_log_info(). You can read about it here. The nice thing about having a table valued function is that we can query the results and limit the columns in the output to only what we’re interested in. I realize a lot of people aren’t running SQL Server 2017, so at the end of this post, I’ll show how you can capture DBCC LOGINFO output into a table for querying.

I want to tell you how to think about the state of each VLF. The metadata returned by DBCC LOGINFO or sys.dm_db_log_info includes a column called status (or vlf_status) with a value of 2, meaning ‘active.’ But some people find the term ‘active’ a bit confusing, so I define four different ‘states’ for a VLF.

  1. Active

As shown in the figure above, the active portion of the log begins at the minimum LSN representing an active transaction. The active portion of the log ends at the last LSN written. Any VLFs that contain any part of the active log are considered active VLFs.

  1. Recoverable

The portion of the log preceding the oldest active transaction is needed only to maintain a sequence of log backups for restoring the database to a former state.

  1. Reusable

If transaction log backups are not being maintained or if you have already backed up the log, VLFs before the oldest active transaction are not needed and can be reused. Truncating or backing up the transaction log will change recoverable VLFs into reusable VLFs.

  1. Unused

One or more VLFs at the physical end of the log files might not have been used yet if not enough logged activity has taken place, or if earlier VLFs have been marked as reusable and then reused before SQL Server uses the ones at the end.

Any VLF that contains any part of the active log is considered active. In my graphic, that would be four of my six VLFs, the second through fifth ones. But what is the state of the first and the last ones?

If a VLF is not part of the active log, do we still need it? The answer is, of course, “It depends.” It depends on whether we’ve backed up the log or not. If we have not backed up the log, we still need to keep the VLF around, and then it is in what I am calling recoverable, or state 2. The graphic above doesn’t indicate whether the first VLF has been backed up or not, so we can’t tell if it’s in state 2 or state 3.

What about the last VLF in the graphic? It might have been used already, so could be in either the recoverable or the reusable state. Or it could be completely unused. (Since we start writing at the physical beginning of the file, there is no way the first VLF could be unused.)

What can you tell about the states of the VLFs from the tools DBCC LOGINFO and sys.dm_db_log_info? The tools do not distinguish between my states 1 and 2. If a VLF cannot be overwritten, it is called ‘active’ and shows a status value of 2 in both DBCC LOGINFO and sys.dm_db_log_info. But these two states behave differently, so I want to differentiate them. We can’t tell from the metadata if a VLF with status 2 contains active transactions or whether it’s waiting to be backed up. The only way to tell is to make a backup, and if the VLF wasn’t active, its status will change to 0, which means the VLF is now Reusable. Technically, it’s not the backing up that causes the status to change. It’s the fact that when the backup is finished, SQL Server performs a TRUNCATE LOG operation. TRUNCATE LOG changes all VLFs that were in the Recoverable state into the Reusable state.

Of course, there is the situation where we don’t care about log backups and we’re never going to make one. In that case, we can put the database into SIMPLE recovery model, and the log will be truncated every time a checkpoint occurs.  Again, that will change all Recoverable VLFs to Reusable.

There is a way to tell an unused VLF, however. If a VLF has never been used, it will have a File Sequence Number of 0. File Sequence Numbers are assigned when a VLF is used, and a new value is assigned when it is reused. The number is not static. But an unused VLF will have a File Sequence Number of 0 until the first time it gets used. In DBCC LOGINFO, this value is in the column FSeqNo, and in sys.dm_db_log_info, the column is vlf_sequence_number.

There are a few other differences between the old tool and the new. In DBCC LOGINFO, the size of the VLFs is shown in bytes, but in sys.dm_db_log_info, the size is shown in MB. The new tool also has a column that shows the first LSN used in each VLF, but that is a topic for another time.

Once you have the VLF info in a tabular form, you can query it. For example, you could count how many VLFs you have. Or you could see what percentage were Reusable. A slightly more complex query could determine where in the log the highest File Sequence Number was, which would give you an idea of how ‘shrinkable’ the log is. (We’ll see more about shrinking in part two of this series.) The new function in SQL Server 2017 returns tabular results and is nice and queryable. The following query counts the number of VLFs and returns the names of the databases with more than 100 VLFs.

SELECT [name], COUNT(l.database_id) AS 'vlf_count'
FROM sys.databases s
CROSS APPLY sys.dm_db_log_info(s.database_id) l
GROUP BY [name]
HAVING COUNT(l.database_id) > 100;

If you’re not running SQL Server 2017, you can use the following script to create a table that can capture the output of DBCC LOGINFO.

CREATE TABLE sp_LOGINFO
(RecoveryUnitId tinyint,
FileId tinyint,
FileSize bigint,
StartOffset bigint,
FSeqNo int,
Status tinyint,
Parity tinyint,
CreateLSN numeric(25,0) );
GO

Then you can insert into the table with the following code:

TRUNCATE TABLE sp_LOGINFO;
INSERT INTO sp_LOGINFO
EXEC ('DBCC LOGINFO');
GO

As a DBA, or anyone responsible for any data in a database, becoming comfortable with DBCC LOGINFO or the new sys.dm_db_log_info will go a long way toward helping you understand what is happening in your transaction logs. This knowledge can then help you manage logs and keep them in optimal health.

Related Posts

Leave a Reply