Bulk Changed Map (BCM): The Minimally Logged Advantage

2014-05-14 - General, Series, SQL Server Internals, Storage Wednesday

Introduction

SQL Server offers three recovery models: Simple, Bulk-Logged and Full. To comply with the ACID properties, particularly the durability requirement, SQL Server uses a mechanism called Write-Ahead Logging (WAL). Every change is first recorded in the database log (together with undo information) and then applied to the data pages in memory. Before a transaction can commit, the disk subsystem has to confirm that the log records were indeed written to the physical storage medium. The changes to the data pages themselves are written to disk asynchronously and might even reach disk before the transaction commits.
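If you want to see these log records for yourself, the undocumented fn_dblog function provides a read-only peek at the active portion of the log. This is just a quick sketch for exploration, not something to build production code on:

[sql] -- Peek at the log records in the active part of the log (fn_dblog is undocumented)
SELECT TOP (20) [Current LSN], Operation, Context, [Transaction ID]
FROM sys.fn_dblog(NULL, NULL);
[/sql]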

The information in the log allows SQL Server to recover the database after a crash. Recovery in this context means that transactions that were committed before the crash are reapplied if their data pages did not make it to disk in time, and transactions that were not yet committed are undone if their data page changes were written before the crash. The end result is that, no matter what, the database is always in a transactionally consistent state.

The recovery model of the database determines what exactly is logged and how long the log records are kept. Crash recovery is always guaranteed, independent of this setting. However, in the simple recovery model the log records of a transaction are discarded once all pages that the transaction changed have been written to disk. In the full recovery model, on the other hand, the log records are not discarded until a log backup has been performed.

This allows the DBA to create a "chain" of log backups. With such a log-chain, you can restore the database to any point in time, for example just before an accidental delete.
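For illustration, a point-in-time restore from such a log chain could look like the following sketch; the database name, backup file names and the STOPAT timestamp are placeholders:

[sql] -- Restore the last full backup, but leave the database in the restoring state
RESTORE DATABASE SomeDB FROM DISK = 'SomeDB.bak' WITH NORECOVERY, REPLACE;
-- Roll the log forward to just before the accidental delete, then recover
RESTORE LOG SomeDB FROM DISK = 'SomeDB_log1.trn'
    WITH STOPAT = '2014-05-14T11:59:00', RECOVERY;
[/sql]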

The bulk-logged recovery model is similar to the full recovery model. The difference is that some resource-intensive operations are only minimally logged: instead of every row change, only the operation itself and the numbers of the affected pages are recorded. During a log backup, those pages are then added to the backup in addition to the normal log records.

Durability is still guaranteed, however. For a statement to be eligible for minimal logging, all its changes must be applied to newly allocated (empty) pages. The allocation of those pages to, say, a table is always fully logged; the data changes on those pages might not be. If the operation has to be undone during recovery, only the page allocations need to be rolled back. Because the pages were unused beforehand, the changes on the pages themselves do not need to be undone.
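Typical candidates for minimal logging are SELECT INTO, BULK INSERT, INSERT … SELECT with a TABLOCK hint, and index creation or rebuilds. A rough sketch with hypothetical table and column names (dbo.Source and its some_column are assumptions, and the usual prerequisites apply):

[sql] -- Operations that can be minimally logged under BULK_LOGGED (hypothetical objects)
SELECT * INTO dbo.NewTable FROM dbo.Source;                        -- SELECT INTO
INSERT INTO dbo.NewTable WITH (TABLOCK) SELECT * FROM dbo.Source;  -- INSERT ... SELECT with TABLOCK
CREATE INDEX IX_NewTable ON dbo.NewTable (some_column);            -- index creation
[/sql]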

The big advantage is that bulk(y) operations that would normally cause a lot of churn on the log can be executed much faster. However, everything in SQL Server is a trade-off and bulk logging is no exception. Because the actual data pages have to be added to the log backup, the size of the log backup can grow significantly. Also, because the data pages have to be read during the log backup, backing up the tail of the log might not be possible if the data file has become corrupt.
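A simple way to eyeball the difference in log churn is to check the log usage before and after such an operation, once under FULL and once under BULK_LOGGED:

[sql] -- Percentage of transaction log space currently in use, per database
DBCC SQLPERF(LOGSPACE);
[/sql]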

Bulk Changed Map in Action

To know which data pages were affected by a bulk-logged operation, SQL Server uses a mechanism similar to the Differential Changed Map (DCM) that drives differential backups. Pages that need to be added to the next log backup are flagged in the Bulk Changed Map (BCM). The BCM consists of BCM pages (page type 17). Like the DCM pages, each one covers a single GAM interval, and the BCM page is always the last page of the GAM extent, the first extent of its GAM interval.
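A GAM interval covers 511,232 pages (a little under 4GB), so with the layout just described, the BCM page responsible for any given page can be calculated directly; a small sketch:

[sql] -- Which BCM page covers a given page? Each GAM interval spans 511,232 pages,
-- and its BCM page is the last page (offset 7) of the interval's first extent.
DECLARE @page_id INT = 824;
SELECT bcm_page_id = (@page_id / 511232) * 511232 + 7;  -- yields 7 for any page in the first interval
[/sql]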

To show you an example, we first need to create a database and set its recovery model to BULK_LOGGED:

[sql] CREATE DATABASE BulkLogged;
ALTER DATABASE BulkLogged SET RECOVERY BULK_LOGGED;
BACKUP DATABASE BulkLogged TO DISK = 'BulkLogged.bak';
[/sql]

The backup of this empty database will allow us later to execute a log backup. SQL Server does not allow the execution of log backups without a prior full backup.
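If you want to double-check the setup before moving on, the recovery model and the recorded full backup can be verified from the catalog views and msdb:

[sql] -- Confirm the recovery model
SELECT name, recovery_model_desc FROM sys.databases WHERE name = 'BulkLogged';

-- Confirm that a full backup exists (type 'D' = full database backup)
SELECT database_name, type, backup_finish_date
FROM msdb.dbo.backupset
WHERE database_name = 'BulkLogged';
[/sql]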

The next step is to create a table in that database:

[sql] USE BulkLogged;
GO
CREATE TABLE dbo.ATable(id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED, some_value INT, fill CHAR(8000));
[/sql]

At this point, the Bulk Changed Map should not show any affected pages. Let us confirm:

[sql] DBCC TRACEON(3604);  -- send DBCC output to the client instead of the error log
DBCC PAGE(0,1,7,3);  -- database 0 = current database, file 1, page 7, output option 3 = detailed
[/sql]

(Trace flag 3604 is required to make DBCC PAGE send its output to the client; without it the statement produces no visible output.) As expected, the output shows an empty BCM:

empty bulk changed map

Now we can execute a bulk logged insert statement:

[sql] INSERT INTO dbo.ATable WITH(TABLOCK) (some_value)
SELECT C.column_id FROM master.sys.columns AS C;
[/sql]

An insert statement like this is only bulk logged when the target table is a heap without indexes or when it is empty. The statement also needs to affect more than one page. (There are partially bulk-logged execution modes too; check the documentation for details.)

The example table has a filler column that causes each row to take up an entire data page. My master database contains a little over 800 columns, so this insert statement should add about that many pages to the table.
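To double-check, we can look at the table's page count and then inspect the Bulk Changed Map again with the same DBCC PAGE call as before; the exact numbers depend on the contents of your master database, but with roughly 800 rows of one page each, the map should now show about 800 flagged pages:

[sql] -- How many pages does dbo.ATable occupy now?
SELECT row_count, used_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.ATable');

-- Inspect the Bulk Changed Map again (trace flag 3604 is still active in this session)
DBCC PAGE(0,1,7,3);
[/sql]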

bulk changed map after bulk insert

Indeed, there are 103 extents or 824 pages flagged. If we now execute a log backup, those 824 data pages should be included:

[sql] BACKUP LOG BulkLogged TO DISK = 'BulkLogged.bak';  -- appends to the existing backup file
[/sql]

When SQL Server takes a backup, it reports how many pages were read from each file. That output confirms that 824 data pages were included in the backup:

log backup including minimally logged data pages
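Another way to see the impact of the included data pages is to compare the size of this log backup with the earlier full backup (or with later log backups) recorded in msdb:

[sql] -- Backup sizes for this database (type 'D' = full, 'L' = log)
SELECT type, backup_start_date, backup_size / 1048576.0 AS backup_size_mb
FROM msdb.dbo.backupset
WHERE database_name = 'BulkLogged'
ORDER BY backup_start_date;
[/sql]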

After the log backup, the Bulk Changed map should not contain any flagged pages anymore. Let us use DBCC PAGE one more time to confirm:

[sql] DBCC PAGE(0,1,7,3);
[/sql]

Below you can find the complete output of this statement:

[sourcecode] PAGE: (1:7)

BUFFER:

BUF @0x0000000004F66780

bpage = 0x00000001F94E0000 bhash = 0x0000000000000000 bpageno = (1:7)
bdbid = 23 breferences = 1 bcputicks = 122
bsampleCount = 2 bUse1 = 31469 bstat = 0x9
blog = 0xdb21215a bnext = 0x0000000000000000

PAGE HEADER:

Page @0x00000001F94E0000

m_pageId = (1:7) m_headerVersion = 1 m_type = 17
m_typeFlagBits = 0x0 m_level = 0 m_flagBits = 0x0
m_objId (AllocUnitId.idObj) = 99 m_indexId (AllocUnitId.idInd) = 0 Metadata: AllocUnitId = 6488064
Metadata: PartitionId = 0 Metadata: IndexId = 0 Metadata: ObjectId = 99
m_prevPage = (0:0) m_nextPage = (0:0) pminlen = 90
m_slotCnt = 2 m_freeCnt = 6 m_freeData = 8182
m_reservedCnt = 0 m_lsn = (0:0:1) m_xactReserved = 0
m_xdesId = (0:0) m_ghostRecCnt = 0 m_tornBits = 0
DB Frag ID = 1

Allocation Status

GAM (1:2) = ALLOCATED SGAM (1:3) = NOT ALLOCATED PFS (1:1) = 0x44 ALLOCATED 100_PCT_FULL
DIFF (1:6) = CHANGED ML (1:7) = NOT MIN_LOGGED

ML_MAP: Header @0x000000006C8BA064 Slot 0, Offset 96

status = 0x0

ML_MAP: Extent Alloc Status @0x000000006C8BA0C2

(1:0) - (1:384) = NOT MIN_LOGGED

DBCC execution completed. If DBCC printed error messages, contact your system administrator.
[/sourcecode]

As we expected, the BCM is indeed empty now.
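If you were following along, the demo database can be dropped again afterwards:

[sql] USE master;
GO
DROP DATABASE BulkLogged;  -- remove the demo database
[/sql]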

Summary

SQL Server uses the Bulk Changed Map to record which pages were altered by bulk-logged operations. This information is then used to include those actual data pages in the next log backup.
