The Page – How SQL Server Stores Data

2014-02-12 - General, Series, SQL Server Internals, Storage Wednesday

Storage Wednesday

This post is the first of an ongoing series of posts about the internals of the SQL Server Storage Engine. Here we will look at how SQL Server stores data, how it is retrieved, how table data is organized on disk and similar topics. This post is going to serve as an anchor post. At the end of this post, you will find a link to each post that has been published within the Storage Wednesday series.

Introduction

A SQL Server database is organized in pages. Everything from table data, indexes, large object and database metadata is organized to fit in these virtual containers. There are several page types, depending on the type of data stored within, but everything SQL Server stores in a data files is broken down into page size chunks.

This posts will start a lose series of blog posts describing how SQL Server stores the data. We will talk about pages, allocation units, indexes, blob storage, index allocation maps, boot pages and other topics.

This particular post will feature a list of all page related posts that have been published already. This list will be updated with every new post.

The Unique Page Address

Free Stock Photos: Book Picture. Image: 191998

Every page in SQL Server has the same size: 8192 bytes. Every page also has a unique address. Foremost, a page is a part of a database and therefore linked to that database. To be exact, a page is part of a single database file. The file id therefore makes for the first part of the unique address. Within each file, pages are continuously numbered starting at 0.

So, to uniquely identify a page, you need to know the database it belongs to. Pages without database context don't exist. Once the database is identified, the page address is usually written in the form: <file number>:<page number>

That means 1:0 identifies the first page in file 1 and 3:26 identifies the 27th page in file 3. As every page is exactly the same size, the page number can also be used to calculate the absolute position of the page in the data file.

DBCC PAGE

To make all this page business a little more tangible, we are going to actually look at pages throughout the following posts. There are basically two ways to do that:

  1. Detach the database, open the file with a hex editor, calculate the position of the page within the file and then look at the data at that position.
  2. Use DBCC PAGE.

While the first method might be more fun (and depending on the hex editor you are using you might not even have to detach the database), the second method has a significant advantage: DBCC PAGE not only displays the page contents, it also helps with the interpretation of the data.

But DBCC PAGE has also a significant disadvantage: It is not officially supported by Microsoft. That means, its behavior might change any time and the feature might even disappear in its entirety without warning. So don't use it in a production environment.

Let's look at how to use it. First you need to turn on traceflag 3604. That is a general requirement to see output of any undocumented DBCC command. The easiest way to turn this traceflag on for the current session is to run the following command:

[sql] DBCC TRACEON(3604);
[/sql]

The syntax of the DBCC PAGE is as follows:

[sql] DBCC PAGE(0,1,0,3);
[/sql]

It takes four parameters. The first one is the database id. If you pass in 0 as in the example above, the current database is used. The second parameter is the file id of the file that contains the page. It is the first section of the unique page address discussed above. The third parameter is the page number or the second part of the unique page address. The fourth parameter at last tells DBCC PAGE how much detail to return. A 0 means just the header information will be returned. 1 includes the hex dump of the relevant parts of the page. 2 includes the hex dump of the entire page. 3 is like 1, but it additionally includes an interpretation of the data. For example, on data pages rows are identified and column values are spelled out while on header pages the different header values are identified.

Because output type 3 makes understanding the page data easiest, that type will be used in most examples.

Summary

All data stored in a SQL Server Database is organized in 8192 byte pages. Within each database every page has a unique address made up out of the file number and the page number within that file. We can use the undocumented DBCC PAGE command to look at pages directly.

Page Posts

This is the root post of a series of page related posts that I will write over time. Below is a list of all the posts that have been completed already.

Categories: General, Series, SQL Server Internals, Storage Wednesday
Tags: , , , , , ,