Page Type 4 – The Mysterious TEXT_TREE_PAGE

2014-07-30 - General, Series, SQL Server Internals, Storage Wednesday

Just last week we talked about type-3 pages that store LOB data. Those pages are of the TEXT_MIX_PAGE page type and they are the ones in which SQL Server stores the actual LOB data. However, there are other pages involved in storing LOB values. They are pages of type TEXT_TREE_PAGE.

A LOB Example

As soon as a LOB value like a VARCHAR(MAX) does not fit on the DATA_PAGE page of its row anymore, it is stored in a separate allocation unit in pages of type TEXT_MIX_PAGE. Let us review a quick example:

[sql] CREATE TABLE dbo.TableWithLob
(
id INT IDENTITY(1,1) CONSTRAINT [PK:dbo.TableWithLob] PRIMARY KEY CLUSTERED,
some_value INT,
some_lob VARCHAR(MAX)
);

INSERT INTO dbo.TableWithLob(some_value,some_lob)
VALUES(1,REPLICATE(CAST('X' AS VARCHAR(MAX)),8001));
[/sql]

This batch creates an example table and inserts a single row with a LOB value that is too big to fit on the main page. However, it is small enough to fit on a single LOB_DATA page. We can confirm that by running the following query that returns all pages that belong to our example table. (Use with care as it uses the undocumented sys.dm_db_database_page_allocations DMF.)

[sql] SELECT OBJECT_SCHEMA_NAME(P.object_id) AS schema_name,
OBJECT_NAME(P.object_id) AS table_name,
I.name AS index_name,
P.partition_number,
DDDPA.allocation_unit_id,
DDDPA.allocation_unit_type_desc,
DDDPA.allocated_page_file_id AS file_id,
DDDPA.allocated_page_page_id AS page_id,
DDDPA.page_type,
DDDPA.page_type_desc,
DDDPA.page_level
FROM sys.dm_db_database_page_allocations(DB_ID(),OBJECT_ID('dbo.TableWithLob'),NULL,NULL,'DETAILED') AS DDDPA
JOIN sys.allocation_units AS AU
ON DDDPA.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions AS P
ON P.partition_id = AU.container_id
JOIN sys.indexes AS I
ON P.index_id = I.index_id
AND P.object_id = I.object_id
ORDER BY DDDPA.allocation_unit_id,DDDPA.allocated_page_file_id,DDDPA.allocated_page_page_id;
[/sql]

It returns four rows for the four pages that are part of the table:

a table with a single LOB_DATA page

There are two IAM pages, one for each of the two allocation units. (Remember, LOB data is stored in a separate allocation unit.) There is the page of type DATA_PAGE that contains data for the "normal" columns. For each LOB value that is not stored "inline" with the row, the row contains a pointer to the exact position of the value in the LOB_DATA allocation unit. That position is a page of type TEXT_MIX_PAGE.

Large LOB values

A single LOB value can be up to 2GB in size. A value that big clearly does not fit on a single 8192-byte page, not even with the best compression algorithms. That means that eventually the value grows beyond a single TEXT_MIX_PAGE page. In that case new pages of type TEXT_MIX_PAGE are added. Let us see what that actually looks like:

[sql] UPDATE dbo.TableWithLob SET
some_lob = REPLICATE(CAST('X' AS VARCHAR(MAX)),40200)
WHERE id = 1;
[/sql]

When executing the UPDATE statement and then above query to list all pages again, you will get a result like this:

a LOB value using five TEXT_MIX_PAGE pages

You can see that there are now a total of five TEXT_MIX_DATA pages.

Now let us make the LOB value just a little bit bigger and then run the page-query again:

[sql] UPDATE dbo.TableWithLob SET
some_lob = REPLICATE(CAST('X' AS VARCHAR(MAX)),40201)
WHERE id = 1;
[/sql]

Now there is a new page type in the mix:

a table with a single TEXT_TREE_PAGE page

Even though the LOB value grew by only a single character, the table gained two new pages. One is of the already known TEXT_MIX_PAGE type and the other is of type TEXT_TREE_PAGE.

The TEXT_TREE_PAGE Page Type

The reason for this additional unexpected page is, that SQL Server actually creates an index on each LOB value that is stored in the LOB_DATA allocation unit. The key for that index is simply the position of each character. That allows SQL Server to quickly find e.g. characters 1,000,006 to 1,000,294 in a large enough LOB value, without having to read the entire value into memory.

How that works in detail will be topic of another post. However, the index SQL Server builds is of the same type as a normal table index: A B+Tree. Therefore we need intermediate index pages and they are the pages of type 4, called TEXT_TREE_PAGE pages.

Summary

SQL Server creates an internal B+Tree index on each LOB value that is stored in the LOB_DATA allocation unit. The actual data can be found in TEXT_MIX_PAGES while the index tree structure is stored in pages of type TEXT_TREE_PAGE.

Categories: General, Series, SQL Server Internals, Storage Wednesday
Tags: , , , , ,

One Response to Page Type 4 – The Mysterious TEXT_TREE_PAGE

  1. Pingback: How to Overflow a BLOB Value - sqlity.net

Leave a Reply