Just last week we talked about type-3 pages that store LOB data. Those pages are of the TEXT_MIX_PAGE page type and they are the ones in which SQL Server stores the actual LOB data. However, there are other pages involved in storing LOB values. They are pages of type TEXT_TREE_PAGE.
As soon as a LOB value like a VARCHAR(MAX) does not fit on the DATA_PAGE page of its row anymore, it is stored in a separate allocation unit in pages of type TEXT_MIX_PAGE. Let us review a quick example:
INSERT INTO dbo.TableWithLob(some_value,some_lob)
VALUES(1,REPLICATE(CAST('X' AS VARCHAR(MAX)),8001));
This batch creates an example table and inserts a single row with a LOB value that is too big to fit on the main page. However, it is small enough to fit on a single LOB_DATA page. We can confirm that by running the following query that returns all pages that belong to our example table. (Use with care as it uses the undocumented sys.dm_db_database_page_allocations DMF.)
It returns four rows for the four pages that are part of the table:
There are two IAM pages, one for each of the two allocation units. (Remember, LOB data is stored in a separate allocation unit.) There is the page of type DATA_PAGE that contains data for the "normal" columns. For each LOB value that is not stored "inline" with the row, the row contains a pointer to the exact position of the value in the LOB_DATA allocation unit. That position is a page of type TEXT_MIX_PAGE.
A single LOB value can be up to 2GB in size. A value that big clearly does not fit on a single 8192-byte page, not even with the best compression algorithms. That means that eventually the value grows beyond a single TEXT_MIX_PAGE page. In that case new pages of type TEXT_MIX_PAGE are added. Let us see what that actually looks like:
When executing the UPDATE statement and then above query to list all pages again, you will get a result like this:
You can see that there are now a total of five TEXT_MIX_DATA pages.
Now let us make the LOB value just a little bit bigger and then run the page-query again:
Now there is a new page type in the mix:
Even though the LOB value grew by only a single character, the table gained two new pages. One is of the already known TEXT_MIX_PAGE type and the other is of type TEXT_TREE_PAGE.
The reason for this additional unexpected page is, that SQL Server actually creates an index on each LOB value that is stored in the LOB_DATA allocation unit. The key for that index is simply the position of each character. That allows SQL Server to quickly find e.g. characters 1,000,006 to 1,000,294 in a large enough LOB value, without having to read the entire value into memory.
How that works in detail will be topic of another post. However, the index SQL Server builds is of the same type as a normal table index: A B+Tree. Therefore we need intermediate index pages and they are the pages of type 4, called TEXT_TREE_PAGE pages.
SQL Server creates an internal B+Tree index on each LOB value that is stored in the LOB_DATA allocation unit. The actual data can be found in TEXT_MIX_PAGES while the index tree structure is stored in pages of type TEXT_TREE_PAGE.