Hey, where is my BLOB?

2014-08-13 - General, Series, SQL Server Internals, Storage Wednesday

The Story of the Inline LOB

SQL Server's LOB storage system is quite flexible and can adapt to different data size requirements. In fact, LOB values do not even have to live in a separate allocation unit. But, if they are not there in the space that was designed to hold LOB data, where are they instead?

An Inline LOB Storage Example

If the LOB value is small enough to fit on the data page, it is stored like any other variable size data type. Particularly, if the value is less than 24 bytes in size, it will end up almost with certainty stored with the row in the data page. This is called inline LOB storage. Let us look at an example.

[sql] CREATE TABLE dbo.LOBTable
(
id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
placeholder1_for_other_columns CHAR(8000) NOT NULL DEFAULT '',
placeholder2_for_other_columns CHAR(22) NOT NULL DEFAULT '',
LOB VARCHAR(MAX),
);
GO
INSERT INTO dbo.LOBTable(LOB)
VALUES(
'This are just 23 chars.'
);
[/sql]

This T-SQL batch creates a table with only four columns. The first is a simple IDENTITY based primary key. Two of the columns are fixed length CHAR columns. We will need that fact a little later. The fourth column will hold our LOB value.

After creating the table, the INSERT statement inserts a single row with a string of 23 characters into the LOB column.

Now we can use this query, based on the undocumented sys.dm_db_database_page_allocations DMF, to see the pages that make up our table:

[sql] SELECT OBJECT_SCHEMA_NAME(P.object_id) AS schema_name,
OBJECT_NAME(P.object_id) AS table_name,
I.name AS index_name,
P.partition_number,
DDDPA.allocation_unit_id,
DDDPA.allocation_unit_type_desc,
DDDPA.allocated_page_file_id AS file_id,
DDDPA.allocated_page_page_id AS page_id,
DDDPA.page_type,
DDDPA.page_type_desc,
DDDPA.page_level
FROM sys.dm_db_database_page_allocations(DB_ID(),OBJECT_ID('dbo.LOBTable'),NULL,NULL,'DETAILED') AS DDDPA
JOIN sys.allocation_units AS AU
ON DDDPA.allocation_unit_id = AU.allocation_unit_id
JOIN sys.partitions AS P
ON P.partition_id = AU.container_id
JOIN sys.indexes AS I
ON P.index_id = I.index_id
AND P.object_id = I.object_id
ORDER BY DDDPA.allocation_unit_id,DDDPA.allocated_page_file_id,DDDPA.allocated_page_page_id;
[/sql]

In the result of this query below you can see that the table contains only a single data page (and an IAM page) and that there is no LOB_DATA page to be seen anywhere.

A BLOB can live as inline LOB together with its row in the data page

However, the first result set, which stems from a straight SELECT * FROM our table, also shows that the value was indeed stored correctly. This means that the LOB value actually is stored in the data page.

Beware of the Pointer

Now let us try to increase the size of the string just a little:

[sql] UPDATE dbo.LOBTable SET
LOB = 'This are xactly 25 chars.';
[/sql]

Clearly, 25 characters should not make a problem in a column of a data type that can store up to 2147483647 bytes.

Well, it is not that simple:

beware of the large BLOB pointer

The reason this fails is the pointer. When the actual value is stored in the LOB_DATA allocation unit, a pointer is inserted into the row. This pointer is a surprisingly complex structure and takes up between 24 and 72 bytes.

The two fixed length character columns are sized in a way that the row takes up exactly 8060 bytes when a 23-character string is inserted into the LOB column. The above update tries to extend that to 25 characters. That would make a total row length of 8062 bytes, which is not possible. Therefore, SQL Server tries to move the LOB value out into the LOB_DATA allocation unit. But now SQL Server has to insert that pointer and with a minimum of 24 bytes it also is too big to fit in the maximum row size of 8060 bytes. That is the reason that we get above error.

If we reduce the size of the two fixed length character columns a little, the pointer will fit again and the update can succeed. For that, let us alter the table:

[sql] ALTER TABLE dbo.LOBTable ALTER COLUMN placeholder2_for_other_columns CHAR(21) NOT NULL;
ALTER TABLE dbo.LOBTable REBUILD;
[/sql]

The first ALTER TABLE statement changes the column from a CHAR(22) to a CHAR(21) data type. That gives enough room for the pointer and above update will now succeed. The second statement rebuilds the table so that the change is actually applied to all rows and not only held in the table meta data.

a misleading warning

Did you notice the warning? It is telling us that, while the current 24-byte pointer fits on the page, a larger pointer will not. However, SQL Server is actually clever enough not to fail an update because of this. It just will use additional pages in the LOB_DATA allocation unit to make up for it. So, the above warning really tells us that there is the potential for a lot of wasted space.

If you look at the pages that make up our table now, you will see a single LOB_DATA page that holds our 25-byte LOB value:

A BLOB that is actually stored in the LOB_DATA allocation unit

The table now consists of a normal DATA page together with its IAM page for the ROW_DATA allocation unit, and a single LOB page (TEXT_MIX_PAGE), also with an IAM page for its LOB_DATA allocation unit.

Summary

SQL Server's LOB storage system is extremely flexible. It can store LOB values in a separate LOB_DATA allocation unit or inside the row on the data page as an inline LOB. To achieve that flexibility, SQL Server uses pointers that use at least 24 bytes. Be aware of that as it can cause inserts and updates to fail, if the row including the pointer would take over 8060 bytes.

Categories: General, Series, SQL Server Internals, Storage Wednesday
Tags: , , , , ,

Leave a Reply