How to Overflow a BLOB Value

2014-08-06 - General, Series, SQL Server Internals, Storage Wednesday

Microsoft advertises that you can store 2 gigabytes in a single BLOB value. But, what does this number actually mean? If you buy a 2TB hard drive, you might actually only get 1.8TB of usable space. The same picture presents itself when you are looking at thumb drives or memory cards for your camera. This is in part due to the fact, that some people call 1000 bytes a kilobyte and some people call 1024 bytes a kilobyte. The same factor applies then to get from a kilobyte to a megabyte and again to get to a gigabyte and from there to a terabyte. The difference at the terabyte level is 1024^4 – 1000^4 = 1,099,511,627,776 - 1,000,000,000,000 which is close to 100 gigabytes. But there are also other factors at play that hide behind terms like "raw-storage".

With all that storage size fuzziness in the industry, I thought it might be interesting to figure out how much room there really is in a SQL Server BLOB.

BLOB Explained

BLOB stands for Binary Large Object. Different database systems have different definitions of how large a value has to be to qualify as a BLOB value. In SQL Server, This is really more due to the data type than to actual value size. However, the largest value you might encounter that is not a BLOB is 8000 bytes in size.

The data types in SQL Server that allow for BLOB storage are VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX). You might also still encounter the deprecated TEXT, NTEXT and IMAGE legacy data types.

Technically, only the VARBINARY(MAX) data type allows to store a BLOB, as that acronym is based on the word "binary". Sometimes you will find the character large objects specified as just LOB values. However, the term BLOB has been established to include any large value stored in one of the above data types.

Testing the Two Gigabyte Boundary

The most generous definition of two gigabytes is 2*1024*1024*1024 = 2,147,483,648 bytes. Let us try to create a value that size. I already created a simple table with a VARCHAR(MAX) column and inserted a single row. With the following statement, I am going to attempt to store a 2147483648-character string in that column:

[sql] UPDATE dbo.LOBTable SET
LOB = REPLICATE(CAST('x' AS VARCHAR(MAX)),2147483648);
[/sql]

The statement does not have a WHERE clause as there is only a single row in that table right now. Running this statement results in . . . . . . an error:

a value too large for a BLOB

Apparently, 2 gigabytes is too much to be stored in a SQL Server BLOB data type. However, if you look at the error message, you see that we were not far off after all. The maximum number of characters allowed is just one character less. Let us try to confirm that:

[sql] UPDATE dbo.LOBTable SET
LOB = REPLICATE(CAST('x' AS VARCHAR(MAX)),2147483647);

SELECT LEN(LT.LOB) FROM dbo.LOBTable AS LT;
[/sql]

This time we are also going to determine the length of the stored string after the update, just to make sure we did not lose any characters somewhere mysteriously. The result is very much promising:

a BLOB with almost 2GB is possible

The reason for this single missing byte is a performance optimization.

The Missing Byte

You might know that SQL Server stores every BLOB value in its own B+Tree index. That allows SQL Server to access any sub-string directly without having to read the entire value into memory. If you look at the number of pages required to store our 2147483647-byte value on disc, you see that there are 566 additional TEXT_TREE_PAGES that make up the B+Tree:

count and type of pages required to store a 2GB BLOB

The actual data is stored in 267100 pages of type TEXT_MIX_PAGE - about 8044 bytes per page.

The index key for this B+Tree index is the position of the character within the string and is expressed as a single INT value. That means that the highest position that this index can reference is the highest number that can be stored in an INT.

SQL Server does not know unsigned integers, so the largest positive value that can be stored in an INT is 2^31-1 or 2,147,483,647. Therefore, SQL Server can only allow that many bytes in a single BLOB value.

Summary

SQL Server is marketed to allow 2 gigabytes of data to be stored in any of its three BLOB data types. The actual number of bytes that can be stored is 2^31-1 = 2,147,483,647. Not coincidentally, this is also the largest number that can be held by the INT data type.

Categories: General, Series, SQL Server Internals, Storage Wednesday
Tags: , , , , ,

2 Responses to How to Overflow a BLOB Value

  1. ThomasWMarshall says:

    Your posts brooches a pair of interesting topics.
    The first of these is the number 2147483647. It is prime (particularly interesting being proven by Leonhard Euler corresponding with Daniel Bernouli two guys so important in the history of math and science that one has a number named for him and the other is the reason planes fly), a Mersenne prime, a double Mersenne prime (there are only 4 known), the phone number to Max’s pizza in Dallas (ask Brent Ozar), and the reason for Y2038 which is a Unix version of the Y2K bug that is more likely to make planes crash than the first.
    The second of these has to do with computer terminology. What we call 2 Gigabytes (GB) is actually two numbers, as you showed above. The International Electrotechnical Commission established a naming convention in 1998 that would have us call ((2^31)-1) 2 Gibibytes (GiB). Donald Knuth in The Art of Programming would implore us to call it 2 Large Gigabytes (GGB). In 2006 Western Digital settled a lawsuit by offering $30 refunds to anyone who felt they were misled into buying a larger disk provided they filled out a claim form and returned it prior to July 17th. In spite of this as we see it is still common to use Gigabytes for both Giga and Gibi bytes. This second point is particularly interesting in the context of accounting for systems resources in a manner that consistently translates for the purpose of calculating and comparing costs.

  2. @sqlity says:

    ThomasWMarshall It is indeed quite interesting, the number of “special” properties this number has. BTW, if you need to ever calculate it without leaving the INT data type you can look here: http://sqlity.net/en/858/how-to-calculate-maxint/

Leave a Reply