Partition Boundaries – Left or Right: That is the Question

2014-07-16 - Fundamentals, General, Series, SQL Server Internals, Storage Wednesday

Introduction

Partition boundaries can be set to be a left range or a right range. While this is a seemingly simple decision, it starts getting complex if you are really trying to understand what those two options mean. That is not necessarily helped by the fact that different terminologies seem to contradict each other. For example, a left range builds intervals that are right side inclusive.

Left and Right Ranges and Partition Boundaries

SQL Server partitioning uses two constructs, the partition scheme and the partition function. An un-partitioned table is placed in a filegroup during creation: CREATE TABLE … ON PRIMARY. If you are creating a table that is partitioned, instead of placing it into a filegroup, you place it on a partition scheme. The partition scheme is basically a collection of storage units called partitions, each being assigned to a filegroup. (However, multiple partitions can live in the same filegroup.)

The partition function decides which row in the table ends up in which partition based on a single column. Let us assume that that column is an INT column, say the id column. (I am using INT as an example. However, the concept is the same for other data types.) The partition function does not go and assigns blocks of values to partitions directly; instead, it splits the entirety of possible values into sections. An INT can store values from -2147483648 to 2147483647.

To split that into two sections, we have to define just one boundary. If we chose the value 3 for that first boundary, we would create two partitions: One that accepts values from -2147483648 to 3 and one for values from 3 to 2147483647. If we now add a second boundary at 6, we end up with three partitions: The first one still accepting values from -2147483648 to 3, the second for values from 3 to 6, and the last one finally for values from 6 to 2147483647.

You might have noticed an ambiguity in the preceding paragraph. The English language is particularly fuzzy on boundary conditions. The text above does not specify, which partitions the boundary values themselves belong to. Do the go to the left or the right?

Let me use a diagram to clarify:

Two possible interpretations of the partition Boundaries.

For each boundary, you have to decide if the boundary value is part of the preceding partition or part of the following. As we commonly write from left to right, you can ask that same question this way: Is the boundary value part of the partition to its left or part of the partition to its right?

SQL Server does not allow us to make that decision individually for each boundary. Instead, you decide for all boundaries in a partition function at once.

The CREATE PARTITION FUNCTION Statement

To create a partition function you have to use the CREATE PARTITION FUNCTION statement. In this statement you decide if the boundary values are in their left or right partition by specifying either RANGE LEFT or RANGE RIGHT, followed by a comma separated list of the actual boundary values:

[sql] CREATE PARTITION FUNCTION ThreeStepLeft(INT)
AS RANGE LEFT FOR VALUES(3,6);

CREATE PARTITION FUNCTION ThreeStepRight(INT)
AS RANGE RIGHT FOR VALUES(3,6);
[/sql]

Now that we have those two partition functions (using the same two boundary values) in place, we can actually compare their behavior.

RANGE LEFT vs RANGE RIGHT

First, let us finish the setup by creating two tables that are partitioned based on these functions:

[sql] CREATE PARTITION SCHEME ThreeStepLeftScheme AS PARTITION ThreeStepLeft ALL TO([PRIMARY]);

CREATE TABLE dbo.LeftTable
(
Id INT PRIMARY KEY CLUSTERED,
OtherValue INT DEFAULT CHECKSUM(NEWID())
) ON ThreeStepLeftScheme(id);

INSERT INTO dbo.LeftTable(id) VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9);

--

CREATE PARTITION SCHEME ThreeStepRightScheme AS PARTITION ThreeStepRight ALL TO([PRIMARY]);

CREATE TABLE dbo.RightTable
(
Id INT PRIMARY KEY CLUSTERED,
OtherValue INT DEFAULT CHECKSUM(NEWID())
) ON ThreeStepRightScheme(id);

INSERT INTO dbo.RightTable(id) VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9);
[/sql]

The above SQL snippet defines a partition scheme for each of the two partition functions. The schemes are of the most simple type, assigning all partitions to the primary filegroup. The snippet also creates a table for each partition scheme and inserts rows with id values ranging from 1 to 9.

Two years ago, I published a query that lists all partitions in a database together with their boundaries. Below you can find an improved version of that query:

[sql] SELECT f.NAME AS file_group_name,
SCHEMA_NAME(t.schema_id) AS table_schema,
t.name AS table_name,
p.partition_number,
ISNULL(CAST(left_prv.value AS VARCHAR(MAX))+ CASE WHEN pf.boundary_value_on_right = 0 THEN ' < '
ELSE ' <= '
END , '-INF < ')
+ 'X' + ISNULL(CASE WHEN pf.boundary_value_on_right = 0 THEN ' <= '
ELSE ' < '
END + CAST(right_prv.value AS NVARCHAR(MAX)), ' < INF') AS range_desc,
pf.boundary_value_on_right,
ps.name AS partition_schem_name,
pf.name AS partition_function_name,
left_prv.value AS left_boundary,
right_prv.value AS right_boundary
FROM sys.partitions p
JOIN sys.tables t
ON p.object_id = t.object_id
JOIN sys.indexes i
ON p.object_id = i.object_id
AND p.index_id = i.index_id
JOIN sys.allocation_units au
ON p.hobt_id = au.container_id
JOIN sys.filegroups f
ON au.data_space_id = f.data_space_id
LEFT JOIN sys.partition_schemes ps
ON ps.data_space_id = i.data_space_id
LEFT JOIN sys.partition_functions pf
ON ps.function_id = pf.function_id
LEFT JOIN sys.partition_range_values left_prv
ON left_prv.function_id = ps.function_id
AND left_prv.boundary_id + 1 = p.partition_number
LEFT JOIN sys.partition_range_values right_prv
ON right_prv.function_id = ps.function_id
AND right_prv.boundary_id = p.partition_number
[/sql]

Filtered to include only those two tables we created before, the query returns this result:

A verbose sys.partitions Query

There are a few more columns in there that did not fit on the screen, but the more important ones are included. I would like you to look at partition_number, range_desc and boundary_value_on_right for a second. The latter is set to 1 is the partition function was created with the RANGE RIGHT clause, which means that each boundary belongs to the partition on its right. The range_desc column explains what values will be stored in each partition.

I would ask you to trust me on that query, but I know you rather see proof that the range description is actually, what happens under the covers.

Determining the Partition in Which a Particular Row is Stored

There is no documented way to directly determine the partition a particular table row is stored in. There is a way to pass any value to a partition function and receive a (partition) number in return. However, that does not really tell us that the row actually was stored in that partition.

Luckily, if we combine two pieces of undocumented functionality, we can still see which partition a particular row is stored in. Those two functions are the %%physloc%% virtual column that returns the file_id and page_id of the page the row is stored in and the sys.dm_db_database_page_allocations DMF that lists all the database pages that belong to a partition.

Together these two provide enough information to determine the actual partition of a row in a partitioned table. However, before we can use them we need to execute two more steps. First we need to translate the output of the %%physloc%% column into separate values for the file and page ids. For that we can use the sys.fn_PhysLocCracker function. Second, we do not know the partition of our row upfront, so we need to get the full list of pages to compare with. That can be achieved by passing in NULL to all parameters but the first of the sys.dm_db_database_page_allocations DMF. With that, it returns all pages in the specified database.

Putting this all together, we can write a query like the following:

[sql] SELECT T.*,
DDDPA.partition_id AS partition_number,
ISNULL
(
CAST(LEFT_PRV.value AS VARCHAR(MAX)) +
CASE WHEN PF.boundary_value_on_right = 0 THEN ' < ' ELSE ' <= ' END ,
'-INF < '
) +
'X' +
ISNULL
(
CASE WHEN PF.boundary_value_on_right = 0 THEN ' <= ' ELSE ' < ' END +
CAST(RIGHT_PRV.value AS NVARCHAR(MAX)),
' < INF'
) AS range_desc,
PF.boundary_value_on_right
FROM dbo.LeftTable AS T
CROSS APPLY sys.fn_PhysLocCracker(T.%%physloc%%) AS FPLC
JOIN sys.dm_db_database_page_allocations(DB_ID(),NULL,NULL,NULL,NULL) AS DDDPA
ON DDDPA.allocated_page_page_id = FPLC.page_id
AND DDDPA.allocated_page_file_id = FPLC.file_id
JOIN sys.indexes AS I
ON DDDPA.object_id = I.object_id
AND DDDPA.index_id = I.index_id
LEFT JOIN sys.partition_schemes PS
ON PS.data_space_id = I.data_space_id
LEFT JOIN sys.partition_functions pf
ON ps.function_id = pf.function_id
LEFT JOIN sys.partition_range_values left_prv
ON left_prv.function_id = ps.function_id
AND left_prv.boundary_id + 1 = DDDPA.partition_id
LEFT JOIN sys.partition_range_values right_prv
ON right_prv.function_id = ps.function_id
AND right_prv.boundary_id = DDDPA.partition_id;
[/sql]

This query, in addition to determining the actual partition of a row, also uses part of the previous query to create the range_desc column of the identified partition for easier comparison. The important column however is DDDPA.partition_id, aliased as partition_number. That is the column that returns the partition number for each row.

If you want to use this query on a different table, all you have to do is replace the single occurrence of dbo.LeftTable with the new table name. The two screenshots below show the output of this query for both the dbo.LeftTable and the dbo.RightTable.

Row to Partition match for dbo.LeftTable

Row to Partition match for dbo.RightTable

As you can now clearly see, when looking at the id and the partition_number columns, the output in the range_desc column is accurate.

The Take-Away

Now that you, I hope, understand partition boundaries a little better, let use recap what we found out.

  1. A partition function cuts the entire range of possible values in "partitions", based on boundary values. The number of partitions is always one higher than the number of boundary values.
  2. Each boundary separates two partitions from each other: A left partition with smaller values and a right partition with larger values.
  3. Each boundary value for a single partition function belongs either to the partition to its left or the partition to its right. Which one is depending on the partition function having been created with RANGE LEFT or RANGE RIGHT respectively.

Categories: Fundamentals, General, Series, SQL Server Internals, Storage Wednesday
Tags: , , ,

3 Responses to Partition Boundaries – Left or Right: That is the Question

Leave a Reply