The Performance Impact of Forwarded Records on Table Scans

2012-01-31 - General, Performance, SQL Server Internals

Forwarded records are a mechanism that SQL Server employs to reduce the amount of maintenance work for indexes on a table without a clustered index. During an update of a record in a table that does not have a clustered index, an increase in size might prevent the record to fit on its current page. If that happens, a forwarding pointer is created in the current page and the record is moved to a new location.

It is well known that such forwarded records will cause each RID-Lookup operation that encounters one to have to read a second page. If you are doing a filtered index scan followed by an RID-Lookup this can potentially double the amount of work necessary to retrieve all rows.

What is less well known is the fact, that such forwarded records can significantly impact the performance of a simple table scan as well. In this article I will show you why this happens.

Terminology

To explain the way SQL Server handles forwarded records during a table scan, I am first going to rehash some storage basics.

SQL Server knows two ways to organize the data in a table on disk. One is the Clustered Index and the other one the Heap. In a Clustered Index data is organized in a B+Tree structure allowing for fast access based on the Clustered Index Key. In a Heap on the other hand the data is not sorted or linked in any way. Any new row is inserted into the table in any free space big enough to hold the row.

In both cases the smallest unit of storage and access is a Page. Every piece of data SQL Server needs to store on disk is organized in blocks called pages. Each page is exactly 8192 Bytes in size. The only exception to this rule are the log files. Log data is stored in a structure called virtual log file and those are not part of today's discussion.

Indexes created on a Heap use a physical address to point from the index record to the underlying data record. This address consists of the file number, the number of the page that contains the record within that file and the slot number inside that page.

Forwarded Records

When a record with variable length data type columns is updated, its size might increase. If the page it is stored in is filled already, the updated record might need to be moved to a new page. If that happens, all pointers in any index on that table addressing the moved row now point to the wrong (old) place. There are two ways to handle this situation: Update all indexes with the new location or put a special record in the old place that "forwards" to the new location.

Going through all existing indexes can be extensively resource intensive. For that reason SQL Server is using the second option to put a forwarding record in the old place.

Such a forwarding record links to the new physical address of the actual data. The new record also contains a pointer back to the forwarding record. This back-pointer allows the forwarding record to get updated in case the data record has to move again or in case it gets deleted which causes the forwarding record to be deleted as well. Updating the forwarding records in cases of repeated data record moves prevents long forwarding chains. Instead there is always only at max on hop.

In the case of an RID-Lookup SQL Server has to follow those forwarding records to get to the actual data record. This causes an additional (logical) page read for each forwarding record encountered.

During a table scan on the other hand each page will be read anyway so SQL Server could just ignore all forwarding records. That however could cause reading inconsistencies. Rows that are updated after the scan started and before it finishes could be missed entirely by the scan, if the record gets moved to a page that was scanned already. For that reason SQL Server follows each forwarding record immediately. This not only causes an additional read right there, is also turns the sequential read of all pages in allocation order into a succession of random reads, hurting performance potentially even more.

Demonstration

To demonstrate this behavior let us set up two tables dbo.ForwardSmall and dbo.NoForwardSmall as well as a Procedure dbo.ChangeRow to help with the creation of forwarded records:

[sql] IF OBJECT_ID('dbo.ForwardSmall') IS NOT NULL DROP TABLE dbo.ForwardSmall;
CREATE TABLE dbo.ForwardSmall(Id INT ,F VARCHAR(8000))
GO
IF OBJECT_ID('dbo.NoForwardSmall') IS NOT NULL DROP TABLE dbo.NoForwardSmall;
CREATE TABLE dbo.NoForwardSmall(Id INT ,F VARCHAR(8000))
GO

IF OBJECT_ID('dbo.ChangeRow') IS NOT NULL DROP PROCEDURE dbo.ChangeRow;
GO
CREATE PROCEDURE dbo.ChangeRow
@Id INT,
@Size INT
AS
BEGIN
SET NOCOUNT ON;
MERGE dbo.ForwardSmall fs
USING (SELECT @Id,REPLICATE('X',@Size)) x(Id,F)
ON fs.Id = x.Id
WHEN MATCHED THEN
UPDATE SET fs.F = x.F
WHEN NOT MATCHED THEN
INSERT (Id,F)VALUES(x.Id,x.F);
END
GO
[/sql]

The procedure will take an Id parameter as well as a size parameter. It uses to merge command to either insert or update the record with the given Id value. The F column will be valued with a string of length @Size.

The following script will create 16 rows in the table dbo.ForwardSmall and then update those records to create forwarded records.

[sql] EXEC dbo.ChangeRow 11,900;
EXEC dbo.ChangeRow 12,900;
EXEC dbo.ChangeRow 13,900;
EXEC dbo.ChangeRow 14,900;
EXEC dbo.ChangeRow 15,900;
EXEC dbo.ChangeRow 16,900;
EXEC dbo.ChangeRow 17,900;
EXEC dbo.ChangeRow 18,900;

EXEC dbo.ChangeRow 21,900;
EXEC dbo.ChangeRow 22,900;
EXEC dbo.ChangeRow 23,900;
EXEC dbo.ChangeRow 24,900;
EXEC dbo.ChangeRow 25,900;
EXEC dbo.ChangeRow 26,900;
EXEC dbo.ChangeRow 27,900;
EXEC dbo.ChangeRow 28,900;
--------------------------
EXEC dbo.ChangeRow 27,0;
EXEC dbo.ChangeRow 28,0;
EXEC dbo.ChangeRow 18,1800;
EXEC dbo.ChangeRow 28,900;
EXEC dbo.ChangeRow 18,900;

EXEC dbo.ChangeRow 26,0;
EXEC dbo.ChangeRow 17,1800;
EXEC dbo.ChangeRow 27,900;
EXEC dbo.ChangeRow 17,900;

EXEC dbo.ChangeRow 25,0;
EXEC dbo.ChangeRow 16,1800;
EXEC dbo.ChangeRow 26,900;
EXEC dbo.ChangeRow 16,900;

EXEC dbo.ChangeRow 24,0;
EXEC dbo.ChangeRow 15,1800;
EXEC dbo.ChangeRow 25,900;
EXEC dbo.ChangeRow 15,900;

EXEC dbo.ChangeRow 23,0;
EXEC dbo.ChangeRow 14,1800;
EXEC dbo.ChangeRow 24,900;
EXEC dbo.ChangeRow 14,900;

EXEC dbo.ChangeRow 22,0;
EXEC dbo.ChangeRow 13,1800;
EXEC dbo.ChangeRow 23,900;
EXEC dbo.ChangeRow 13,900;

EXEC dbo.ChangeRow 21,0;
EXEC dbo.ChangeRow 12,1800;
EXEC dbo.ChangeRow 22,1499;
EXEC dbo.ChangeRow 12,780;

EXEC dbo.ChangeRow 11,910;

EXEC dbo.ChangeRow 22,900;
EXEC dbo.ChangeRow 21,910;
[/sql]

Each record starts out with ~950 bytes, so the 16 records will fit into two pages. From there the records are updated in a way so that each of them ends up on the other page with a forwarding record in its old place.

Now lets put the exact same data into the dbo.NoForwardSmall table:

[sql] INSERT INTO dbo.NoForwardSmall
SELECT * FROM dbo.ForwardSmall;
[/sql]

This again requires two pages, but this time no forwarded records are created.

To confirm, we can use the following query using the sys.dm_db_index_physical_stats DMF:

[sql] SELECT OBJECT_SCHEMA_NAME(object_id)+'.'+OBJECT_NAME(object_id) Tbl,index_type_desc,alloc_unit_type_desc,page_count,record_count,forwarded_record_count
FROM sys.dm_db_index_physical_stats(DB_ID(),NULL,NULL,NULL,'detailed') ps
WHERE object_id IN (OBJECT_ID('dbo.ForwardSmall'),OBJECT_ID('dbo.NoForwardSmall'));
[/sql]

This query returns the number of pages, the total number of records and the number of forwarded records for each table:

Tbl index_type_desc alloc_unit_type_desc page_count record_count forwarded_record_count
dbo.ForwardSmall HEAP IN_ROW_DATA 2 32 16
dbo.NoForwardSmall HEAP IN_ROW_DATA 2 16 0

Now let us look at what happens during a table scan:

[sql] SET STATISTICS IO ON;
SELECT * FROM dbo.NoForwardSmall;
GO
SELECT * FROM dbo.ForwardSmall;
SET STATISTICS IO OFF;
[/sql]

This produces the following output:

(16 row(s) affected)
Table 'NoForwardSmall'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(16 row(s) affected)
Table 'ForwardSmall'. Scan count 1, logical reads 18, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Scanning the dbo.NoForwardSmall table caused 2 logical reads - one for each page of the table. Scanning the dbo.ForwardSmall table however caused 18 logical reads. That is almost ten times more. Each of the 16 records caused an additional page read.

Conclusion

This article showed that SQL Server follows each forwarding record it encounters right away. This not only happens in the case of an RID-Lookup but also in the case of a table scan. These additional reads can present a significant performance impact. If you have a decently sized table with variable length columns and with regular update activity, add this behavior to the list of reasons why the table really should have a clustered index.

Categories: General, Performance, SQL Server Internals

Leave a Reply