The Gap in the Identity Value Sequence

2012-03-13 - General, SQL Server Internals

With this blog post I am going to dive a little deeper into how SQL Server generates and handles identity values and what the implications for your code are.

Your New Identity

SQL Server manages identity values on a per table basis. Every time a new row gets inserted into a table, the current identity value for that table gets incremented and then used for the new row.
You can see this in action by running the following code:

[sql] IF OBJECT_ID('dbo.TableWithIdentity') IS NOT NULL DROP TABLE dbo.TableWithIdentity;

CREATE TABLE dbo.TableWithIdentity
(
Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
ExecId UNIQUEIDENTIFIER,
RowId INT
);

SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
INSERT INTO dbo.TableWithIdentity OUTPUT INSERTED.IDENTITYCOL DEFAULT VALUES;
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
INSERT INTO dbo.TableWithIdentity OUTPUT INSERTED.IDENTITYCOL DEFAULT VALUES;
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
INSERT INTO dbo.TableWithIdentity OUTPUT INSERTED.IDENTITYCOL DEFAULT VALUES;
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
[/sql]

The output of this query looks like this:

Output of the IDENT_CURRENT example

For rows two and three you can clearly see that the value got increased first and than used, so IDENT_CURRENT returns the last used identity value for a table. When the table was just created or truncated however, IDENT_CURRENT returns the next value, not the one that was used last. That it actually is the next value you can easily test by setting the start value of the identity column to any number other than one.

Instead of the next value, I would have expected to see a NULL returned to indicate that no identity value had been used so far. However, that is not the case, so you need to be aware of this inconsistency of the IDENT_CURRENT function in you code.

Identity Transactions

Identity values do not participate in any transaction in SQL Server. When a row gets inserted, an identity value gets consumed, even if that insert fails as the following demo shows.

[sql] SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
GO
INSERT INTO dbo.TableWithIdentity(RowId)VALUES('NotANumber');
GO
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
GO
INSERT INTO dbo.TableWithIdentity(RowId)VALUES('NotANumber');
GO
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
GO
INSERT INTO dbo.TableWithIdentity(RowId)VALUES('NotANumber');
GO
SELECT IDENT_CURRENT('dbo.TableWithIdentity') AS [IDENT_CURRENT];
[/sql]

The output will look something like this.

IDENT_CURRENT after a failing insert

No new row made it into the table because of the conversion errors, but the identity value got increased anyway. The reason for this is simple: To implement the identity value management transactional, a session would need to take a lock on the identity value itself for the duration of the transaction. That would force all inserts to be executed consecutively and concurrency would be practically non-existent.

There are many reasons for which an insert could fail. Examples are an illegal value as in the above example, a deadlock, a full disk drive and so on. Some of the reasons you can control by cleaning your values prior to using them, others cannot be controlled in the inserting code at all. So you really cannot assume, that you will have a gap free list of identity values in your table.

Sequence Chasms

SQL Server does not even guarantee that the identity values used by a single insert are contiguous. To prove that run the following piece of SQL code in multiple connections at the same time.

[sql] DECLARE @ExecId UNIQUEIDENTIFIER = NEWID();
INSERT INTO dbo.TableWithIdentity(ExecId,RowId)
SELECT @ExecId,n FROM dbo.GetNums(10000);
[/sql]

As most of my examples, this one is also using Itzik's GetNums function that you can get here: Virtual Auxiliary Table of Numbers.

The easiest way to run multiple executions of this query at the same time is to use SQL Query Stress by Adam Machanic as shown below.

SQLQueryStress in action to show gaps in a single identity value run

After SQL Query Stress finishes execution, run the following query.

[sql] SELECT B.Id B_Id ,
A.Id A_Id ,
A.Id - B.Id Gap ,
B.RowId B_RowId ,
A.RowId A_RowId ,
A.ExecId
FROM dbo.TableWithIdentity A
JOIN dbo.TableWithIdentity B ON A.ExecId = B.ExecId
AND A.RowId = B.RowId + 1
WHERE A.Id - B.Id <> 1
ORDER BY A.ExecId , B_Id;
[/sql]

This query uses a self-join on the TableWithIdentity table to return a list of all rows that have a gap between their identity value and the identity value of the preceding row of the same execution. On my system it returned about 20,000 gaps (for 200,000 inserted rows in total) with sizes from 2 to 16,000.

This example has shown that even in single inserts you cannot rely on getting contiguous identity values

Replication

One final way to get gaps in your identity stream that I would like to bring up is replication. In all replication scenarios where data could flow in more that one direction, SQL Server has to manage identity ranges to prevent conflicts (see http://technet.microsoft.com/en-us/library/ms152543.aspx).
For this, SQL Server assigns a fixed interval (per article) of identity values to each participant in the replication setup. If a participant runs out of values, a new range is automatically assigned to that participant. That new range will not be adjacent to the previous one in almost all cases, causing gaps in the list of identity values.

That means that even if you have a system that only ever inserts s single row at a time, has only a single connection, and never runs into an issue like a bug or a hard drive failure you still can end up with gaps in your identity value stream, if someone decides to add replication into the mix.

Conclusion

The way identity values are handled by SQL Server, there is no guarantee that the stream of identity values for a single table is free of gaps. Even if you think you have covered all you bases, you still might end up with gaps later on, if the business suddenly requires replication to be added into the mix. Therefor it is a best practice to never rely on contiguous identity values anywhere in your code.

Categories: General, SQL Server Internals