When thinking about the indexes for update statements it is easy to overlook an important peculiarity in the way SQL Server finds the rows to update before updating them.
If you have an UPDATE statement that filters on two columns and you have an index on one of them, you might be tempted to think that the necessary lookup for the second column is not going to cost a lot - particular when that second column is not selective at all - as SQL Server has to access the storage place of each row anyway when executing the actual update. This however is not true, as SQL Server executes the update in two separate phases: One to identify the rows and one to update those rows. That means a lookup in the find phase can be very expensive.
Let us look at an example. Use the following script to create two identical tables and insert 200,000 rows into each of them.
</p> <p>CREATE TABLE dbo.tst1(<br /> Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,<br /> Val INT,<br /> Filter INT,<br /> IdxKey INT,<br /> Fill CHAR(1000) DEFAULT 'Fill'<br /> );</p> <p>CREATE TABLE dbo.tst2(<br /> Id INT IDENTITY(1,1) PRIMARY KEY CLUSTERED,<br /> Val INT,<br /> Filter INT,<br /> IdxKey INT,<br /> Fill CHAR(1000) DEFAULT 'Fill'<br /> );</p> <p>INSERT INTO dbo.tst1(Val,Filter,IdxKey)<br /> OUTPUT INSERTED.Val,INSERTED.Filter,INSERTED.IdxKey INTO dbo.tst2(Val,Filter,IdxKey)<br /> SELECT 0,0, n%1000<br /> FROM dbo.GetNums(200000);<br />
The Filter column will be used to force the lookup later on by adding "Filter = 0" to the where clause. It is valued 0 in all rows, so no row will be excluded by this filter.
After creating the tables, create the following indexes:
<br /> CREATE INDEX dbo_tst1_IdxKey ON dbo.tst1(IdxKey);<br /> CREATE INDEX dbo_tst2_IdxKey ON dbo.tst2(IdxKey) INCLUDE (Filter);<br />
Both indexes are on the IdxKey column, but the index on the dbo.tst2 table also includes the Filter column.
Now run the following two identical update statements against the two tables:
<br /> UPDATE dbo.tst1<br /> SET Val = 17<br /> WHERE IdxKey <30 AND Filter = 0;</p> <p>UPDATE dbo.tst2<br /> SET Val = 17<br /> WHERE IdxKey <30 AND Filter = 0;<br />
The first one requires a lookup operator as the Filter column is not included in the index on dbo.tst1. Below are the two execution plans:
As you can clearly see, the update operator is separated from the data retrieval operator(s) by several other operators. It looks like all the filtering and value preparations happen before the update operator which is then just updating all rows passed to it without any further filtering.
I ran the above statements in a loop, each executing 2000 times and recorded time and logical reads. The averaged results are below:
|98||175||37212||2000||UPDATE dbo.tst1 SET Val = 17 WHERE IdxKey <30 AND Filter = 0;|
|47||72||19211||2000||UPDATE dbo.tst2 SET Val = 17 WHERE IdxKey <30 AND Filter = 0;|
The cpu time, the elapsed time and the logical reads were each about twice as high in the query against dbo.tst1, the one requiring the lookup operator. This was the case even though the lookup did not actually filter any rows - every row accessed by the lookup had to be updated as well.
The above statistics prove what the execution plan already suggested: The finding of rows for an update and the update itself are separate steps in SQL Server update queries. That means the "find" portion has to be optimized as if it were a standalone SELECT. In particular, lookup operations can not be ignored as SQL Server accesses the rows twice in this case: Once for the lookup and once for the update.