This is the twenty-ninth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.
Up to now all articles in this series looked at queries written to retrieve data with the SELECT statement. Today and tomorrow we are going to look at ways to change data while using information in more than one table.
Let's assume our order system contained a bug and messed up some orders. The orders are easily identifiable as they have a Status value of 8 on the Sales.SalesOrderHeader table. For all items in Sales.SalesOrderDetail that are part of those orders we need to reduce the quantity by a factor of 10. If we want to write a SELECT statement to see those orders and their correct quantities we would use this join statement:
<br /> SELECT sod.SalesOrderID,<br /> OrderQty = sod.OrderQty * 0.1<br /> FROM Sales.SalesOrderHeader AS soh<br /> JOIN Sales.SalesOrderDetail AS sod<br /> ON soh.SalesOrderID = sod.SalesOrderID<br /> WHERE soh.Status = 8;<br />
For each order with Status = 8 it calculates and returns the tenth part of the quantity of each item.
To turn this into an update statement we just need to switch out the SELECT line like this:
<br /> UPDATE sod SET<br /> OrderQty = sod.OrderQty * 0.1<br /> FROM Sales.SalesOrderHeader AS soh<br /> JOIN Sales.SalesOrderDetail AS sod<br /> ON soh.SalesOrderID = sod.SalesOrderID<br /> WHERE soh.Status = 8;<br />
This will now update the rows in the Sales.SalesOrderDetail that have a status of 8 in the Sales.SalesOrderHeader table. The join itself is exactly the same as in the SELECT query. The UPDATE references the sod table alias that is defined for the Sales.SalesOrderDetail table, to indicate that we want to update records in that table.
After running that statement we notice that the Status = 8 records are not actually of wrong quantity. Instead they were invented by the system and now need to be removed. For that we can again use the same query, we just need to replace the first line with a DELETE like this:
<br /> DELETE sod<br /> FROM Sales.SalesOrderHeader AS soh<br /> JOIN Sales.SalesOrderDetail AS sod<br /> ON soh.SalesOrderID = sod.SalesOrderID<br /> WHERE soh.Status = 8;<br />
This takes care of the Sales.SalesOrderDetail records for those "invented" orders. To also remove the Sales.SalesOrderHeader you would have to execute a separate DELETE just against that table, but we are going to skip that step today.
Again, the actual join part of the query did not change. Like the UPDATE, the DELETE is referencing the sod alias defined in the join to indicate which table to delete from.
Both UPDATE and DELETE allow you to specify a FROM clause. That FROM clause can be followed by almost anything that you can put behind the FROM keyword in a SELECT statement. Each table referenced by this FROM clause should have its own alias. The table to actually update needs to be referenced directly after the UPDATE keyword using its alias, not the table name itself. This is not the only way to reference multiple tables in an update statement, however this way tends to be the one that is the easiest to read and that is the easiest to get right.
You might have noticed that the second line of the SELECT statement is using an uncommon syntax:OrderQty = sod.OrderQty * 0.1. This is an alternative to specifying the column alias name after the value using the AS keyword. I like to use it when working on a SELECT that might be used in an UPDATE query, as it allows me to just switch out the first line of the SELECT statement. This requires that the target column name is used as alias for each value.
Using SQL Server, all UPDATE or DELETE statements can only change data in one table. If you need to update rows in more than one table, you need to write a separate statement for each of them. Other database management systems like MySQL allow for multiple tables to be changed within one statement. If you have to port code like that to T-SQL you need to separate those changes out.
The above SELECT query has this execution plan:
The same execution plan is used to identify the rows to update in the UPDATE query, feeding rows to the actual update logic:
Depending on the columns updated, the UPDATE execution plan might contain additional operators to enforce constraints (as the Assert operator in this plan) or to prevent the same row from being updated more than once. That might cause the select portion of the plan to change. For example another join algorithm might be used in some cases. In general however you will see the same plan being part of the UPDATE query's execution plan.
The same is true for the DELETE query:
It also contains the SELECT query's execution plan (minus the calculation of the new quantity value) as the part to identify the rows that need to be removed.
The DELETE execution plan in this example ended up being an "index at a time" execution plan. Both UPDATE and DELETE queries can be either "row at a time" or "index at a time". In the "row at a time" mode the main update/delete operator takes care of all indexes for each row at the time the row is changed. In the "index at a time" mode however, the main operator only changes the base table. The indexes are then updated on at a time afterwards.
The T-SQL language allows us to use the same powerful join syntax that the SELECT statement offers to identify the rows that need to be changed in UPDATE as well as DELETE statements. While those queries can lead to totally different execution plans, often we see the same execution plan that the matching SELECT statement produced being part of the modification execution plan as well.
This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.