The Hidden SQL Server Gem: UPDATE from SELECT

2015-08-19 - Fundamentals, General, Queries, T-SQL Statements

3 Simple Ways to use UPDATE from SELECT making Your Job easier

You probably have been in this situation before: You needed to update the data in one table using information stored in another table. I often meet people who have not heard of the powerful UPDATE from SELECT solution SQL Server provides for this problem. In fact, I struggled with this problem for quite some time before I found out about this gem.

In the following lines, I am going to show you three tricks that have made my life simpler on many occasions. For that, we first need two tables:

The Customers Table

The Orders Table

If you want to follow along, you can get the script here: The Hidden SQL Server Gem - UPDATE from SELECT.sql

1)Develop your UPDATE from a SELECT

Now that we have the environment set up, let's dive into how to make this work. Before I show you the multi-table solution, let me demonstrate the simplest form of the UPDATE FROM syntax and show you a simple trick to make developing your UPDATE statements really simple, by writing a SELECT statement first and then turning it into an update by deleting two characters. Intrigued?

The dbo.Orders table contains and is_archived column. This is used to archive orders older than 90 days. However, we have the additional rule that orders that have not yet been paid or shipped cannot be archived. (In this example, the is_archived column has no physical effect. See my SQL Server Pro article Using Table Partitions to Archive Old Data in OLTP Environments if you are interested in how to turn the is_archived column into an actual archive switch.)

A SELECT statement to return the archive-able records would look like this:

[sql] SELECT *
FROM dbo.Orders AS O
WHERE O.order_date < DATEADD(DAY,-90,SYSDATETIME())
AND O.is_paid = 1
AND O.is_shipped = 1;
[/sql]

Now let's add a little magic to it:

[sql] --UPDATE O SET /*
SELECT *, -- */
is_archived = 1
FROM dbo.Orders AS O
WHERE O.order_date < DATEADD(DAY,-90,SYSDATETIME())
AND O.is_paid = 1
AND O.is_shipped = 1;
[/sql]

This is still a simple SELECT statement. We just added two comments and an additional column. That additional column is using the uncommon alias syntax name = value with is equivalent to the more common value AS name syntax.

The beauty of this lies in the fact that I can turn this select into a syntactically correct UPDATE statement, just by removing the two dashes in front of the UPDATE keyword:

[sql] UPDATE O SET /*
SELECT *, -- */
is_archived = 1
FROM dbo.Orders AS O
WHERE O.order_date < DATEADD(DAY,-90,SYSDATETIME())
AND O.is_paid = 1
AND O.is_shipped = 1;
[/sql]

This updates the dbo.Orders table just like and UPDATE dbo.Orders SET… statement would, because the dbo.Orders table is aliased as O and the UPDATE is referencing that same O alias.

2)Adding a JOIN to the UPDATE statement

The question that let us here was how we can use data in a one table to update another table. Wouldn't it be nice if we could just "JOIN"? Good news: The above syntax allows us to do just that.

Recently an order_count column was added to the dbo.Customers table. Our job now is to value that column correctly based on the actual orders each customer has placed. We start out again by writing a select first:

[sql] --UPDATE C SET /*
SELECT *, -- */
order_count = OA.cnt
FROM dbo.Customers AS C
JOIN(
SELECT O.customer_id,
COUNT(1) cnt
FROM dbo.Orders AS O
GROUP BY O.customer_id
)OA
ON C.customer_id = OA.customer_id;
[/sql]

Once the SELECT statement returns the correct results, it is easy to switch it to UPDATE:

[sql] UPDATE C SET /*
SELECT *, -- */
order_count = OA.cnt
FROM dbo.Customers AS C
JOIN(
SELECT O.customer_id,
COUNT(1) cnt
FROM dbo.Orders AS O
GROUP BY O.customer_id
)OA
ON C.customer_id = OA.customer_id;
[/sql]

Because of the use of the alias C, SQL Server knows to update the dbo.Customers table while pulling in necessary information from other table(s) referenced in the statement.

3)Using UPDATE with a CTE

If CTEs are your thing, you can even go a step further with this. As long as SQL Server can easily determine what you are intending to update, you can actually "UPDATE" a CTE directly using a very similar syntax:

[sql] WITH order_counts AS
(
SELECT O.customer_id,
COUNT(1) cnt
FROM dbo.Orders AS O
GROUP BY O.customer_id
),
customer_order_counts AS
(
SELECT
C.customer_id,
C.name,
C.order_count,
OC.cnt new_order_cnt
FROM dbo.Customers AS C
JOIN order_counts AS OC
ON C.customer_id = OC.customer_id
)
UPDATE COC SET /*
SELECT *, -- */
order_count = COC.new_order_cnt
FROM customer_order_counts AS COC;
[/sql]

He only thing you cannot do with any of the above statements, is to update data in two tables at the same time.

!)A Word of Caution

This syntax provides us with a very powerful way to write UPDATE statements that require data from more than one table. However, be careful not to write code that shows random behavior. Take a look at this UPDATE statement:

Fragile UPDATE FROM SELECT

Just like the highlighted SELECT returns a customer with multiple orders multiple times, the UPDATE would happily update each customer multiple times, each time overwriting the prior change. Only the last change would persist.

The problem with that is that there is no guarantee in which order that happens. The order is dependent on the execution plan chosen and can change any time. So while the above statement might actually result in the last order date being written in your tests, it likely will execute in a different order once the statement encounters a large amount of data. This will lead to hard to debug problems, so pay attention to this possibility, when writing your UPDATE FROM SELECT statements.

Categories: Fundamentals, General, Queries, T-SQL Statements
Tags: , ,

One Response to The Hidden SQL Server Gem: UPDATE from SELECT

Leave a Reply