This is the thirtieth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.
SQL Server 2008 introduced the MERGE command. It allows to write an insert, update and delete in a single statement. In this article we are going to look at how this statement works and what MERGE-ing has to do with joining.
Let's look at a typical reporting scenario. To speed up reporting we have a de-normalized version of the customer data stored in the dbo.CustomerInfo table. The following code creates that table and inserts some data into it.
IF OBJECT_ID('dbo.CustomerInfo') IS NOT NULL DROP TABLE dbo.CustomerInfo; CREATE TABLE [dbo].[CustomerInfo] ( [CustomerID] [int] PRIMARY KEY CLUSTERED, [AccountNumber] [varchar](10) NOT NULL, [PersonType] [nchar](2) NOT NULL, [NameStyle] [dbo].[NameStyle] NOT NULL, [Title] [nvarchar](8) NULL, [FirstName] [dbo].[Name] NULL, [MiddleName] [dbo].[Name] NULL, [LastName] [dbo].[Name] NOT NULL, [Suffix] [nvarchar](10) NULL, [Store] [dbo].[Name] NOT NULL, [Teritory] [dbo].[Name] NOT NULL, [CountryRegionCode] [nvarchar](3) NOT NULL, [SalesYTD] [money] NOT NULL, [SalesLastYear] [money] NOT NULL, [ModifiedDate] [datetime] DEFAULT GETDATE() ); GO INSERT INTO dbo.CustomerInfo ( CustomerID, AccountNumber, PersonType, NameStyle, Title, MiddleName, LastName, Suffix, Store, Teritory, CountryRegionCode, SalesYTD, SalesLastYear ) SELECT CASE WHEN c.CustomerID % 10 = 0 THEN c.CustomerID - 29000 ELSE c.CustomerID END, c.AccountNumber, p.PersonType, p.NameStyle, p.Title, p.MiddleName, p.LastName, p.Suffix, s.Name AS Store, st.Name AS Teritory, st.CountryRegionCode, st.SalesYTD, st.SalesLastYear FROM Sales.Customer AS c JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID JOIN Sales.Store AS s ON c.StoreID = s.BusinessEntityID JOIN Sales.SalesTerritory AS st ON c.TerritoryID = st.TerritoryID; SELECT * FROM dbo.CustomerInfo;
The customer information is retrieved joining several tables together.
The task now is to write a query that can bring the contents of this table up to date each time when executed. This query can then be scheduled to be executed at regular intervals.
The MERGE statement does provide this functionality in one simple construct:
MERGE dbo.CustomerInfo AS ci USING Sales.Customer AS c JOIN Person.Person AS p ON c.PersonID = p.BusinessEntityID JOIN Sales.Store AS s ON c.StoreID = s.BusinessEntityID JOIN Sales.SalesTerritory AS st ON c.TerritoryID = st.TerritoryID ON ci.CustomerID = c.CustomerID WHEN MATCHED THEN UPDATE SET FirstName = p.FirstName, ModifiedDate = DEFAULT WHEN NOT MATCHED THEN INSERT ( CustomerID, AccountNumber, PersonType, NameStyle, Title, FirstName, MiddleName, LastName, Suffix, Store, Teritory, CountryRegionCode, SalesYTD, SalesLastYear ) VALUES ( c.CustomerID, c.AccountNumber, p.PersonType, p.NameStyle, p.Title, p.FirstName, p.MiddleName, p.LastName, p.Suffix, s.Name, st.Name, st.CountryRegionCode, st.SalesYTD, st.SalesLastYear ) WHEN NOT MATCHED BY SOURCE THEN DELETE OUTPUT $action,ISNULL(INSERTED.CustomerId, DELETED.CustomerId);
Now you probably are asking yourself right now what my breakfast was laced with getting me to call this monster statement "simple", but once you break it apart into its pieces it looks a lot like separate update, insert and delete statements and with that it becomes a lot easier to follow.
The OUTPUT clause in the example shows a new pseudo column: $action. It was introduced together with MERGE and it is valued either "insert", "update" or "delete" dependent on which action affected the current row. With this output we get a record of what happened to each row. You could include the other columns used in the MERGE statement too, if you want to create a complete change data log.
The above create table script had deliberately a few defects. Some of the records where inserted with a significantly to small CustomerId value and all of them did not have the first name valued. Executing the above MERGE statement automatically deletes the rows in the target table that do not exist in the source table, it inserts new rows and updates all existing rows. So, after it's execution we expect the data defects to be all gone.
The output of the above MERGE statement looks like this:
You can see in the output that some rows were deleted, some were inserted and some updated. Take a look at the dbo.CustomerInfo yourself to confirm that it is now completely in sync with the source data.
The MERGE statement has several sections. The first section specifies the tables involved. The target table is mentioned right after the MERGE keyword. After the USING keyword the source tables are specified. The final ON clause tells SQL Server how to determine if rows are a match or not.
The next three sections are the insert, update and delete sections. They each get identified by a WHEN clause. WHEN MATCHED starts the section were an update statement can be specified. WHEN NOT MATCHED allows to specify an insert statement for new rows. WHEN NOT MATCHED BY SOURCE finally provides a way to delete orphaned rows.
The syntax of all three sections is very close to the stand alone INSERT, UPDATE and DELETE statements, you just leave out the target table name. So, UPDATE dbo.table SET becomes UPDATE SET. Instead of INSERT dbo.table(column, list) you write INSERT (column, list). The DELETE finally gets reduced to just the single keyword.
All three sections are optional. However, at least one needs to be part of every MERGE construct, as it otherwise would just be a NOOP. All three operations can also be independently further restricted by supplying an additional condition after the WHEN clause linked with the AND keyword.
The last section is the OUTPUT clause that works like the OUTPUT clause in any INSERT, UPDATE or DELETE statement.
The complete description of the MERGE statement you can find here: http://msdn.microsoft.com/en-us/library/bb510625(v=sql.100).aspx
When looking at the execution plan for a MERGE statement two things are of interest. To demonstrate them let's first look at the execution plan for the insert statement above that we used to put the initial set of data into out dbo.CustomerInfo table:
As always, you can click on the image to get a larger version.
The execution plan for the MERGE statement looks like this:
The execution plan sub-tree that is responsible to retrieve the data from the four tables can be found pretty much unchanged in the execution plan for the MERGE statement as well. However there is now an additional join. That additional join shows up as a Full Outer Join operator that joins the result of above sub-tree with the target table (dbo.CustomerInfo).
That Full Outer Join operator is actually what makes the identification of matching or missing rows possible. The Merge operator uses the information coming from this join to decide which operation to execute. The Merge operator is actually a single new operator that looks similar the known Insert, Update or Delete operators. It just is able to execute all three operations at once.
Be aware that MERGE is not able to handle more than one row on the source side matching a single target side row. The same is true for the other direction too. Because of that MERGE is best suited to handle comparison of tables on a primary key column.
The MERGE statement gives us a powerful way to encapsulate the complex logic necessary to sync one table with another. Under the cover the Merge operator can execute inserts, updates and deletes against the target table. Which action to take it decides base on the result of a full outer join of the target table and the source table(s).
This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.