A Join A Day – Nested Joins

2012-12-19 - A Join A Day, Fundamentals, General, Series

Introduction

This is the nineteenth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

Most of the examples so far have dealt with joins between two tables. In this post we are going to take a look at queries that involve more than two tables.

Nested Join Example

Each join in a query takes two inputs, so if a query contains more than two tables, there needs to be an additional join for each additional table, using one of its inputs for the new table and one to link to the existing tables:

SELECT  pers.FirstName,
        pers.LastName,
        cust.AccountNumber,
        soh.OrderDate,
        sod.OrderQty,
        sod.LineTotal,
        prod.Name,
        prod.ListPrice
FROM    Sales.SalesOrderHeader AS soh
JOIN    Sales.SalesOrderDetail AS sod
        ON soh.SalesOrderID = sod.SalesOrderID
JOIN    Production.Product AS prod
        ON sod.ProductID = prod.ProductID
JOIN    Sales.Customer AS cust
        ON soh.CustomerID = cust.CustomerID
JOIN    Person.Person AS pers
        ON cust.PersonID = pers.BusinessEntityID
ORDER BY pers.BusinessEntityID, soh.SalesOrderID, sod.SalesOrderDetailID;

In this example we have five tables joined together. After the first one each additional one has its own JOIN section and its own join condition.

Nested Join Syntax

There are several ways to write a query with several joins. There is the chained approach in which each additional table is just added at the end. This is the syntax used most often.

Another option is to group tables together in sub-queries like this:

SELECT  cp.FirstName,
        cp.LastName,
        cp.AccountNumber,
        so.OrderDate,
        so.OrderQty,
        so.LineTotal,
        prod.Name,
        prod.ListPrice
FROM    (
          SELECT  soh.OrderDate,
                  sod.OrderQty,
                  sod.LineTotal,
                  sod.ProductID,
                  soh.CustomerID,
                  soh.SalesOrderID,
                  sod.SalesOrderDetailID
          FROM    Sales.SalesOrderHeader AS soh
          JOIN    Sales.SalesOrderDetail AS sod
                  ON soh.SalesOrderID = sod.SalesOrderID
        ) AS so
JOIN    Production.Product AS prod
        ON so.ProductID = prod.ProductID
JOIN    (
          SELECT  cust.CustomerID,
                  cust.AccountNumber,
                  pers.FirstName,
                  pers.LastName,
                  pers.BusinessEntityID
          FROM    Sales.Customer AS cust
          JOIN    Person.Person AS pers
                  ON cust.PersonID = pers.BusinessEntityID
        ) AS cp
        ON so.CustomerID = cp.CustomerID
ORDER BY cp.BusinessEntityID, so.SalesOrderID, so.SalesOrderDetailID; 

You can even go a step further with this and move each group into its own CTE:

WITH  cp AS (
             SELECT cust.CustomerID,
                    cust.AccountNumber,
                    pers.FirstName,
                    pers.LastName,
                    pers.BusinessEntityID
             FROM   Sales.Customer AS cust
             JOIN   Person.Person AS pers
                    ON cust.PersonID = pers.BusinessEntityID
           ),
      so AS (
             SELECT soh.OrderDate,
                    sod.OrderQty,
                    sod.LineTotal,
                    sod.ProductID,
                    soh.CustomerID,
                    soh.SalesOrderID,
                    sod.SalesOrderDetailID
             FROM   Sales.SalesOrderHeader AS soh
             JOIN   Sales.SalesOrderDetail AS sod
                    ON soh.SalesOrderID = sod.SalesOrderID
           )
  SELECT  cp.FirstName,
          cp.LastName,
          cp.AccountNumber,
          so.OrderDate,
          so.OrderQty,
          so.LineTotal,
          prod.Name,
          prod.ListPrice
  FROM    so
  JOIN    Production.Product AS prod
          ON so.ProductID = prod.ProductID
  JOIN    cp
          ON so.CustomerID = cp.CustomerID
  ORDER BY cp.BusinessEntityID, so.SalesOrderID, so.SalesOrderDetailID;

The third and probably least well known option allows you to directly break the joins into several groups:

SELECT  pers.FirstName,
        pers.LastName,
        cust.AccountNumber,
        soh.OrderDate,
        sod.OrderQty,
        sod.LineTotal,
        prod.Name,
        prod.ListPrice
FROM    Production.Product AS prod
JOIN    (
        Sales.SalesOrderHeader AS soh
        JOIN Sales.SalesOrderDetail AS sod
          ON soh.SalesOrderID = sod.SalesOrderID
        )
        ON sod.ProductID = prod.ProductID
JOIN    (
        Sales.Customer AS cust
        JOIN Person.Person AS pers
          ON cust.PersonID = pers.BusinessEntityID
        )
        ON soh.CustomerID = cust.CustomerID
ORDER BY pers.BusinessEntityID, soh.SalesOrderID, sod.SalesOrderDetailID;

In this last example the parentheses are optional, but removing them makes the query harder to read and understand.

Of the four examples above, the last one is probably the easiest to comprehend, however this syntax style can backfire too.
Check out this T-SQL Tuesday post for a fairly simple example that shows how you should not use this syntax style (and how to make it better): http://mickeystuewe.com/2012/12/11/t-sql-tuesday-37-right-join-left-join-raw-raw-raw/

All the above queries are equivalent. The order of joins in a query does not change the result. In fact, the SQL Server optimizer often reorders the tables when building an execution plan. In this example all four queries end up using the same plan. So the only reason to use the one syntax over the other is readability for us humans. Keep that in mind when writing your joins and make sure that your improvement is not ending up being a Verschlimmbesserung.

Nested Join Operators

Each join operator that SQL Server can use to build an execution plan also can only handle two row sources. However, there are many different options to chain these operators together. The execution plan for the example queries above looks like this:

a multi-join execution plan

Each join operator acts as first input to the next operator. The second input is always a single table.
There are a few additional operators like a Sort or a Compute Scalar in there that you can ignore for this discussion.

The above execution plan is in the form of a left deep tree:

a left-deep join tree

SQL Server could in theory create an execution plan where the joins are more evenly distributed. Such an execution plan would have the shape of a bushy tree:

a bushy join tree

However, SQL Server tries to avoid those kinds of plans. While there are queries where a bushy tree would be more efficient, bushy execution plans can have problems like a significantly increased requirement for memory. Therefore you will rarely see them when working with SQL Server.

To demonstrate that they exist, I modified the example query above with a join hint (tomorrow's topic):

a bushy hash join execution plan

Even though SQL Server avoids bushy plans, it does not only use left deep trees either. You will also see right deep trees or even alternating trees equally often. In most cases however at least one input to each join operator is coming directly from a table and not from another join.

Summary

The T-SQL language allows us to join multiple tables together in a single query. After the first table, each additional one requires its own join statement and its own join condition. There are many ways to write a multi table join and group related tables together. However all result in the same logical query.

On the physical side, each join is represented by a single join operator. Each operator takes exactly two inputs. As changing the order of the tables in a multi join does not change the result, the optimizer will often use a different physical order in the execution plan than the order in which the tables are mentioned in the query.

In most execution plans each join operator will have at least one input sub-tree that does not contain another join operator. However, it is possible to get a bushy join tree too.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.

Categories: A Join A Day, Fundamentals, General, Series

5 comments
jlr7890
jlr7890

Very helpful, I'm newer at this and encountered the last example wondering what was going on in there...THANKS!

lknmd
lknmd

Thanks a lot, works like a charm! Well written!