A Join A Day – Nested Joins

2012-12-19 - A Join A Day, Fundamentals, General, Series

Introduction

This is the nineteenth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

Most of the examples so far have dealt with joins between two tables. In this post we are going to take a look at queries that involve more than two tables.

Nested Join Example

Each join in a query takes two inputs, so if a query contains more than two tables, there needs to be an additional join for each additional table, using one of its inputs for the new table and one to link to the existing tables:

[sql] SELECT pers.FirstName,
pers.LastName,
cust.AccountNumber,
soh.OrderDate,
sod.OrderQty,
sod.LineTotal,
prod.Name,
prod.ListPrice
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
JOIN Production.Product AS prod
ON sod.ProductID = prod.ProductID
JOIN Sales.Customer AS cust
ON soh.CustomerID = cust.CustomerID
JOIN Person.Person AS pers
ON cust.PersonID = pers.BusinessEntityID
ORDER BY pers.BusinessEntityID, soh.SalesOrderID, sod.SalesOrderDetailID;
[/sql]

In this example we have five tables joined together. After the first one each additional one has its own JOIN section and its own join condition.

Nested Join Syntax

There are several ways to write a query with several joins. There is the chained approach in which each additional table is just added at the end. This is the syntax used most often.

Another option is to group tables together in sub-queries like this:

[sql] SELECT cp.FirstName,
cp.LastName,
cp.AccountNumber,
so.OrderDate,
so.OrderQty,
so.LineTotal,
prod.Name,
prod.ListPrice
FROM (
SELECT soh.OrderDate,
sod.OrderQty,
sod.LineTotal,
sod.ProductID,
soh.CustomerID,
soh.SalesOrderID,
sod.SalesOrderDetailID
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
) AS so
JOIN Production.Product AS prod
ON so.ProductID = prod.ProductID
JOIN (
SELECT cust.CustomerID,
cust.AccountNumber,
pers.FirstName,
pers.LastName,
pers.BusinessEntityID
FROM Sales.Customer AS cust
JOIN Person.Person AS pers
ON cust.PersonID = pers.BusinessEntityID
) AS cp
ON so.CustomerID = cp.CustomerID
ORDER BY cp.BusinessEntityID, so.SalesOrderID, so.SalesOrderDetailID;
[/sql]

You can even go a step further with this and move each group into its own CTE:

[sql] WITH cp AS (
SELECT cust.CustomerID,
cust.AccountNumber,
pers.FirstName,
pers.LastName,
pers.BusinessEntityID
FROM Sales.Customer AS cust
JOIN Person.Person AS pers
ON cust.PersonID = pers.BusinessEntityID
),
so AS (
SELECT soh.OrderDate,
sod.OrderQty,
sod.LineTotal,
sod.ProductID,
soh.CustomerID,
soh.SalesOrderID,
sod.SalesOrderDetailID
FROM Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
)
SELECT cp.FirstName,
cp.LastName,
cp.AccountNumber,
so.OrderDate,
so.OrderQty,
so.LineTotal,
prod.Name,
prod.ListPrice
FROM so
JOIN Production.Product AS prod
ON so.ProductID = prod.ProductID
JOIN cp
ON so.CustomerID = cp.CustomerID
ORDER BY cp.BusinessEntityID, so.SalesOrderID, so.SalesOrderDetailID;
[/sql]

The third and probably least well known option allows you to directly break the joins into several groups:

[sql] SELECT pers.FirstName,
pers.LastName,
cust.AccountNumber,
soh.OrderDate,
sod.OrderQty,
sod.LineTotal,
prod.Name,
prod.ListPrice
FROM Production.Product AS prod
JOIN (
Sales.SalesOrderHeader AS soh
JOIN Sales.SalesOrderDetail AS sod
ON soh.SalesOrderID = sod.SalesOrderID
)
ON sod.ProductID = prod.ProductID
JOIN (
Sales.Customer AS cust
JOIN Person.Person AS pers
ON cust.PersonID = pers.BusinessEntityID
)
ON soh.CustomerID = cust.CustomerID
ORDER BY pers.BusinessEntityID, soh.SalesOrderID, sod.SalesOrderDetailID;
[/sql]

In this last example the parentheses are optional, but removing them makes the query harder to read and understand.

Of the four examples above, the last one is probably the easiest to comprehend, however this syntax style can backfire too.
Check out this T-SQL Tuesday post for a fairly simple example that shows how you should not use this syntax style (and how to make it better): http://mickeystuewe.com/2012/12/11/t-sql-tuesday-37-right-join-left-join-raw-raw-raw/

All the above queries are equivalent. The order of joins in a query does not change the result. In fact, the SQL Server optimizer often reorders the tables when building an execution plan. In this example all four queries end up using the same plan. So the only reason to use the one syntax over the other is readability for us humans. Keep that in mind when writing your joins and make sure that your improvement is not ending up being a Verschlimmbesserung.

Nested Join Operators

Each join operator that SQL Server can use to build an execution plan also can only handle two row sources. However, there are many different options to chain these operators together. The execution plan for the example queries above looks like this:

a multi-join execution plan

Each join operator acts as first input to the next operator. The second input is always a single table.
There are a few additional operators like a Sort or a Compute Scalar in there that you can ignore for this discussion.

The above execution plan is in the form of a left deep tree:

a left-deep join tree

SQL Server could in theory create an execution plan where the joins are more evenly distributed. Such an execution plan would have the shape of a bushy tree:

a bushy join tree

However, SQL Server tries to avoid those kinds of plans. While there are queries where a bushy tree would be more efficient, bushy execution plans can have problems like a significantly increased requirement for memory. Therefore you will rarely see them when working with SQL Server.

To demonstrate that they exist, I modified the example query above with a join hint (tomorrow's topic):

a bushy hash join execution plan

Even though SQL Server avoids bushy plans, it does not only use left deep trees either. You will also see right deep trees or even alternating trees equally often. In most cases however at least one input to each join operator is coming directly from a table and not from another join.

Summary

The T-SQL language allows us to join multiple tables together in a single query. After the first table, each additional one requires its own join statement and its own join condition. There are many ways to write a multi table join and group related tables together. However all result in the same logical query.

On the physical side, each join is represented by a single join operator. Each operator takes exactly two inputs. As changing the order of the tables in a multi join does not change the result, the optimizer will often use a different physical order in the execution plan than the order in which the tables are mentioned in the query.

In most execution plans each join operator will have at least one input sub-tree that does not contain another join operator. However, it is possible to get a bushy join tree too.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.

Categories: A Join A Day, Fundamentals, General, Series

4 Responses to A Join A Day – Nested Joins

Leave a Reply