A Join A Day – The Left Outer Join

2012-12-05 - A Join A Day, General, Series, T-SQL Statements

Introduction

This is the fifth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

An inner join will only return rows from either row source for which a match in the other row source exists. A left outer join on the other hand will always return all rows of the left row source. The left row source is the one that is placed before the JOIN keyword in the query. The right row source is the one that comes after the JOIN keyword.

LEFT OUTER JOIN Example

Let's assume you need to return all AdressType records from the AdventureWorks2008R2 database and for each one all BusinessEntityAddress records of that type. But if there is a type for which no records in BusinessEntityAddress exist, you still want to see a row in the result set. In a case like this you can use a left outer join such as the following:

[sql] SELECT at.Name AS AddressTypeName, bea.BusinessEntityID, bea.AddressID
FROM Person.AddressType at
LEFT JOIN Person.BusinessEntityAddress bea
ON at.AddressTypeID = bea.AddressTypeID
ORDER BY at.AddressTypeID, bea.BusinessEntityID;
[/sql]

This query produces the following result:

left outer join example

The BusinessEntityAddress table contains 19614 records. Each of those addresses is of a specific type. However, only addresses of three different types are currently in the table. The AddressType Table on the other hand has six records. When running the query above, the three used types will be joined to the 19614 addresses producing 19614 rows. The remaining three AddressType records will be returned as well in separate rows resulting in 19617 total rows.

For rows of the left side (AddressType) that do not have a match in the right side (BusinessEntityAddress) the final result set will have the right side columns valued NULL. You can see an example of that in the first row of the result in above image.

To clearly point out the exact difference between an inner join and a left outer join I am going to use the table pair I introduced on day two in the post about the inner join.

setup for all-cases join example

This time we are going to execute the following left outer join query:

[sql] SELECT *
FROM dbo.TblA AS A
LEFT OUTER JOIN dbo.TblB AS B
ON A.TblA_Val = B.TblB_Val;
[/sql]

The result is shown below:

all-cases example for left outer join

Rows for which matches on both sides exists are treated exactly like they would be treated in an inner join query. The only difference is that rows from the left side that do not have a match on the right side are returned as well. In that case the right side columns are all valued NULL. The row for TblA_Val = 1 shows an example of this.

LEFT OUTER JOIN Syntax

The OUTER keyword in above example query is optional. Typing LEFT JOIN will yied the same results.

As with the inner join, the ON clause defines the condition which records from the two sides need to fulfill to be considered a match. However, in the case of a LEFT OUTER JOIN it is very important that the entire condition is specified in the ON clause. If parts of it end up in the WHERE clause instead, you will get unexpected results.

Take for example products and reviews. Let's write a query that returns all products and with them the reviewer and the review date of all five star reviews. We would write a query like this:

[sql] SELECT p.Name,
p.ProductNumber,
p.ListPrice,
p.SellStartDate,
pr.ReviewerName,
pr.ReviewDate
FROM Production.Product AS p
LEFT OUTER JOIN Production.ProductReview AS pr
ON p.ProductID = pr.ProductID
AND pr.Rating = 5
ORDER BY pr.ReviewDate DESC, p.SellStartDate DESC;
[/sql]

AdventureWorks clearly needs to work on their product quality. Only two of the 504 product have a five star review:

List of products with their five star review date

If we were to move the part that specified that the rating need to be a five star rating into the WHERE clause the result changes dramatically:

List of five star rated products

The reason is that we changed the question we asked with the query. The first query represents a question like this: "Give me a list of all products. With each product return also the reviewers name and the review date of all their five star reviews."

The second query on the other hand represents a question more like this: "Create a list of all products together with information about all their reviews. Of that list return only the records that contain a five star review."

So, if a product does not have a five star review, it will be completely filtered out by the where clause of the second query. This effectively turns the left outer join back into an inner join. So it is very important to include the entire join condition in the ON clause.

Because it is so important in outer join cases, it is a good practice to always specify the entire match condition after the ON clause even in cases like a standard inner join where it is technically not necessary. That makes reading and understanding the code easier and prevents hard to discover bugs later if for example an inner join has to be changed to an outer join because of a new business requirement.

LEFT OUTER JOIN Operator

As with the inner join, there are three algorithms for SQL Server to choose from when a left outer join is requested. SQL Server makes the decision of which one to use based on statistics and row count estimates. The image below shows the Nested Loops (Left Outer Join) operator that was used for the query in the example section above.

Left Outer Join Execution Plan

Summary

The left outer join allows for retrieval of all records of the left row source. If there are matching rows coming from the right row source, a single row for each match will be returned. All rows from the left side that do not have a match on the right side will be returned as anyway; their right side values will be set to NULL.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.

Categories: A Join A Day, General, Series, T-SQL Statements

One Response to A Join A Day – The Left Outer Join

  1. Pingback: A Join A Day – New Year, New Join - sqlity.net | sqlity.net

Leave a Reply