This is the fifth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.
An inner join will only return rows from either row source for which a match in the other row source exists. A left outer join on the other hand will always return all rows of the left row source. The left row source is the one that is placed before the JOIN keyword in the query. The right row source is the one that comes after the JOIN keyword.
Let's assume you need to return all AdressType records from the AdventureWorks2008R2 database and for each one all BusinessEntityAddress records of that type. But if there is a type for which no records in BusinessEntityAddress exist, you still want to see a row in the result set. In a case like this you can use a left outer join such as the following:
This query produces the following result:
The BusinessEntityAddress table contains 19614 records. Each of those addresses is of a specific type. However, only addresses of three different types are currently in the table. The AddressType Table on the other hand has six records. When running the query above, the three used types will be joined to the 19614 addresses producing 19614 rows. The remaining three AddressType records will be returned as well in separate rows resulting in 19617 total rows.
For rows of the left side (AddressType) that do not have a match in the right side (BusinessEntityAddress) the final result set will have the right side columns valued NULL. You can see an example of that in the first row of the result in above image.
To clearly point out the exact difference between an inner join and a left outer join I am going to use the table pair I introduced on day two in the post about the inner join.
This time we are going to execute the following left outer join query:
The result is shown below:
Rows for which matches on both sides exists are treated exactly like they would be treated in an inner join query. The only difference is that rows from the left side that do not have a match on the right side are returned as well. In that case the right side columns are all valued NULL. The row for TblA_Val = 1 shows an example of this.
The OUTER keyword in above example query is optional. Typing LEFT JOIN will yied the same results.
As with the inner join, the ON clause defines the condition which records from the two sides need to fulfill to be considered a match. However, in the case of a LEFT OUTER JOIN it is very important that the entire condition is specified in the ON clause. If parts of it end up in the WHERE clause instead, you will get unexpected results.
Take for example products and reviews. Let's write a query that returns all products and with them the reviewer and the review date of all five star reviews. We would write a query like this:
AdventureWorks clearly needs to work on their product quality. Only two of the 504 product have a five star review:
If we were to move the part that specified that the rating need to be a five star rating into the WHERE clause the result changes dramatically:
The reason is that we changed the question we asked with the query. The first query represents a question like this: "Give me a list of all products. With each product return also the reviewers name and the review date of all their five star reviews."
The second query on the other hand represents a question more like this: "Create a list of all products together with information about all their reviews. Of that list return only the records that contain a five star review."
So, if a product does not have a five star review, it will be completely filtered out by the where clause of the second query. This effectively turns the left outer join back into an inner join. So it is very important to include the entire join condition in the ON clause.
Because it is so important in outer join cases, it is a good practice to always specify the entire match condition after the ON clause even in cases like a standard inner join where it is technically not necessary. That makes reading and understanding the code easier and prevents hard to discover bugs later if for example an inner join has to be changed to an outer join because of a new business requirement.
As with the inner join, there are three algorithms for SQL Server to choose from when a left outer join is requested. SQL Server makes the decision of which one to use based on statistics and row count estimates. The image below shows the Nested Loops (Left Outer Join) operator that was used for the query in the example section above.
The left outer join allows for retrieval of all records of the left row source. If there are matching rows coming from the right row source, a single row for each match will be returned. All rows from the left side that do not have a match on the right side will be returned as anyway; their right side values will be set to NULL.
This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.
You must be logged in to post a comment.
Pingback: A Join A Day – New Year, New Join - sqlity.net | sqlity.net