This is the twenty-sixth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.
Over the next three days I am going to look at what is possible and what is not possible with each join algorithm. This information can help to understand why the optimizer might have made a particular choice. I am going to start out with the Loop Join algorithm.
As we have seen before, there are nine logical join types. With the exception of the cross join, each one can be executed either as equi-join or as nonequi-join. The following is a list of example queries for each combination together with their execution plans. All nonequi-join queries are plain nonequi-joins without any equality columns. Because mixed joins are handled just like equi-joins with an additional filter step, there are no mixed examples included here.
The loop join is the universal join algorithm. It can handle almost any situation including the join-condition-free cross join. It even does not care if the query is an equi- or a nonequi-join. However, it cannot handle any right sided logical join. Because the algorithm always goes through the left side one row at a time and then tries to find matches on the right side for each of them, it cannot keep track of which right side rows have been used and which ones haven't. Therefore any forced right side loop join request will fail as you can see in the examples above. However, there is no statement or hint that forces a semi join and particularly there is no way to write a "right semi" query. The only way to get to one of those is if the optimizer decides that a rearrangement of the tables will improve performance. Because the Loop Join algorithm can't handle any right side join, any query for which SQL Server decides to use a semi join operator will end up using a left semi operator. The same conclusion can be drawn for the anti semi joins. Because of that I did not included examples for the four different right-(anti-)semi conditions above.
For the same reason that prevents a right outer loop join, a full outer loop join is also not possible. However, there are situations (as we will see over the next two days) that can only be handled by the Loop Join operator, so the optimizer has to come up with a workaround for this problem. If faced with a query that requires a loop join and also requires a full join, the optimizer rewrites the query from Tbl1 FULL LOOP JOIN Tbl2 to Tbl1 LEFT LOOP JOIN Tbl2 UNION ALL Tbl2 WHERE NOT EXISTS(Tbl1). This rewrite leads to a very expensive query plan potentially causing Tbl1 to be scanned once for each row in Tbl2 and additionally Tbl2 to be scanned once for each row in Tbl1. Let's look at a quick example:
Remember, both tables have as many pages as they have rows:
The captured statistics look like this:
Tbl100 shows 1100 logical reads or 11 times its size and Tbl10 shows 1010 logical reads or 101 times its size. So, if you run into a full outer join that has to be handled by a loop join, it is even more important that both tables are adequately indexed.
Finally, all nonequi queries contain an additional Spool operator. The Spool operator creates a copy of the data, including only columns that are actually needed, in tempdb. This is an attempt to make the inevitable scan less pain-full. Because these queries do not have an equality condition, any index would be only of marginal help. So, if you have a lot of nonequi-join queries make sure your tempdb is as finely tuned as possible. Things to check for first in this context are the number of tempdb files and placement of such on a separate disk that is as fast as possible.
This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.