A Join A Day – Loop Join Limitations

2012-12-26 - A Join A Day, General, Series, SQL Server Internals

Introduction

This is the twenty-sixth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

Over the next three days I am going to look at what is possible and what is not possible with each join algorithm. This information can help to understand why the optimizer might have made a particular choice. I am going to start out with the Loop Join algorithm.

Loop Join

As we have seen before, there are nine logical join types. With the exception of the cross join, each one can be executed either as equi-join or as nonequi-join. The following is a list of example queries for each combination together with their execution plans. All nonequi-join queries are plain nonequi-joins without any equality columns. Because mixed joins are handled just like equi-joins with an additional filter step, there are no mixed examples included here.

Cross Loop Join

cross-loop-join

Inner Loop Equi-Join

inner-loop-equi-join

Left Loop Equi-Join

left-loop-equi-join

Right Loop Equi-Join

right-loop-equi-join

Full Loop Equi-Join

full-loop-equi-join

Left Semi Loop Equi-Join

left-semi-loop-equi-join

Right Semi Loop Equi-Join

N/A

Left Anti Semi Loop Equi-Join

left-anti-semi-loop-equi-join

Right Anti Semi Loop Equi-Join

N/A

Inner Loop Nonequi-Join

inner-loop-nonequi-join

Left Loop Nonequi-Join

left-loop-nonequi-join

Right Loop Nonequi-Join

right-loop-nonequi-join

Full Loop Nonequi-Join

full-loop-nonequi-join

Left Semi Loop Nonequi-Join

left-semi-loop-nonequi-join

Right Semi Loop Nonequi-Join

N/A

Left Anti Semi Loop Nonequi-Join

left-anti-semi-loop-nonequi-join

Right Anti Semi Loop Nonequi-Join

N/A

Observations

The loop join is the universal join algorithm. It can handle almost any situation including the join-condition-free cross join. It even does not care if the query is an equi- or a nonequi-join. However, it cannot handle any right sided logical join. Because the algorithm always goes through the left side one row at a time and then tries to find matches on the right side for each of them, it cannot keep track of which right side rows have been used and which ones haven't. Therefore any forced right side loop join request will fail as you can see in the examples above. However, there is no statement or hint that forces a semi join and particularly there is no way to write a "right semi" query. The only way to get to one of those is if the optimizer decides that a rearrangement of the tables will improve performance. Because the Loop Join algorithm can't handle any right side join, any query for which SQL Server decides to use a semi join operator will end up using a left semi operator. The same conclusion can be drawn for the anti semi joins. Because of that I did not included examples for the four different right-(anti-)semi conditions above.

For the same reason that prevents a right outer loop join, a full outer loop join is also not possible. However, there are situations (as we will see over the next two days) that can only be handled by the Loop Join operator, so the optimizer has to come up with a workaround for this problem. If faced with a query that requires a loop join and also requires a full join, the optimizer rewrites the query from Tbl1 FULL LOOP JOIN Tbl2 to Tbl1 LEFT LOOP JOIN Tbl2 UNION ALL Tbl2 WHERE NOT EXISTS(Tbl1). This rewrite leads to a very expensive query plan potentially causing Tbl1 to be scanned once for each row in Tbl2 and additionally Tbl2 to be scanned once for each row in Tbl1. Let's look at a quick example:

[sql] SET STATISTICS IO ON;
GO
SELECT *
FROM dbo.Tbl100 A
FULL LOOP JOIN dbo.Tbl10 B
ON A.Val = - B.Val;
GO
SET STATISTICS IO OFF;
[/sql]

Remember, both tables have as many pages as they have rows:

create example tables

The captured statistics look like this:

[sourcecode] Table 'Tbl100'. Scan count 2, logical reads 1100, physical reads 0, read-ahead reads 94, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Tbl10'. Scan count 2, logical reads 1010, physical reads 2, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
[/sourcecode]

Tbl100 shows 1100 logical reads or 11 times its size and Tbl10 shows 1010 logical reads or 101 times its size. So, if you run into a full outer join that has to be handled by a loop join, it is even more important that both tables are adequately indexed.

Finally, all nonequi queries contain an additional Spool operator. The Spool operator creates a copy of the data, including only columns that are actually needed, in tempdb. This is an attempt to make the inevitable scan less pain-full. Because these queries do not have an equality condition, any index would be only of marginal help. So, if you have a lot of nonequi-join queries make sure your tempdb is as finely tuned as possible. Things to check for first in this context are the number of tempdb files and placement of such on a separate disk that is as fast as possible.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.

Categories: A Join A Day, General, Series, SQL Server Internals

Leave a Reply