This is the twentieth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.
Before SQL Server can execute a query it needs to compile it into an execution plan. The compilation is done by the optimizer. The optimizer is looking at possible execution plans for the query and is trying to identify the best one. Yesterday we learned that the order of tables in a join does not change the result. Reordering the tables in a join construct can already lead to very many plans. There are six possible orders when three tables are involved, 24 with four tables and 120 with five tables. There are also many other options that the optimizer has to choose from, for example the three join algorithms. There are 81 possibilities to select one of the three algorithms for four different join operators. So, in the case of a five table join there are 120 * 81 = 9720 different plans to consider. This does not even yet include things like index selection. You can see, the number of possible plans quickly gets too big for the optimizer to look at all of them.
Because of the high number of possible plans, the optimizer does not actually try to find the best plan. Instead it tries to find a good enough plan. To determine if it found a good enough plan, the optimizer looks at the cost estimate of the plans found so far and based on those estimates determines how long it is going to spend to find a better plan. After that time it takes the best plan it found up to then.
This is a very simplistic view of the things that happen during optimization. The point I am trying to make is that is it indeed very common for the optimizer to not find the best plan. However, most of the time the plan it comes up with is pretty close to the best. But every once in a while the execution plan that the optimizer came up with is really bad. In a case like that we can use query hints to help the optimizer find a better plan.
Let's look at this simple query:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice FROM Sales.SalesOrderDetail AS sod INNER JOIN Sales.SalesOrderHeader AS soh ON soh.SalesOrderID = sod.SalesOrderID;
For this query the optimizer decides to use a Merge Join operator:
The Merge Join algorithm is actually the best choice in this context. But let's assume for this article that we know that another algorithm is better and we want to influence the optimizer's decision.
There are two ways to sway the optimizer to use a different join algorithm. The first one is a direct join hint and the second on is a query join hint. Both have in common, that they are not really hints, but they rather force the optimizer to use the specified algorithm.
To specify a direct join hint, we just need to mention the desired algorithm in between the word INNER and the word JOIN like this:
The key word LOOP causes the optimizer to us the Loop Join algorithm. If you want to specify the algorithm you have to type the INNER key word as well. Just writing TblA LOOP JOIN TblB will result in an error. However, when using an outer join, the hint can be placed right before the key word JOIN and the key word OUTER stays optional:
This forces the Hash Join algorithm to be used in this Left Outer Join context.
The third algorithm is the Merge Join that the query used without a hint. You can force this algorithm with the key word MERGE:
So we would expect this to be the same execution plan as the one we got without the hint. But if you look closely, that is not the case. The order of the two inputs has been switched. The hint-free execution plan uses the Sales.SalesOrderHeader table as first input and the Sales.SalesOrderDetail table as second input. Because Sales.SalesOrderHeader has less rows, that is the preferred order for almost all join situations. However, when hinting to the optimizer, which algorithm we want to use, we also force the order of the tables to stay the same as specified in the query.
A hint specified like this affects only the join operator that it was specified at. You can specify a different algorithm for each join in the query. However, even a single join hint forces the order of the entire query.
The second way to specify the desired join algorithm is a query hint. A query join hint is specified at the end of the query like this:
This syntax also allows you to specify any of the three algorithms LOOP, HASH and MERGE. There are two important differences between a query hint and a join hint. The first on is that a query hint does not force the table order as you can see in above example. The second difference is that the query hint forces all joins to use the same algorithm. Let's look at this query for an example:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice, prod.Name FROM Sales.SalesOrderDetail AS sod INNER JOIN Sales.SalesOrderHeader AS soh ON soh.SalesOrderID = sod.SalesOrderID INNER JOIN Production.Product AS prod ON sod.ProductID = prod.ProductID;
If un-hinted like above, it results in this execution plan:
Now let's specify just a single join hint:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice, prod.Name FROM Sales.SalesOrderDetail AS sod INNER JOIN Sales.SalesOrderHeader AS soh ON soh.SalesOrderID = sod.SalesOrderID INNER LOOP JOIN Production.Product AS prod ON sod.ProductID = prod.ProductID;
This forces only the second join to be a loop join. The first join is now replaced by a hash join.
You can also see that the order in which the tables are accessed now matches the order in which they are mentioned in the query.
Now let's look at this query with a query hint:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice, prod.Name FROM Sales.SalesOrderDetail AS sod INNER JOIN Sales.SalesOrderHeader AS soh ON soh.SalesOrderID = sod.SalesOrderID INNER JOIN Production.Product AS prod ON sod.ProductID = prod.ProductID OPTION(LOOP JOIN);
Now both join operators are using the Loop Join algorithm:
However, the table order does not match the one specified in the query.
The two types of hints cannot be mixed. If you attempt to do so error 1042 will be raised:
If you just want to force the order of the tables without specifying the algorithm(s) to be used, there is a
n app for that too:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice, prod.Name FROM Sales.SalesOrderDetail AS sod INNER JOIN Sales.SalesOrderHeader AS soh ON soh.SalesOrderID = sod.SalesOrderID INNER JOIN Production.Product AS prod ON sod.ProductID = prod.ProductID OPTION(FORCE ORDER);
The FORCE ORDER query hint gives us just that functionality. It forces the order but still allows the optimizer to choose the algorithms that it finds most appropriate.
FORCE ORDER actually does not only consider the order of tables in the query, but also the placement of the ON clauses. For an example let's look at this slightly modified query:
SELECT soh.AccountNumber, soh.OrderDate, sod.OrderQty, sod.UnitPrice,pers.FirstName, pers.LastName FROM ( Sales.SalesOrderHeader AS soh INNER JOIN Sales.SalesOrderDetail AS sod ON soh.SalesOrderID = sod.SalesOrderID ) INNER JOIN ( Sales.Customer AS cust INNER JOIN Person.Person AS pers ON cust.PersonID = pers.BusinessEntityID ) ON soh.CustomerID = cust.CustomerID;
Without a join or table hint we get this right deep execution plan:
However, if we add the FORCE ORDER hint the optimizer builds a bushy execution plan in which first the Sales.SalesOrderHeader and Sales.SalesOrderDetail tables are joined, then the Sales.Customer and Person.Person tables and finally the two results with each outher:
There are a few additional options available for join and query hints that we can't discuss today. Check out these two Books Online articles for more information:
Working with hints in T-SQL is always a double edged sword. While you might find a better table order or set of algorithms for a particular query base on the current data, you are taking away SQL Servers ability to adapt to changes in the data. Every time enough data in one of the tables has changed SQL Server (with default options enabled) will revisit every query accessing that table to see if it can come up with a better plan. It cannot do that with queries on which you have forced its way. For that reason it is an accepted best practice to use any type of query hint only after all other options failed.
If you have a join query that needs some performance improvements, the first thing to check is if appropriate indexes exist and if the query is written in a way that those indexes can actually be used. If that does not help, make sure that all statistics are up to date. Finally you could check if adding additional statistics or filtered statistics can improve the query.
This list is not meant to be exhaustive. It just gives you a starting point for things to look at when trying to improve the performance of a join query.
Join hints are a very powerful way to steer the decisions of the optimizer that affect join algorithm selection and table access order. We have seen several options to influence those decisions. However, every time we hint to the optimizer, we take some of its flexibility to adapt to changes in the data away. While a hinted query might be smooth sailing today, be aware of what comes after the next wave of updates to your data.
This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.