A Join A Day – New Year, New Join

2012-12-31 - A Join A Day, Fundamentals, General, Series, T-SQL Statements

Introduction

This is the thirty-first and last post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

Once upon a time, pre 1992 to be exact, there was chaos in Join land. The SQL standard committee had not thought outside the inner join yet, so there was no way to write what we now know as an Outer Join, yet – at least no standardized way. The different companies providing database management systems (DBMSs) all were aware of this shortfall and some came up with their own solutions that were oriented on the standardized inner join syntax, while others missed the boat.

Microsoft introduced the star-join. Not the star-join we know today but the join that used the *= syntax to provide an outer join experience. Oracle instead came up with the (+) syntax to achieve the same. If you were using IBM's DB2 you had to hand-code an outer join by union-ing an inner join with a not exist.

Then 1992 rolled along and with it came the ANSI-92 SQL Standard. The different DBMSs all started to adapt, more or less quickly, the new join syntax. Now, twenty years later I am not aware of any DBMS out there that does not provide the "new" outer join syntax.

However, in my work as a database consultant I still see the old style used quite often, despite Microsoft having set the bar very high by requiring you to leave your database in compatibility level 80 (SQL Server 2000) to be able to use it.

If you are still using this syntax, it is time to change. I would like you to set yourself a new-year's resolution: "From now on I will exclusively use the ANSI-92 outer join syntax".

I was debating for long if I even should write about the old syntax as the last thing I want to do is to encourage its use, not even accidentally. But if you are using a post SQL Server 2000 version SQL Server will reject the attempt to use it with this lengthy error message:

[sourcecode] Msg 4147, Level 15, State 1, Line 3
The query uses non-ANSI outer join operators ("*=" or "=*").
To run this query without modification, please set the compatibility level for current database
to 80 or lower, using stored procedure sp_dbcmptlevel.
It is strongly recommended to rewrite the query using ANSI outer join operators
(LEFT OUTER JOIN, RIGHT OUTER JOIN). In the future versions of SQL Server,
non-ANSI join operators will not be supported even in backward-compatibility modes.
[/sourcecode]

Therefore I think it is safe to assume that an accidental encouragement will not happen.

I hope this brief refresher on how to rewrite an old style join into a new outer join can help a few people out there stuck with an old database that urgently needs to be transformed.

I am not going to reiterate all the problems that come with the old style syntax that range from un-readability over real ambiguities to non-expressible queries. For the purpose of this article let's just assume that we all agree that the old style syntax is just plain bad and needs to be replaced as quickly as possible.

*= JOIN Example

The old style join syntax did not know the concept of an on clause. Instead the condition of the join was placed in the WHERE clause together with other non-join conditions. All tables where first mentioned in a comma separated list after the FROM keyword followed by the WHERE clause specifying how they all fit together.

To be able to demonstrate the old syntax I had to create a new database and set it to compatibility level 80. I aptly named it [COMPATIBILITY_LEVEL 80]:

[sql] CREATE DATABASE [COMPATIBILITY_LEVEL 80];
ALTER DATABASE [COMPATIBILITY_LEVEL 80] SET COMPATIBILITY_LEVEL=80;
USE [COMPATIBILITY_LEVEL 80];

SELECT *
INTO dbo.Customer
FROM AdventureWorks2008R2.Sales.Customer;

SELECT BusinessEntityID,
PersonType,
NameStyle,
Title,
FirstName,
MiddleName,
LastName,
Suffix,
EmailPromotion,
rowguid,
ModifiedDate
INTO dbo.Person
FROM AdventureWorks2008R2.Person.Person;
[/sql]

An inner join between the two tables would have looked like this:

[sql] SELECT *
FROM dbo.Customer AS C, dbo.Person AS P
WHERE C.PersonID = P.BusinessEntityID;
[/sql]

The comma separated list of tables is actually still supported as of SQL Server 2012. It is an alternative syntax for the CROSS JOIN and should be used only in those cases in which the requirement is really a cross join. As soon as additional relationships between the tables get expressed in the where clause you are really dealing with an INNER JOIN and as such should use the same. However, even in a real cross join situation I prefer to use the explicit CROSS JOIN syntax, That makes it quite obvious for the reader what is going on here and that the missing join condition that the optimizer will complain about was not just an oversight. (There is one exception to this rule: When my query uses all tables just as row sources without referencing any of their columns, I tend to use the shorter table list syntax. There is an example of this hidden in the earlier posts in this series.)

If we want to turn above query into a left outer join, we need to indicate that we would like to see all customers, even the ones that are not persons. In SQL Server that was done by adding a * to the side of the = in the where clause that mentions the column of the dbo.Customer table:

deprecated left outer join

Oracle used a different indicator. In PL/SQL you had to add a (+) in the WHERE clause after the column of the "optional" table (Oracle Docs). So the above left join, returning all customers independent of whether they are persons or not, would look like this:

[sourcecode] SELECT *
FROM Customer AS C, Person AS P
WHERE C.PersonID = P.BusinessEntityID (+)
[/sourcecode]

Both syntax forms have in common that they make the identification of left or right quite hard. The following query for example has the * on the right side of the = but is still a left outer join because the dbo.Customer table is mentioned first in the table list:

[sql] SELECT *
FROM dbo.Customer AS C, dbo.Person AS P
WHERE P.BusinessEntityID =* C.PersonID;
[/sql]

When rewriting old style joins, you also need to pay attention to the fact that the use of the *= turns every reference to the two tables into an outer reference. That means that these two queries are in fact not equivalent:

[sql] SELECT *
FROM dbo.Customer AS C, dbo.Person AS P
WHERE C.PersonID *= P.BusinessEntityID
AND C.ModifiedDate = P.ModifiedDate;

SELECT *
FROM dbo.Customer AS C
LEFT JOIN dbo.Person AS P
ON C.PersonID = P.BusinessEntityID
WHERE C.ModifiedDate = P.ModifiedDate;
[/sql]

The first returns all customers but no person information; the second one does not return a single row at all. See the article A Join A Day – The Left Outer Join for a more in-depth explanation of the difference. (PL/SQL requires the (+) to be mentioned in all references to the optional table, so if you translate the first query above into PL/SQL it will actually behave like the second query.)

Summary

This article gave a brief refresher on the pre-ANSI-92 outer join syntax that was phased out more than 20 years ago. The examples of queries using the old style outer joins in T-SQL as well as PL/SQL show how they can be converted into the "new" outer join syntax.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.

Categories: A Join A Day, Fundamentals, General, Series, T-SQL Statements

Leave a Reply