Home » Blog » A Join A Day – Intersect

A Join A Day – Intersect

2012-12-16 - A Join A Day, Fundamentals, General, Series, T-SQL Statements

Introduction

This is the sixteenth post in my A Join A Day series about SQL Server Joins. Make sure to let me know how I am doing or ask your burning join related questions by leaving a comment below.

INTERSECT is a T-SQL keyword that was introduced in SQL Server 2005. It does not look like a join at first glance. However it is included in this series because it is closely related to joins.

INTERSECT comes directly from the mathematical set theory (pdf). So let me give you a short refresher on that first.

INTERSECT Example

Assume I have two sets of objects: {A, B, M, T, Y, Z} and {G, K, L, M, O, Y}

If I intersect those two sets, the result is a new set: {M, Z}

The new set contains exactly those objects that are element in both sets.

So an INTERSECTion between two row sources returns all rows from the left side that are also on the right. That sounds very similar to what an EXISTS() query is doing. But there is one big difference. Let's look at a TblA – TblB example. First, as always the tables are shown below:

The query looks like this:

[sql] SELECT TblA_Val FROM dbo.TblA
INTERSECT
SELECT TblB_Val FROM dbo.TblB
[/sql]

When executing this query you get a result that contains 4 rows:

Let's recall the EXISTS() example from the Left Semi Join post, modified to also only return the TblA_Val column:

[sql] SELECT TblA_Val FROM dbo.TblA AS a
WHERE EXISTS(SELECT 1 FROM dbo.TblB AS b WHERE a.TblA_Val = b.TblB_Val)
[/sql]

That query produces the following result:

In this result the rows 3 and 4 seem duplicated. However, if you look at the table content, you see that there are two separate rows each for TblA_Val = 3 and TblA_Val = 4. So they are not duplicated in the EXISTS() result but rather they are missing in the INTERSECT result. The reason is, that INTERSECT was implemented following mathematical set theory:

Any object can either be an element of a given set or not be an element of that set. In any set an object can never be contained more than once. The output of the three SQL statements UNION, INTERSECT and EXCEPT is always a proper set, so SQL Server removes any duplicate rows (elements). We will look at EXCEPT in tomorrow's post.

If the left side does not contain any duplicates for example because you are including the primary key of the left side table, INTERSECT and EXISTS() queries will return the same result.

INTERSECT Syntax

The big advantage of the INTERSECT statement is that the resulting queries are a lot easier to read. Particularly if you are matching on multiple columns, an equivalent EXISTS() query quickly becomes rather hard to read. However, other than an EXISTS() query all columns that are part of the result need to exist (and match) on both sides.

The syntax is similar to the UNION statement. You can connect any two queries with the INTERSECT statement. The only restriction is that the column count and data types must match. However, implicit conversions are possible.

The full INTERSECT syntax you can find here: http://msdn.microsoft.com/en-us/library/ms188055(v=sql.105).aspx

INTERSECT Operator

Because INTERSECT is so similar to EXISTS() its use usually results in a Left Semi Join operator:

Notice however the additional Stream Aggregate operator. It has the sole purpose of removing the duplicate rows.

Summary

INTERSECT is a T-SQL statement that was introduced in SQL Server 2005. It returns the intersection of two query results the same way an intersection of two sets in mathematical set theory work. The result is a set of rows, which means that duplicate rows are automatically removed.

A Join A Day

This post is part of my December 2012 "A Join A Day" blog post series. You can find the table of contents with all posts published so far in the introductory post: A Join A Day – Introduction. Check back there frequently throughout the month.