T-SQL Tuesday #24 – Prox ‘n’ Funx

2011-11-08 - General, T-SQL Tuesday

T-SQL Tuesday

It is T-SQL Tuesday again, number 24 about procedures and functions, hosted by Brad Schulz (Blog|no twitter).

Performance Comparisons of different types of Functions

A lot has been written about the performanmce impact of functions.
Amongst other effects, functions used in a WHERE clause will prevent the use of an index seek for the columns involved.

Sometimes however it makes sense to use functions to encapsulate a complicated calculation or to make code reuse possible.
It is therefore important that to understand just how big the impact is, that functions have on queries.

I would like to use this blog post to look at the call overhead of different kinds of functions in SQL Server.

SQL Server knows three types of functions: Scalar Functions, Inline Table-Valued Functions and Multi Statement Table-Valued Functions.
There are also Scalar and Table-Valued CLR Functions, but we will not be looking at those in this article.

To be able to look just at the overhead of the function call we will use a very simple calculation: 1 * value

All examples used in this post will select from a simple table:

[sql] CREATE TABLE dbo.tbl1
(
id INT CONSTRAINT PK_tbl1 PRIMARY KEY CLUSTERED,
data INT NOT NULL
);
[/sql]

This table has been initialized with 1000000 rows containing random values in the data column.

As a base line we will use the following SELECT statement:

[sql] SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;
[/sql]

Note the use of "(1*data)" instead of just "data". This is the calculation we will implement
in the different functions and the use of the calculation in the baseline query has the same
impact on index usage as a function call has: It effectively prevents a seek operation on
any existing index.
Also note the "INTO #x". It prevents that the results have to be send back to the client
eliminating the chance for any ASYNCH_NETWORK_IO waits to affect the outcome of this performance test.
Instead the results (if any) will be written to tempdb on the same machine.

Now lets look at the three functions:

[sql] CREATE FUNCTION dbo.simpleSVF(@parm INT)
RETURNS INT
AS
BEGIN
RETURN 1*@parm;
END;
GO
CREATE FUNCTION dbo.simpleTVF(@parm INT)
RETURNS TABLE
AS
RETURN SELECT 1*@parm val;
GO
CREATE FUNCTION dbo.multiStmtTVF(@parm INT)
RETURNS @r TABLE(val INT)
AS
BEGIN
INSERT INTO @r(val) VALUES(1*@parm);
RETURN;
END
[/sql]

They all implement the same calculation we saw above.
The Scalar Function (simpleSVF) just returns the result of that calculation directly,
while the two Table-Valued functions return a resultset with one column and one row,
containing the calculation result.

To use the Scalar Function we can just replace the calculation with a call to the function.
The use of a Table-Values Function in this context requires a little more work.
There are two patterns that you can use: A correlated subquery or a CROSS APPLY.
We will compare both of them:

[sql] SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;
SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42;
[/sql]

The same two call patterns will also be used for the Multi Statement Table-Valued function.

To record the execution times I created a small stored procedure that uses the information
in sys.dm_exec_requests together with the SYSDATETIME() function for it's measurements:

[sql] CREATE PROCEDURE dbo.RunTest
@cid INT,
@cmd NVARCHAR(MAX)
AS
BEGIN
DECLARE @duration1 DATETIME2 ;
DECLARE @duration2 DATETIME2 ;
DECLARE @cpu1 INT ;
DECLARE @cpu2 INT ;
DECLARE @reads1 INT ;
DECLARE @reads2 INT ;

DBCC DROPCLEANBUFFERS ;
DBCC FREEPROCCACHE ;
SELECT @duration1 = SYSDATETIME() ,
@cpu1 = cpu_time ,
@reads1 = logical_reads
FROM sys.dm_exec_requests
WHERE session_id = @@SPID ;

EXEC(@cmd) ;

SELECT @duration2 = SYSDATETIME() ,
@cpu2 = cpu_time ,
@reads2 = logical_reads
FROM sys.dm_exec_requests
WHERE session_id = @@SPID ;

INSERT INTO dbo.TestResults(cid,StartDTime,Duration,CPU,LogicalReads,Cmd)
SELECT @cid,
@duration1,
DATEDIFF(microsecond, @duration1, @duration2),
@cpu2 - @cpu1,
@reads2 - @reads1,
@cmd;
END;
[/sql]

This procedure executes the passed in command and records the execution time in microseconds,
the time spend working on any CPU in milliseconds(!) as well as the logical reads in pages.
It also clears out the buffer cache as well as the procedure cache each time before
executing the statement. (So don't run this in production!)

The first parameter passed in is the command id (cid) and it is not used by the procedure itself,
it is just stored together with the results. The 6 different calls to this procedure are listed below:

[sql] EXEC dbo.RunTest 1,'SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;';
EXEC dbo.RunTest 2,'SELECT * INTO #x FROM dbo.tbl1 WHERE dbo.simpleSVF(data) = 42;';
EXEC dbo.RunTest 3,'SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;';
EXEC dbo.RunTest 4,'SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42;';
EXEC dbo.RunTest 5,'SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.multiStmtTVF(data)) = 42;';
EXEC dbo.RunTest 6,'SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.multiStmtTVF(t.data) f WHERE f.val = 42;';
[/sql]

After execution the first four 100 times each
and the last two 10 times each on my system I got the following numbers (average per execution):

ExecCount Duration CPU LogicalReads Cmd
100 725,781 219 2,296 SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;
100 3,007,272 2,702 2,400 SELECT * INTO #x FROM dbo.tbl1 WHERE dbo.simpleSVF(data) = 42;
100 708,660 209 2,283 SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;
100 715,560 211 2,290 SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42
10 112,347,425 109,412 33,984,827 SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.multiStmtTVF(data)) = 42;
10 109,809,480 107,776 33,984,881 SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.multiStmtTVF(t.data) f WHERE f.val = 42

As you can see, there is almost no difference between the baseline and the two
usages of the Inline Table-Valued Function.
The Scalar Function on the other hand took more than four times as long. Most of that time
(2.7 of 3 seconds) was spend actually using the processor. There were also about 100 more pages read.

Now lets look at the Multi Statement Table-Valued Function.
I had to stop its execution after just 10 rounds to get the results in before Tuesday.
This function slows the query by a factor of over 150.

This is caused by the fact that a Multi Statement Table-Valued Function uses a
table variable to collect the rows.
To create a table variable - similar to any other table - at least two pages
need to be reserved as well as several changes to system tables
and internal database pages need to be performed.
That all needs to be undone once the function finished executing.

In both queries above, the function gets executed for every row — 1,000,000 times in this case. This is causing all that table setup and destroy work to be executed 1,000,000 times as well.

Conclusion

If you find yourself in a situation that lends itself to the use of a function,
try to go with an Inlined Table-Valued Function as there is almost no performance impact.
However, always remeber that this blog post did not look into the impact that a function has on index usage.

Unless you are writing a function that is going to be executed only rarely, stay away
from Multi Statement Table-Valued Functions.

Categories: General, T-SQL Tuesday

One Response to T-SQL Tuesday #24 – Prox ‘n’ Funx

  1. Pingback: A Join A Day – The Cross Apply - sqlity.net | sqlity.net

Leave a Reply