<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>sqlity.net &#187; T-SQL Tuesday</title>
	<atom:link href="http://sqlity.net/en/topic/general/t-sql-tuesday/feed/" rel="self" type="application/rss+xml" />
	<link>http://sqlity.net/en</link>
	<description>Quality for SQL</description>
	<lastBuildDate>Fri, 18 May 2012 13:05:28 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<xhtml:meta xmlns:xhtml="http://www.w3.org/1999/xhtml" name="robots" content="noindex" />
		<item>
		<title>LAG-ging Behind [T-SQL Tuesday #029 - Let&#039;s have a SQL Server 2012 party]</title>
		<link>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/</link>
		<comments>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 14:00:57 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=879</guid>
		<description><![CDATA[This article introduces and explains the new LAG function in SQL Server. It also shows how to use it for SQL Server wait time analysis.]]></description>
			<content:encoded><![CDATA[<div>
<p>
<a href="http://www.nigelpsammy.com/2012/04/t-sql-tuesday-029-lets-have-sql-server.html">
<img height="132" border="0" hspace="9" width="131" alt="T-SQL Tuesday logo" title="T-SQL Tuesday #29" src="http://images.sqlity.net/SqlTuesday.png" />
</a>
</p>
<p>T-SQL Tuesday #29 is hosted by Nigel P Sammy (<a href="http://www.nigelpsammy.com/">blog</a>|<a href="https://twitter.com/#!/NigelSammy">twitter</a>). 
This month's topic is "<a href="http://www.nigelpsammy.com/2012/04/t-sql-tuesday-029-lets-have-sql-server.html">Let's have a SQL Server 2012 party</a>".</p>

<h1>Calculating incrementals with the LAG function</h1>
<h3>Introduction</h3>
<p>
With the title of this post I was not trying to imply that I am late in discovering SQL Server 2012 features. 
Instead I was thinking of one small but very useful new feature that allows combining data from several result set rows into a new row. It is the new LAG function. 
</p>	
<p>
 The LAG function is best explained with an example:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT n
INTO #t1
FROM(VALUES(1),(2),(3),(4))X(n);

SELECT n, 
LAG(n,1)OVER(ORDER BY n) [LAG(n,1)], 
LAG(n,2)OVER(ORDER BY n) [LAG(n,2)]
FROM #t1;
</pre>
</div>
<p>
  This returns the result shown below:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Result.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Result.png" alt="LAG Example 1 Result" title="Example - Result" width="177" height="118" class="aligncenter size-full wp-image-901" />
</a>
</p>
<p>
  In its basic form, the LAG function takes two parameters. The first is the column whose value you want to retrieve. The second specifies how many rows you want to go back.
</p>
<p> 
  LAG is a window function, so you also need to specify the order in which the rows are processed. This is done with the <span class="tt">OVER(ORDER BY n)</span> clause.
</p>
<p>
As the result above shows, <span class="tt">LAG(n, 1)</span> returns the value that column n had in the previous row. Similarly, <span class="tt">LAG(n, 2)</span> returns the value that n had two rows earlier.
</p>
<p>
The first parameter can actually be any expression. It is evaluated in the context of the row that the second parameter points to. There is also an optional third parameter that specifies the default value to return when the requested row does not exist. When the requested row exists but the expression evaluates to NULL for it, NULL is returned even if a default was specified.
</p>
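<p>
As a minimal sketch of the third parameter, the query below reuses the #t1 table from above. The expression <span class="tt">n * 10</span> and the default of -1 are arbitrary choices for illustration only:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Illustration only: any expression can be lagged, and the third
-- parameter supplies a value for rows that have no predecessor.
SELECT n,
LAG(n * 10, 1, -1)OVER(ORDER BY n) [LAG(n*10,1,-1)]
FROM #t1;
</pre>
</div>
<p>
For n = 1 there is no previous row, so the default -1 is returned; for n = 2, 3 and 4 the query returns 10, 20 and 30.
</p>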
<h3>Window Spool</h3>
<p>
  In SQL Server versions before SQL Server 2012, achieving this behavior required a self-join for each offset you wanted to go back. To mimic the query above you had to write something like this: 
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
WITH A AS
(
 SELECT n, ROW_NUMBER() OVER(ORDER BY n) rn
 FROM #t1
)
SELECT A0.n, A1.n [LAG(n,1)], A2.n [LAG(n,2)]
FROM A A0
LEFT JOIN A A1 ON A0.rn = A1.rn + 1
LEFT JOIN A A2 ON A0.rn = A2.rn + 2;
</pre>
</div>
<p>
This not only looks more complex, it is also a lot more work for SQL Server. Below is the execution plan for the JOIN query above:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Joins.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Joins.png" alt="LAG Example 1 Execution Plan with Joins" title="Example - Execution Plan (with joins)" width="932" height="242" class="aligncenter size-full wp-image-902" />
</a>
</p>
<p>
SQL Server has to scan and sort the table three times. It then joins the three streams together using two Hash Join operators. All of these are fairly expensive operations.
</p>
<p>
The execution plan of the LAG query on the other hand looks like this:
</p>
<p>
<a href="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Window_Spool_for_LAG.png">
<img src="http://sqlity.net/en/wp-content/uploads/2012/04/LAG_Example_1_Execution_Plan_with_Window_Spool_for_LAG.png" alt="LAG Example 1 Execution Plan with Window Spool for LAG" title="Example - Execution Plan (with Window Spool Operators)" width="1594" height="77" class="aligncenter size-full wp-image-903" />
</a>
</p>
<p>
In this query SQL Server uses the new Window Spool operator. Each distinct step size used in a LAG function requires its own Window Spool. Using <span class="tt">LAG(n,1)</span> and <span class="tt">LAG(m,1)</span> with two expressions n and m in the same query requires only one Window Spool operator. Using two different step sizes, as in the example above, requires two Window Spool operators.
</p>
<p>
The Window Spool operator remembers the value of the expression that was specified in the first parameter of the LAG function for the last few rows &ndash; just enough rows to satisfy the LAG requirement.
</p>
<p>
Remembering a few values is clearly a lot less work than scanning and sorting the entire table again.
</p>
<h3>How long did you wait?</h3>
<p>
A great use case for this functionality is the analysis of wait times in SQL Server. The <span class="tt">sys.dm_os_wait_stats</span> DMV provides accumulated statistics of the time SQL Server spent waiting since its last restart,
broken down by wait type.
</p>
<p>
A good way to make sense of this information is to capture the wait stats at regular intervals and calculate the delta for each interval. This is now easy with the LAG function:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
IF OBJECT_ID('dbo.WaitStatLog') IS NOT NULL 
  DROP TABLE dbo.WaitStatLog;

CREATE TABLE dbo.WaitStatLog
  (
    Id INT NOT NULL
           IDENTITY(1, 1)
           CONSTRAINT WaitStatLog_PK PRIMARY KEY CLUSTERED ,
    CaptDTime DATETIME2
      NOT NULL
      CONSTRAINT WaitStatLog_CaptDTime_Dflt DEFAULT SYSDATETIME()
  );

IF OBJECT_ID('dbo.WaitStatLogDtl') IS NOT NULL 
  DROP TABLE dbo.WaitStatLogDtl;

CREATE TABLE dbo.WaitStatLogDtl
  (
    WaitStatLogId INT NOT NULL ,
    wait_type NVARCHAR(60) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL ,
    waiting_tasks_count BIGINT NOT NULL ,
    wait_time_ms BIGINT NOT NULL ,
    max_wait_time_ms BIGINT NOT NULL ,
    signal_wait_time_ms BIGINT NOT NULL ,
    CONSTRAINT WaitStatLogDtl_PK PRIMARY KEY CLUSTERED
      ( WaitStatLogId, wait_type )
  );

IF OBJECT_ID('dbo.CaptureWaitStats') IS NOT NULL 
  DROP PROCEDURE dbo.CaptureWaitStats;
GO
CREATE PROCEDURE dbo.CaptureWaitStats
AS
BEGIN  
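  -- Create the header row and remember its generated Id.
  DECLARE @Id TABLE(Id INT);
  INSERT INTO dbo.WaitStatLog OUTPUT INSERTED.Id INTO @Id(Id) DEFAULT VALUES;

  -- Copy the current snapshot; explicit column lists avoid relying on the DMV's column order.
  INSERT INTO dbo.WaitStatLogDtl
    (WaitStatLogId, wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms)
  SELECT (SELECT Id FROM @Id), wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms
  FROM sys.dm_os_wait_stats;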
END
GO
</pre>
</div>
<p>
The SQL script above creates two tables and a procedure. The procedure (<span class="tt">dbo.CaptureWaitStats</span>) captures the current values from the <span class="tt">sys.dm_os_wait_stats</span> DMV into the two tables. You can either call it manually or set up a job to execute it at regular intervals.
</p>
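<p>
For a quick manual test, something like the following might do. The 10 second delay is an arbitrary choice for illustration; for real monitoring a scheduled job with a longer interval is the more likely setup:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Take two snapshots ten seconds apart (interval chosen arbitrarily).
EXEC dbo.CaptureWaitStats;
WAITFOR DELAY '00:00:10';
EXEC dbo.CaptureWaitStats;

-- Check which snapshots have been captured so far.
SELECT Id, CaptDTime
FROM dbo.WaitStatLog
ORDER BY Id;
</pre>
</div>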
<p>
After a few snapshots have been captured you can use the following query to calculate the incremental values:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
WITH WaitDelta AS
(
SELECT  L.Id ,
        L.CaptDTime ,
        DATEDIFF(millisecond,LAG(L.CaptDTime, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ),L.CaptDTime) Interval,
        D.wait_type ,
        D.max_wait_time_ms ,
        D.waiting_tasks_count - LAG(D.waiting_tasks_count, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) waiting_tasks_count,
        D.wait_time_ms - LAG(D.wait_time_ms, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) wait_time_ms,
        D.signal_wait_time_ms - LAG(D.signal_wait_time_ms, 1) OVER ( PARTITION BY D.wait_type ORDER BY L.Id ) signal_wait_time_ms
FROM    dbo.WaitStatLog L
        JOIN dbo.WaitStatLogDtl D ON L.Id = D.WaitStatLogId
),
WaitTop AS
(
SELECT *,ROW_NUMBER() OVER(PARTITION BY Id ORDER BY wait_time_ms DESC) rn
FROM WaitDelta
WHERE waiting_tasks_count &gt; 0
AND wait_type NOT IN (
        'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
        'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
        'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
        'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT', 'DISPATCHER_QUEUE_SEMAPHORE',
        'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'BROKER_EVENTHANDLER',
        'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP','SLEEP_BPOOL_FLUSH',
        'CXPACKET','DBMIRROR_EVENTS_QUEUE','DBMIRRORING_CMD')
)
SELECT *
FROM WaitTop
WHERE rn&lt;6
ORDER BY Id DESC;
</pre>
</div>
<p>
The query uses the LAG function to calculate the delta values. The <span class="tt">PARTITION BY wait_type</span> clause makes sure that only values of the same wait type are used to calculate a delta. Because all LAG invocations have the same partitioning and sort order and use the same step size, only a single Window Spool is required by this query.  
</p>
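<p>
The effect of the partitioning is easier to see on a toy data set. The sketch below uses made-up values (a group column g and a number n), not real wait stats:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Made-up data: two groups 'a' and 'b' with two rows each.
SELECT g, n,
LAG(n,1)OVER(PARTITION BY g ORDER BY n) [LAG(n,1)]
FROM(VALUES('a',1),('a',2),('b',10),('b',20))X(g,n);
</pre>
</div>
<p>
The first row of each group returns NULL because LAG never reaches across a partition boundary: for group 'b' the value 10 has no predecessor, even though group 'a' has rows that sort before it.
</p>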
<p>
The query returns only the top five wait types for each interval; this is done using the ROW_NUMBER function. It also filters out a few wait types that happen frequently in a healthy SQL Server installation (see <a href="http://www.sqlskills.com/BLOGS/PAUL/post/Wait-statistics-or-please-tell-me-where-it-hurts.aspx" target="_blank">Wait statistics, or please tell me where it hurts</a> for one source of this exclusion list).
</p>
<h3>Conclusion</h3>
<p>
SQL Server's new LAG function provides an easy way to calculate incrementals. It makes the actual coding simpler and also significantly reduces the amount of work that SQL Server itself has to do, compared to the old JOIN method.
</p>
<p>
This is a great tool for analyzing SQL Server performance data that tends to be accessible only in accumulated values.
</p>
<p>
There is also a new LEAD function that does basically the same thing, but lets you access values from rows ahead of the current row instead of behind it.
</p>
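<p>
As a short sketch, LEAD can be tried on the same #t1 table that was used in the introduction:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- LEAD looks forward instead of backward.
SELECT n,
LEAD(n,1)OVER(ORDER BY n) [LEAD(n,1)]
FROM #t1;
</pre>
</div>
<p>
For n = 1, 2 and 3 this returns 2, 3 and 4; for the last row (n = 4) there is no following row, so NULL is returned.
</p>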
</div>
]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/879/lag-ging-behind-t-sql-tuesday-029-lets-have-a-sql-server-2012-party/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Purge Problems [TSQL Tuesday #027 - The Big Data Valentine’s Edition]</title>
		<link>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/</link>
		<comments>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/#comments</comments>
		<pubDate>Tue, 14 Feb 2012 15:00:40 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=666</guid>
		<description><![CDATA[T-SQL Tuesday #27 is hosted by Steve Jones (blog&#124;twitter). This month’s topic is “Big Data”. Purge Problems in Big Data &#8211; Not only a SELECT needs an index Recently I was called to an issue at a 5+ TB customer. The purge job had stopped working a while back and the disk drives started to <a href="http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<style type="text/css">
<!--
p {margin-top:8px;}
-->
</style>
<div>
<p><a href="https://voiceofthedba.wordpress.com/2012/02/07/t-sql-tuesday-027-the-big-data-valentines-edition/"><img height="132" border="0" hspace="9" width="131" alt="SqlTuesday Purge Problems [TSQL Tuesday #027   The Big Data Valentine’s Edition]" src="http://images.sqlity.net/SqlTuesday.png" title="Purge Problems [TSQL Tuesday #027   The Big Data Valentine’s Edition]" /></a></p>
<p>T-SQL Tuesday #27 is hosted by Steve Jones (<a href="https://voiceofthedba.wordpress.com/">blog</a>|<a href="https://twitter.com/#!/way0utwest">twitter</a>). This month’s topic is “<a href="https://voiceofthedba.wordpress.com/2012/02/07/t-sql-tuesday-027-the-big-data-valentines-edition/">Big Data</a>”.</p>
<h2>Purge Problems in Big Data &ndash; Not only a SELECT needs an index</h2>
<p>
Recently I was called to an issue at a 5+ TB customer. The purge job had stopped working a while back and the disk drives started to feel all bloated. 
</p>
<p>
The purge job ran every night to delete data older than n days. It would execute for over 13 hours and then quit, reporting that there were not enough resources to complete the query.
</p>
<p>
The procedure would go through 5 tables and delete rows based on their relationship to "expired" records in a common parent table. After that, those parent records were supposed to get deleted. But the procedure never got to finish the third delete statement.
</p>
<p>
The problem was that each delete would request the list of expired records from the parent table anew, using the pattern DELETE WHERE parent_id IN (SELECT id FROM parent). The inner select had a where clause checking a date column and an indicator column. While there was an index on the date column, SQL Server could not use it because of the age calculation performed on that column. Also, the indicator was not part of the index. With this setup SQL Server had to perform a table scan of one of the bigger tables in that database for each of the child tables and then one more time for the parent table itself.
</p>
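<p>
The customer's actual statements are not shown here, so the following is only a sketch of the general pattern. All names (dbo.Parent, created_on, is_closed, @RetentionDays) are hypothetical. It contrasts a predicate that wraps the date column in a calculation with one that leaves the column bare:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- All object and column names below are hypothetical.
DECLARE @RetentionDays INT = 30;

-- Not index-friendly: the column is wrapped in a calculation,
-- so the age has to be computed for every row.
SELECT id
FROM dbo.Parent
WHERE DATEDIFF(DAY, created_on, SYSDATETIME()) &gt; @RetentionDays
AND is_closed = 1;

-- Index-friendly: the calculation is moved to the other side of the
-- comparison, so an index on created_on can be used for a seek.
SELECT id
FROM dbo.Parent
WHERE created_on &lt; DATEADD(DAY, -@RetentionDays, SYSDATETIME())
AND is_closed = 1;
</pre>
</div>
<p>
The two predicates are close but not identical at day boundaries, because DATEDIFF counts the day boundaries crossed while the second form compares exact points in time.
</p>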
<p>
This algorithm has a few issues. One is that by the time the parent table is purged, more rows might qualify than when the first child table was purged. That would cause foreign key violations during the delete &ndash; but the process never got that far anyway.
</p>
<p>
To resolve the issue I added the indicator to the index on the date column as an included column. I then rewrote the procedure to retrieve the ids of all to-be-purged records into a temp table, rewriting the select so that the index could actually be used. All the delete statements then join to that temp table to delete the necessary rows. I also made sure that there was an index on the parent_id column of each child table &ndash; a recommended best practice for all foreign key relationships anyway.
</p>
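<p>
A rough sketch of such a purge could look like the following. The table and column names (dbo.Parent, dbo.Child1, created_on, is_closed, parent_id) and the 30 day retention are hypothetical stand-ins, not the customer's real schema:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- All object and column names below are hypothetical.
DECLARE @RetentionDays INT = 30;

-- Collect the ids to purge once, with a predicate the index can support.
SELECT id
INTO #ToPurge
FROM dbo.Parent
WHERE created_on &lt; DATEADD(DAY, -@RetentionDays, SYSDATETIME())
AND is_closed = 1;

-- Delete the child rows first (repeat for each child table) ...
DELETE C
FROM dbo.Child1 C
JOIN #ToPurge P ON C.parent_id = P.id;

-- ... then the parent rows, based on the same fixed list of ids.
DELETE PA
FROM dbo.Parent PA
JOIN #ToPurge P ON PA.id = P.id;

DROP TABLE #ToPurge;
</pre>
</div>
<p>
Because every delete works from the same fixed list of ids, rows that expire while the purge is running are simply left for the next run instead of causing foreign key violations.
</p>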
<p>
With those changes in place, the purge is now happily humming along again. The lesson is that while indexes usually help SELECTs and hinder INSERTs and DELETEs, sometimes you need an index to be able to execute your DELETE statements at all.
</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/666/purge-problems-tsql-tuesday-027-the-big-data-valentines-edition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Index Misconceptions [TSQL Tuesday #026 - Second Chances]</title>
		<link>http://sqlity.net/en/563/index-misconceptions-tsql-tuesday-026-second-chances/</link>
		<comments>http://sqlity.net/en/563/index-misconceptions-tsql-tuesday-026-second-chances/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 05:00:11 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=563</guid>
		<description><![CDATA[T-SQL Tuesday #26 is hosted by David Howard (blog&#124;twitter). This month’s topic is “Second Chances”. Index Misconceptions The topic this month is "Second Chances", which means that we can write about anything we would have liked to write about before but didn't get to. Well, not really anything, but any of the 25 previous T-SQL <a href="http://sqlity.net/en/563/index-misconceptions-tsql-tuesday-026-second-chances/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<style type="text/css">
<!--
p {margin-top:8px;}
-->
</style>
<div>
<p><a href="http://davidbrycehoward.com/archive/2012/01/tsql-tuesday-026-second-chances/"><img height="132" border="0" hspace="9" width="131" alt="SqlTuesday Index Misconceptions [TSQL Tuesday #026   Second Chances]" src="http://images.sqlity.net/SqlTuesday.png" title="Index Misconceptions [TSQL Tuesday #026   Second Chances]" /></a></p>
<p>T-SQL Tuesday #26 is hosted by David Howard (<a href="http://davidbrycehoward.com/">blog</a>|<a href="https://twitter.com/#!/DaveH0ward">twitter</a>). This month’s topic is “<a href="http://davidbrycehoward.com/archive/2012/01/tsql-tuesday-026-second-chances/">Second Chances</a>”.</p>
<h2>Index Misconceptions</h2>
<p>
The topic this month is "Second Chances", which means that we can write about anything we would have liked to write about before but didn't get to. Well, not really anything, but any of the 25 previous T-SQL Tuesday topics. I picked two topics: <a href="http://michaeljswart.com/2010/09/invitation-to-participate-in-t-sql-tuesday-10-indexes/">Indexes</a> by <a href="http://michaeljswart.com/">Michael Swart</a> and <a href="http://sankarreddy.com/2010/10/invitation-to-participate-in-t-sql-tuesday-11-misconceptions-in-sql-server/">Misconceptions in SQL Server</a> by <a href="http://sankarreddy.com/">Sankar Reddy</a>.
</p>
<h3>Misconception: SQL Server Indexes are binary trees</h3>
<p>I keep running into articles that claim that SQL Server internally uses a binary tree to build its indexes. That, however, is incorrect, and I would like to use this opportunity to clear things up a little.
</p>
<p>
SQL Server stores its tables (to be exact: table partitions) in a format that is called a HoBT. HoBT stands for "Heap or B-Tree". A Heap is used for tables without a clustered index. A B-Tree is used for all clustered and nonclustered indexes. If you look up <a href="http://en.wikipedia.org/wiki/B-tree">B-Tree on Wikipedia</a>, the first sentence states: "Not to be confused with <a href="http://en.wikipedia.org/wiki/Binary_tree" title="Binary tree">Binary tree</a>."
</p>
<p>
The B in B-Tree is often said to stand for "Balanced". However, when Rudolf Bayer and Ed McCreight at Boeing invented the B-Tree in 1971, they did not specify the meaning of the B at all.
Speculations include Balanced, Bushy, Bayer and Boeing (<a href="http://en.wikipedia.org/wiki/B-tree#Etymology_unknown">see again Wikipedia</a>).
</p>
<p>
SQL Server is not using a plain B-Tree but instead a variation called a <a href="http://en.wikipedia.org/wiki/B%2B_tree">B+Tree</a>. The main difference from a plain B-Tree is that in a B+Tree the actual data is stored only in the leaf nodes, with all other nodes containing only key values. In B-Trees the data is distributed over all levels.
</p>
<p>
So, let's take a look under the covers to see the structure for ourselves. First let's create a table to play with:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE dbo.IdxTst1
    (
      Id INT IDENTITY(1, 1) ,
      V1 CHAR(795) ,
      F1 CHAR(7254) ,
      CONSTRAINT PK_IdxTst1 PRIMARY KEY CLUSTERED( Id, V1 )
    ) ;
</pre>
</div>
<p> The table has a 4-byte (INT) identity column and two fixed-length CHAR columns. The clustered index key contains the Id (4 bytes) and the 795-byte V1 column, a total of 799 bytes. The full row contains an additional 7254 bytes (F1) for a total row data size of 8053 bytes. Each stored row also carries some meta information; for this table that is 7 bytes per row, which brings us to a total of 8060 bytes for each row. This is the maximum number of bytes that one row is allowed to take (we are excluding LOB data from this exercise). This row size makes sure that there is always only one row per page. 
</p>
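<p>
The arithmetic behind these numbers can be double-checked with a trivial query. It only adds up the column sizes stated above and does not touch the table:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT 4 + 795 AS key_bytes,             -- Id + V1
       4 + 795 + 7254 AS data_bytes,     -- Id + V1 + F1
       4 + 795 + 7254 + 7 AS row_bytes;  -- plus 7 bytes of row meta information
</pre>
</div>
<p>
This returns 799, 8053 and 8060.
</p>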
<p>
  To look at the structure of the table we are going to use the <a href="http://msdn.microsoft.com/en-us/library/ms188917.aspx">sys.dm_db_index_physical_stats</a> DMF:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
SELECT index_type_desc,Alloc_unit_type_desc,index_depth,index_level,page_count,record_count,avg_record_size_in_bytes 
  FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('dbo.IdxTst1'),NULL,NULL,'DETAILED');
</pre>
</div>
<p>
This query returns one row for every level of every B+Tree for the dbo.IdxTst1 table. As the table is not partitioned and contains only one index (the clustered index), the query in our case returns only information about this one index.
</p>
<p>
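<p>
The INSERT statements used for the following steps are not shown in this post. A minimal sketch, assuming that arbitrary filler values are good enough for the experiment, could be:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Filler values only; the CHAR columns are padded to their full length,
-- and the IDENTITY column keeps the (Id, V1) key unique.
INSERT INTO dbo.IdxTst1(V1, F1) VALUES('v', 'f');
</pre>
</div>
<p>
Running this statement once adds one row. In SSMS or sqlcmd it can be followed by <span class="tt">GO 8</span> (or any other count) to repeat the batch that many times.
</p>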
After inserting one row into the table the query returns the following result set:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">1</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">1</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
  </tbody>
</table>
<p>
The B+Tree has so far only one level (index_level = 0), contains 1 node (page_count = 1) and 1 record within that node. The record has the expected size of 8060 bytes.
</p>
<p>
To insert a second row SQL Server needs to create a new page (node), as each page can hold only one row. With two data pages now in place, we also need a new root page that links to the data pages:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">2</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">2</td>
      <td class="record_count-cell">2</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">2</td>
      <td class="index_level-cell">1</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">2</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
  </tbody>
</table>
<p>
As expected, the B+Tree now has two levels: index_level 0 with 2 pages holding one row of 8060 bytes each, and index_level 1 for the root page, which holds two records of 806 bytes each &ndash; one for each level 0 page. The 806 bytes contain the 799-byte key value plus the pointer to the data page for this key value.
</p>
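<p>
How many of these 806-byte records fit on one page can be estimated with a rough calculation. The sketch assumes the standard 8192-byte page with a 96-byte page header and a 2-byte slot array entry per record; those figures are not stated in this post:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Assumed: 8192-byte page, 96-byte header, 2-byte slot entry per record.
-- Integer division rounds down to whole records.
SELECT (8192 - 96) / (806 + 2) AS records_per_page;
</pre>
</div>
<p>
This returns 10: ten records need 8080 of the 8096 available bytes, and an eleventh no longer fits.
</p>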
<p>
With a key storage size of 806 bytes a page can hold 10 entries, so let's insert another 8 rows to check:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">2</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">10</td>
      <td class="record_count-cell">10</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">2</td>
      <td class="index_level-cell">1</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">10</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
  </tbody>
</table>
<p>
And one more to force an additional page on index_level 1:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">11</td>
      <td class="record_count-cell">11</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
    <tr>
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">1</td>
      <td class="page_count-cell">2</td>
      <td class="record_count-cell">11</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">2</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">2</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
  </tbody>
</table>
<p>
As predicted, we now have two pages on index_level 1 and a new root page on index_level 2. All the data (11 rows) is still stored in index_level 0 in 11 separate pages. All intermediate (and root) pages contain only key values (806 bytes).
</p>
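<p>
The capacity of a non-leaf page can be estimated with a bit of arithmetic. The following is only a rough sketch: it assumes 8096 usable bytes per 8 KB page and a 2-byte slot array entry per record on top of the 806 bytes reported above, and it ignores any other per-page overhead.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- Assumptions: 8096 usable bytes per page, 806-byte index records,
-- plus 2 bytes per record for the slot array.
DECLARE @entries_per_page INT = 8096 / (806 + 2);  -- integer division

SELECT  @entries_per_page                      AS entries_per_page,
        @entries_per_page * @entries_per_page  AS max_rows_with_three_levels;
</pre>
</div>
<p>
Under these assumptions a non-leaf page holds 10 entries, so a root page over fully packed index_level 1 pages can address 10 * 10 = 100 data pages.
</p>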
<p>
The root page can hold 10 entries and each index_level 1 page can also hold 10 entries. That should allow us to insert an additional 89 rows into this table without requiring a fourth index level. So let's try it:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">4</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">100</td>
      <td class="record_count-cell">100</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
    <tr>
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">4</td>
      <td class="index_level-cell">1</td>
      <td class="page_count-cell">24</td>
      <td class="record_count-cell">100</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
    <tr>
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">4</td>
      <td class="index_level-cell">2</td>
      <td class="page_count-cell">5</td>
      <td class="record_count-cell">24</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">4</td>
      <td class="index_level-cell">3</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">5</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
  </tbody>
</table>
<p>
What happened here? We have the expected 100 data rows and the corresponding 100 index_level 0 pages, but instead of only 10 index_level 1 pages and one root page we now have 30 non-leaf pages spread over three levels.
</p>
<p>
The reason for this is the way SQL Server splits pages. Every time it requires a new page to insert a row, it takes the page that should have contained the new row and splits its contents in half, leaving one half in the old page and moving the other half into the new page. After that it inserts the new row into the new page. This algorithm is used for all index pages, regardless of their index_level and independent of the value of the new row. (For leaf level pages SQL Server does not split the existing page if the new row belongs at the end of the table. Instead it just adds a new empty page to the table and inserts the row there. The example in this article hides this behavior as there is always only one row per data page.)
That algorithm leaves most of the intermediate index pages with only 5 rows, so after only 55 rows a new index level is required.
</p>
<p>
To reclaim that lost space we need to rebuild the index:
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
ALTER INDEX PK_IdxTst1 ON dbo.IdxTst1 REBUILD;
</pre>
</div>
<p>
Now the table structure looks as expected:
</p>
<table>
  <thead>
    <tr>
      <th class="index_type_desc-cell">index_type_desc</th>
      <th class="Alloc_unit_type_desc-cell">Alloc_unit_type_desc</th>
      <th class="index_depth-cell">index_depth</th>
      <th class="index_level-cell">index_level</th>
      <th class="page_count-cell">page_count</th>
      <th class="record_count-cell">record_count</th>
      <th class="avg_record_size_in_bytes-cell">avg_record_size_in_bytes</th>
    </tr>
  </thead>
  <tbody>
    <tr class="firstRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">0</td>
      <td class="page_count-cell">100</td>
      <td class="record_count-cell">100</td>
      <td class="avg_record_size_in_bytes-cell">8060</td>
    </tr>
    <tr>
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">1</td>
      <td class="page_count-cell">10</td>
      <td class="record_count-cell">100</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
    <tr class="lastRow">
      <td class="index_type_desc-cell">CLUSTERED INDEX</td>
      <td class="Alloc_unit_type_desc-cell">IN_ROW_DATA</td>
      <td class="index_depth-cell">3</td>
      <td class="index_level-cell">2</td>
      <td class="page_count-cell">1</td>
      <td class="record_count-cell">10</td>
      <td class="avg_record_size_in_bytes-cell">806</td>
    </tr>
  </tbody>
</table>
<p>
The FILLFACTOR of the index does not matter in this case as non data pages in an index are always completely filled during an index rebuild operation (unless PAD_INDEX = ON is also specified).
</p>
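<p>
If you do want free space in the non-leaf pages after a rebuild, the two options can be combined. The statement below is a sketch only; the FILLFACTOR value of 50 is an arbitrary choice for illustration. The second statement is one way to inspect the resulting structure, using <code>sys.dm_db_index_physical_stats</code> in <code>DETAILED</code> mode, which returns the columns shown in the tables above.
</p>
<div>
<pre class="brush: sql; title: ; notranslate">
-- FILLFACTOR = 50 is an illustrative value, not a recommendation.
-- PAD_INDEX = ON applies the fill factor to the non-leaf pages as well.
ALTER INDEX PK_IdxTst1 ON dbo.IdxTst1
REBUILD WITH (FILLFACTOR = 50, PAD_INDEX = ON);

-- Inspect every level of the index after the rebuild.
SELECT  index_type_desc, alloc_unit_type_desc, index_depth, index_level,
        page_count, record_count, avg_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(
          DB_ID(), OBJECT_ID('dbo.IdxTst1'), NULL, NULL, 'DETAILED');
</pre>
</div>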
<h3>Conclusion</h3>
<p>
We saw that SQL Server uses a physical tree implementation that allows for more than two child nodes per tree node. We also confirmed that SQL Server stores only the key values in the non-leaf nodes. Those are the two main characteristics of a B+Tree. This also clearly rules out the binary search tree format.
</p>
<p>
We also ran into a situation where most of the non-leaf levels of the index were only half filled. This is one of the disadvantages that come with the use of B+Trees. Usually, however, this is not a big issue, as the number of non-leaf pages is small compared to the total number of pages. You might still want to keep an eye out for this behavior, especially if you are dealing with overly wide keys, as we did in this example.
</p>
<p>
If you would rather store your data as compactly as possible, you can do so by executing an index rebuild.
</p>
<h3>Additional Information</h3>
<p>
For further information about index internals with some nice graphics, check out Michael Swart's <a href="http://michaeljswart.com/2010/09/guts-of-an-clustered-index/">Guts Of An Clustered Index</a>, his contribution to his own T-SQL Tuesday #10.
</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/563/index-misconceptions-tsql-tuesday-026-second-chances/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>T-SQL Tuesday #25 – SQL Server Tips &amp; Tricks</title>
		<link>http://sqlity.net/en/556/t-sql-tuesday-25-%e2%80%93-sql-server-tips-tricks/</link>
		<comments>http://sqlity.net/en/556/t-sql-tuesday-25-%e2%80%93-sql-server-tips-tricks/#comments</comments>
		<pubDate>Tue, 13 Dec 2011 16:39:22 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=556</guid>
		<description><![CDATA[T-SQL Tuesday #25 is hosted by Allen White (blog). This month’s topic is “SQL Server Tips and Tricks”. Transaction Log Reuse Wait One question that keeps coming up in forums, at user group meetings and on twitter is: Why is my transaction log growing out of bounds? To adhere to the ACID properties, SQL Server <a href="http://sqlity.net/en/556/t-sql-tuesday-25-%e2%80%93-sql-server-tips-tricks/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<style type="text/css">
<!--
p {margin-top:8px;}
-->
</style>
<br />
<p><a href="http://sqlblog.com/blogs/allen_white/archive/2011/12/05/t-sql-tuesday-025-invitation-to-share-your-tricks.aspx"><img height="132" border="0" hspace="9" width="131" alt="SqlTuesday T SQL Tuesday #25 – SQL Server Tips & Tricks" src="http://images.sqlity.net/SqlTuesday.png" title="T SQL Tuesday #25 – SQL Server Tips & Tricks" /></a></p>
<p>T-SQL Tuesday #25 is hosted by Allen White (<a href="http://sqlblog.com/blogs/allen_white/default.aspx">blog</a>). This month’s topic is “<a href="http://sqlblog.com/blogs/allen_white/archive/2011/12/05/t-sql-tuesday-025-invitation-to-share-your-tricks.aspx">SQL Server Tips and Tricks</a>”.</p>
<h2>Transaction Log Reuse Wait</h2>
<p>
One question that keeps coming up in forums, at user group meetings and on Twitter is: Why is my transaction log growing out of bounds?
</p>
<p>
To adhere to the <a href="http://en.wikipedia.org/wiki/ACID">ACID properties</a>, SQL Server first records all changes to the database in the transaction log file. A transaction can only be committed after that write succeeds. The changes to the data itself are initially applied only to the data pages in the buffer pool. If and when they get written to disk is influenced by several factors that are independent of the transaction itself. The records in the transaction log allow SQL Server to redo a change if, for example, a crash prevented the data page changes from being written to disk.
</p>
<p>
In theory, that means that once the transaction is committed and the data pages are persisted on disk, SQL Server no longer needs to hold on to the transaction log data. That's why the transaction log usually does not grow unbounded: SQL Server can reuse the parts of the file(s) that are no longer needed.
</p>
<p>
In practice, however, there can be several reasons why such a reuse is not possible. The most common one is the requirement for log backups. If your database is set to the FULL recovery model, SQL Server does not reuse any part of the log file until it has been backed up with a <a href="http://sqlserverpedia.com/wiki/Types_of_Backups">transaction log backup</a>. There are several other reasons why SQL Server might have to wait before it can reuse the transaction log, ranging from long-running open transactions to transactional replication.
</p>
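<p>
As an illustration, a log backup that allows the log to be reused could look like the following sketch. The database name and the backup path are placeholders and not taken from a real system.
</p>
<pre class="brush: sql; title: ; notranslate">
-- MyDatabase and the file path are hypothetical; adjust both to your environment.
BACKUP LOG MyDatabase
TO DISK = N'X:\Backups\MyDatabase_log.trn';
</pre>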
<p>
If you are in a situation that requires you to find out why the transaction log file keeps growing and growing, this simple query can give you the answer:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT name,log_reuse_wait_desc FROM sys.databases;
</pre> 
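<p>
On a server with many databases, a slightly extended variant may be easier to read. This is only a sketch: it adds the recovery model and hides all databases that are currently not waiting on anything.
</p>
<pre class="brush: sql; title: ; notranslate">
-- Show only databases whose log currently cannot be reused,
-- together with their recovery model.
SELECT  name,
        recovery_model_desc,
        log_reuse_wait_desc
FROM    sys.databases
WHERE   log_reuse_wait_desc &lt;&gt; 'NOTHING'
ORDER BY name;
</pre>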
<p>
The log_reuse_wait_desc column contains the reason why SQL Server currently can't reuse the log file of that database. You can find explanations for each of the values in BOL under <a href="http://msdn.microsoft.com/en-us/library/ms345414.aspx">Factors That Can Delay Log Truncation</a>.
</p>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/556/t-sql-tuesday-25-%e2%80%93-sql-server-tips-tricks/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>T-SQL Tuesday #24 &#8211; Prox ‘n’ Funx</title>
		<link>http://sqlity.net/en/498/t-sql-tuesday-24-prox-n-funx/</link>
		<comments>http://sqlity.net/en/498/t-sql-tuesday-24-prox-n-funx/#comments</comments>
		<pubDate>Tue, 08 Nov 2011 08:00:26 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=498</guid>
		<description><![CDATA[<h4>Performance Comparisons of different types of Functions</h4>
<p>
  A lot has been written about the performance impact of functions.
  Amongst other effects, functions used in a WHERE clause will prevent the use of an index seek for the columns involved.
</p>
<p>
  Sometimes however it makes sense to use functions to encapsulate a complicated calculation or to make code reuse possible. 
  It is therefore important to understand just how big an impact functions have on queries.
</p>
<p>
  I would like to use this blog post to look at the call overhead of different kinds of functions in SQL Server.
</p>
]]></description>
			<content:encoded><![CDATA[<div style="position:relative;">
<div style="float:right;">
<a href="http://bradsruminations.blogspot.com/2011/10/invitation-for-t-sql-tuesday-024-prox-n.html"><strong><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="T-SQL Tuesday" border="0" alt="SqlTuesday T SQL Tuesday #24   Prox ‘n’ Funx" align="right" src="http://images.sqlity.net/SqlTuesday.png" width="132" height="131" /></strong></a>
</div>
<p>
It is T-SQL Tuesday again, number 24 about procedures and functions, hosted by Brad Schulz (<a href="http://bradsruminations.blogspot.com/">Blog</a>|no twitter).
</p>
<h4>Performance Comparisons of different types of Functions</h4>
<p>
  A lot has been written about the performance impact of functions.
  Amongst other effects, functions used in a WHERE clause will prevent the use of an index seek for the columns involved.
</p>
<p>
  Sometimes however it makes sense to use functions to encapsulate a complicated calculation or to make code reuse possible. 
  It is therefore important to understand just how big an impact functions have on queries.
</p>
<p>
  I would like to use this blog post to look at the call overhead of different kinds of functions in SQL Server.
</p>
<p>
  SQL Server knows three types of functions: Scalar Functions, Inline Table-Valued Functions and Multi Statement Table-Valued Functions.
  There are also Scalar and Table-Valued CLR Functions, but we will not be looking at those in this article.
</p>
<p>
  To be able to look just at the overhead of the function call we will use a very simple calculation: 1 * value
</p>
<p>
  All examples used in this post will select from a simple table:
</p>
<pre class="brush: sql; title: ; notranslate">
CREATE TABLE dbo.tbl1
(
  id INT CONSTRAINT PK_tbl1 PRIMARY KEY CLUSTERED,
  data INT NOT NULL
);
</pre>  
<p>
This table has been initialized with 1000000 rows containing random values in the data column.
</p>
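<p>
The script used to fill the table is not shown here. A minimal sketch of one way to do it follows; the value range of 0 to 999 for the data column is an assumption made for this illustration.
</p>
<pre class="brush: sql; title: ; notranslate">
-- Sketch only: generates 1,000,000 rows by cross joining a system view with itself.
-- Assumes sys.all_columns contains at least 1,000 rows, which is normally the case.
INSERT INTO dbo.tbl1(id, data)
SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),   -- ids 1 to 1,000,000
        ABS(CHECKSUM(NEWID()) % 1000)                 -- pseudo-random value, 0 to 999 (assumed range)
FROM    (SELECT TOP (1000000) 1 AS x
         FROM   sys.all_columns a
         CROSS JOIN sys.all_columns b) AS src;
</pre>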
<p>
As a base line we will use the following SELECT statement:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;
</pre>
<p>
Note the use of "(1*data)" instead of just "data". This is the calculation we will implement 
in the different functions. Using the calculation in the baseline query has the same 
impact on index usage as a function call has: it effectively prevents a seek operation on 
any existing index.
Also note the "INTO #x". It prevents the results from having to be sent back to the client, 
eliminating the chance for any ASYNC_NETWORK_IO waits to affect the outcome of this performance test. 
Instead the results (if any) will be written to tempdb on the same machine.
</p>
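<p>
To see the effect on index usage in isolation, one could add an index on the data column and compare the execution plans of the two queries below. This is a sketch only: the index IX_tbl1_data is hypothetical and was not part of the tests in this post.
</p>
<pre class="brush: sql; title: ; notranslate">
-- IX_tbl1_data is a hypothetical index, not part of the tests in this post.
CREATE NONCLUSTERED INDEX IX_tbl1_data ON dbo.tbl1(data);

-- The column is compared directly: an index seek on IX_tbl1_data is possible.
SELECT id FROM dbo.tbl1 WHERE data = 42;

-- The column is wrapped in a calculation: expect a scan instead of a seek.
SELECT id FROM dbo.tbl1 WHERE (1*data) = 42;
</pre>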
<p>
Now let's look at the three functions:
</p>
<pre class="brush: sql; title: ; notranslate">
CREATE FUNCTION dbo.simpleSVF(@parm INT)
RETURNS INT
AS
BEGIN
  RETURN 1*@parm;
END;
GO
CREATE FUNCTION dbo.simpleTVF(@parm INT)
RETURNS TABLE
AS
RETURN SELECT 1*@parm val;
GO
CREATE FUNCTION dbo.multiStmtTVF(@parm INT)
RETURNS @r TABLE(val INT)
AS
BEGIN
  INSERT INTO @r(val) VALUES(1*@parm);
  RETURN;
END
</pre>
<p>
  They all implement the same calculation we saw above. 
  The Scalar Function (simpleSVF) just returns the result of that calculation directly,
  while the two Table-Valued functions return a resultset with one column and one row,
  containing the calculation result.
</p>
<p>
  To use the Scalar Function we can just replace the calculation with a call to the function. 
  The use of a Table-Valued Function in this context requires a little more work. 
  There are two patterns that you can use: A correlated subquery or a CROSS APPLY. 
  We will compare both of them:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;
SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42;
</pre>
<p>
  The same two call patterns will also be used for the Multi Statement Table-Valued function.
</p>
<p>
  To record the execution times I created a small stored procedure that uses the information 
  in sys.dm_exec_requests together with the SYSDATETIME() function for its measurements:
</p>
<pre class="brush: sql; title: ; notranslate">
CREATE PROCEDURE dbo.RunTest 
  @cid INT,
  @cmd NVARCHAR(MAX)
AS 
BEGIN
  DECLARE @duration1 DATETIME2 ;
  DECLARE @duration2 DATETIME2 ;
  DECLARE @cpu1 INT ;
  DECLARE @cpu2 INT ;
  DECLARE @reads1 INT ;
  DECLARE @reads2 INT ;

  DBCC DROPCLEANBUFFERS ;
  DBCC FREEPROCCACHE ;
  SELECT  @duration1 = SYSDATETIME() ,
          @cpu1 = cpu_time ,
          @reads1 = logical_reads
  FROM    sys.dm_exec_requests
  WHERE   session_id = @@SPID ;

  EXEC(@cmd) ;

  SELECT  @duration2 = SYSDATETIME() ,
          @cpu2 = cpu_time ,
          @reads2 = logical_reads
  FROM    sys.dm_exec_requests
  WHERE   session_id = @@SPID ;
  
  INSERT INTO dbo.TestResults(cid,StartDTime,Duration,CPU,LogicalReads,Cmd)
  SELECT  @cid,
          @duration1,
          DATEDIFF(microsecond, @duration1, @duration2),
          @cpu2 - @cpu1,
          @reads2 - @reads1,
          @cmd;
END;
</pre>
<p>
  This procedure executes the passed-in command and records the execution time in microseconds, 
  the time spent working on any CPU in milliseconds(!) as well as the logical reads in pages. 
  It also clears out the buffer cache as well as the procedure cache each time before 
  executing the statement. (So don't run this in production!)
</p>
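<p>
  The procedure writes into a table dbo.TestResults. A definition along the following lines would work with it. 
  This is a sketch: the column names are taken from the INSERT statement above, while the data types 
  and the id column are assumptions.
</p>
<pre class="brush: sql; title: ; notranslate">
-- Assumed definition; only the column names used by dbo.RunTest are given by the procedure.
CREATE TABLE dbo.TestResults
(
  id INT IDENTITY(1,1) CONSTRAINT PK_TestResults PRIMARY KEY CLUSTERED, -- added for illustration
  cid INT NOT NULL,              -- command id passed to dbo.RunTest
  StartDTime DATETIME2 NOT NULL, -- time the measurement started
  Duration INT NOT NULL,         -- microseconds, as returned by DATEDIFF
  CPU INT NOT NULL,              -- milliseconds
  LogicalReads INT NOT NULL,     -- pages
  Cmd NVARCHAR(MAX) NOT NULL     -- the statement that was executed
);
</pre>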
<p>
  The first parameter passed in is the command id (cid). It is not used by the procedure itself; 
  it is just stored together with the results. The 6 different calls to this procedure are listed below:
</p>
<pre class="brush: sql; title: ; notranslate">
EXEC dbo.RunTest 1,'SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;';
EXEC dbo.RunTest 2,'SELECT * INTO #x FROM dbo.tbl1 WHERE dbo.simpleSVF(data) = 42;';
EXEC dbo.RunTest 3,'SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;';
EXEC dbo.RunTest 4,'SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42;';
EXEC dbo.RunTest 5,'SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.multiStmtTVF(data)) = 42;';
EXEC dbo.RunTest 6,'SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.multiStmtTVF(t.data) f WHERE f.val = 42;';
</pre>
<p>
After executing the first four 100 times each 
and the last two 10 times each on my system, I got the following numbers (average per execution):
</p>
<p>
<table>
<tr><td>ExecCount</td><td>Duration</td><td>CPU</td><td>LogicalReads</td><td>Cmd</td></tr>
<tr><td>100</td><td>725,781</td><td>219</td><td>2,296</td><td>SELECT * INTO #x FROM dbo.tbl1 WHERE (1*data) = 42;</td></tr>
<tr><td>100</td><td>3,007,272</td><td>2,702</td><td>2,400</td><td>SELECT * INTO #x FROM dbo.tbl1 WHERE dbo.simpleSVF(data) = 42;</td></tr>
<tr><td>100</td><td>708,660</td><td>209</td><td>2,283</td><td>SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.simpleTVF(data)) = 42;</td></tr>
<tr><td>100</td><td>715,560</td><td>211</td><td>2,290</td><td>SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.simpleTVF(t.data) f WHERE f.val = 42</td></tr>
<tr><td>10</td><td>112,347,425</td><td>109,412</td><td>33,984,827</td><td>SELECT * INTO #x FROM dbo.tbl1 WHERE (SELECT val FROM dbo.multiStmtTVF(data)) = 42;</td></tr>
<tr><td>10</td><td>109,809,480</td><td>107,776</td><td>33,984,881</td><td>SELECT * INTO #x FROM dbo.tbl1 t CROSS APPLY dbo.multiStmtTVF(t.data) f WHERE f.val = 42</td></tr>
</table>

</p>
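<p>
Averages like the ones in this table can be calculated from the recorded measurements with a simple aggregate query. 
The following is a sketch that assumes the results are stored in dbo.TestResults as written by dbo.RunTest:
</p>
<pre class="brush: sql; title: ; notranslate">
-- One row per command id; the casts to BIGINT avoid an overflow while averaging.
SELECT  COUNT(*)                          AS ExecCount,
        AVG(CAST(Duration AS BIGINT))     AS Duration,
        AVG(CAST(CPU AS BIGINT))          AS CPU,
        AVG(CAST(LogicalReads AS BIGINT)) AS LogicalReads,
        MIN(Cmd)                          AS Cmd
FROM    dbo.TestResults
GROUP BY cid
ORDER BY cid;
</pre>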
<p>
As you can see, there is almost no difference between the baseline and the two 
usages of the Inline Table-Valued Function.
The Scalar Function, on the other hand, took more than four times as long. Most of that time 
(2.7 of 3 seconds) was spent actually using the processor. There were also about 100 more pages read.
</p>
<p>
Now let's look at the Multi Statement Table-Valued Function. 
I had to stop its execution after just 10 rounds to get the results in before Tuesday.
This function slows the query by a factor of over 150. 
</p>
<p>
This is caused by the fact that a Multi Statement Table-Valued Function uses a 
table variable to collect the rows.
To create a table variable, just as for any other table, at least two pages 
need to be reserved, and several changes to system tables 
and internal database pages need to be made. 
All of that needs to be undone once the function has finished executing.
</p>
<p>
In both queries above, the function gets executed for every row &mdash; 1,000,000 times in this case. This is causing all that table setup and destroy work to be executed 1,000,000 times as well.
</p>
<h4>Conclusion</h4>
<p>
If you find yourself in a situation that lends itself to the use of a function, 
try to go with an Inline Table-Valued Function as there is almost no performance impact. 
However, always remember that this blog post did not look into the impact that a function has on index usage. 
</p>
<p>
Unless you are writing a function that is going to be executed only rarely, stay away
from Multi Statement Table-Valued Functions.
</p>
]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/498/t-sql-tuesday-24-prox-n-funx/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>T-SQL Tuesday #22 &#8211; Data Presentation &#8211; XML Concatenation</title>
		<link>http://sqlity.net/en/400/t-sql-tuesday-22-data-presentation/</link>
		<comments>http://sqlity.net/en/400/t-sql-tuesday-22-data-presentation/#comments</comments>
		<pubDate>Tue, 13 Sep 2011 07:34:54 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>
		<category><![CDATA[XML]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=400</guid>
		<description><![CDATA[It is T-SQL Tuesday again, the 22nd incarnation and this time the topic is the presentation of data, hosted by Robert Pearl (Blog&#124;Twitter). Every so often you run into the problem of having to display a list of items to the user. This could be a list of item numbers or a list of names. <a href="http://sqlity.net/en/400/t-sql-tuesday-22-data-presentation/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div style="position:relative;">
<div style="float:right;">
<a href="http://www.sqlservercentral.com/blogs/pearlknows/archive/2011/09/06/invitation-for-t-sql-tuesday-22-data-presentation.aspx"><strong><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="T-SQL Tuesday" border="0" alt="SqlTuesday T SQL Tuesday #22   Data Presentation   XML Concatenation" align="right" src="http://images.sqlity.net/SqlTuesday.png" width="132" height="131" /></strong></a>
</div>
<p>
It is T-SQL Tuesday again, the 22nd incarnation and this time the topic is the presentation of data, hosted by Robert Pearl (<a href="http://www.sqlservercentral.com/blogs/pearlknows/default.aspx">Blog</a>|<a href="https://twitter.com/#!/PearlKnows">Twitter</a>).
</p>
<p>
Every so often you run into the problem of having to display a list of items to the user. This could be a list of item numbers or a list of names. 
</p>
<p>
Assume, for example, that you are trying to write a view that displays information about all the indexes in the database. 
You want this view to return one row per index. You also want to include the columns of the index. To make this happen you need to concatenate the information for the columns into one string.
</p>
<p>
This post is going to look at how to implement string concatenation in T-SQL.
</p>
<p>
In many other database management systems you have access to a function <a href="http://dev.mysql.com/doc/refman/5.0/en/group-by-functions.html#function_group-concat"><code>GROUP_CONCAT()</code></a> that provides just this functionality. In T-SQL it is not that easy yet. While Denali offers a new <a href="http://blogs.lessthandot.com/index.php/DataMgmt/DBProgramming/MSSQLServer/concat-function-in-sql-server"><code>CONCAT()</code></a> function, a <code>GROUP_CONCAT()</code> is not available in SQL Server up to version Denali CTP3.
</p>
<h4>XML to the rescue</h4>
<p>
There are several ways to go about this problem. For example you could write a UDF or you could rely on undocumented behavior<sup>1</sup>.
Most of those solutions seem complicated and also do not really address a real concatenation in a <code>GROUP BY</code> context.
</p>
<p>
There is one solution that I have found to be the easiest to use: FOR XML.
</p>
<p>
Using the <code>FOR XML</code> clause, SQL Server allows the output of any select to be a single XML Document:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT  name
FROM    master.sys.tables
FOR     XML PATH('row');
</pre>
<p>returns something like</p>
<pre class="brush: xml; title: ; notranslate">
&lt;row&gt;
  &lt;name&gt;spt_fallback_db&lt;/name&gt;
&lt;/row&gt;
&lt;row&gt;
  &lt;name&gt;spt_fallback_dev&lt;/name&gt;
&lt;/row&gt;
&lt;row&gt;
  &lt;name&gt;spt_fallback_usg&lt;/name&gt;
&lt;/row&gt;
&lt;row&gt;
  &lt;name&gt;spt_monitor&lt;/name&gt;
&lt;/row&gt;
&lt;row&gt;
  &lt;name&gt;spt_values&lt;/name&gt;
&lt;/row&gt;
&lt;row&gt;
  &lt;name&gt;MSreplication_options&lt;/name&gt;
&lt;/row&gt;
</pre>
<p> Every row gets wrapped into a <code>&lt;row&gt;</code> tag and every column in a tag that matches the column's name.
</p>
<p>
To eliminate the row tag we can pass in an empty string to the <code>PATH()</code> function. The column tags are eliminated by naming the column <code>&#91;text()]</code>.
</p>
<p>
So our example would look like this:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT  name AS [text()]
FROM    master.sys.tables
FOR     XML PATH('');
</pre>
<p>Its output looks like this:</p>
<pre class="brush: xml; title: ; notranslate">
spt_fallback_dbspt_fallback_devspt_fallback_usgspt_monitorspt_valuesMSreplication_options
</pre>
<p>
That already looks very promising. You will probably want to add separators between the strings; we will get to that a little later.
There are a few things that I would like to address first. 
</p>
<h4>Special Characters</h4>
<p>
Let's look at characters that have a special meaning in XML like &lsquo;&lt;&rsquo;:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT '&lt;&amp;&gt;' AS [text()]
FOR XML PATH('');
</pre>
<p>Output:</p>
<pre class="brush: xml; title: ; notranslate">
&amp;lt;&amp;amp;&amp;gt;
</pre>
<p>
That is not what we wanted. To get those characters unescaped, we need to use one of the XML datatype functions:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT (SELECT '&lt;&amp;&gt;'
        FOR XML PATH(''), TYPE
       ).value('.','NVARCHAR(MAX)');
</pre>
<p>
This has the expected output: &lsquo;&lt;&amp;&gt;&rsquo;. You might have noticed the missing column name. If the column name is not specified at all, it has the same effect as if the <code>&#91;text()]</code> name is given. The additional <code>TYPE</code> keyword causes the XML to be returned as an <code>XML</code> datatype result instead of as an <code>NVARCHAR(MAX)</code> datatype result. The <code>.value()</code> function is defined on the <code>XML</code> datatype and takes two parameters. The first one is an XQuery expression that describes what we want to get back. The '.' refers to the current node, which here means the entire content. The second one specifies the datatype to which we want the result to be converted.
</p>
<h4>Illegal Characters</h4>
<p>
The next problem is not that easy to solve. The SQL Server <code>XML</code> datatype cannot handle specific characters at all:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT CHAR(1)
FOR XML PATH('');
</pre>
<p>
This is handled correctly and escaped to <code>&amp;#x01;</code>. However, it falls apart once we add the necessary <code>TYPE</code> keyword back in:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT CHAR(1)
FOR XML PATH(''), TYPE;
</pre>
<pre style="color:#F00;">
Msg 6841, Level 16, State 1, Line 1
FOR XML could not serialize the data for node 'NoName' because it contains a character (0x0001) which is not allowed in XML. To retrieve this data using FOR XML, convert it to binary, varbinary or image data type and use the BINARY BASE64 directive.
</pre>
<p>
There are a total of 2079 single-character values that will cause this error. How to figure out which ones exactly will have to wait for another blog post. For now, if you think you might run into any of them, you can use the following code to replace the most common ones with a question mark:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE( 
       N'the text that is to be cleaned here' 
,NCHAR(1),N'?'),NCHAR(2),N'?'),NCHAR(3),N'?'),NCHAR(4),N'?'),NCHAR(5),N'?'),NCHAR(6),N'?'),NCHAR(7),N'?'),NCHAR(8),N'?'),NCHAR(11),N'?'),NCHAR(12),N'?'),NCHAR(14),N'?'),NCHAR(15),N'?'),NCHAR(16),N'?'),NCHAR(17),N'?'),NCHAR(18),N'?'),NCHAR(19),N'?'),NCHAR(20),N'?'),NCHAR(21),N'?'),NCHAR(22),N'?'),NCHAR(23),N'?'),NCHAR(24),N'?'),NCHAR(25),N'?'),NCHAR(26),N'?'),NCHAR(27),N'?'),NCHAR(28),N'?'),NCHAR(29),N'?'),NCHAR(30),N'?'),NCHAR(31),N'?');
</pre>
<p>
This replacement has to happen before the text is converted to XML. The following examples are going to skip this step for readability.
</p> 
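<p>
As a rough sketch of where that cleanup would sit, the replacement can be applied to the column expression inside the sub-query, before <code>FOR XML</code> sees the value. Only two of the characters are replaced here to keep the example short; the full nested <code>REPLACE</code> from above would go in the same place:
</p>
<pre class="brush: sql; title: ; notranslate">
-- Sketch only: clean each value before FOR XML serializes it.
-- Just NCHAR(1) and NCHAR(2) are handled here for brevity.
SELECT (SELECT REPLACE(REPLACE(name, NCHAR(1), N'?'), NCHAR(2), N'?')
        FROM   master.sys.tables
        FOR XML PATH(''), TYPE
       ).value('.', 'NVARCHAR(MAX)');
</pre>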
<h4>Separators</h4>
<p>
Now we know how to concatenate string values. To make our initial example of a select statement (or view) that returns one row per index and includes a comma-separated column list work, two more steps are missing. First, the separator needs to be inserted between the values but not in front of the first one. That cannot be expressed directly in the query. However, it is fairly simple to add a separator in front of every item and then remove the leading one:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT  STUFF((SELECT   ', ' + name
               FROM     master.sys.tables
               FOR XML PATH('') ,TYPE
              ).value('.', 'NVARCHAR(MAX)'), 
              1, 2, ''
             );
</pre>
<p>
The <code>STUFF</code> function helps us with this problem. Its first three parameters work just like the ones in the <code>SUBSTRING</code> function: the string, the start position and the length. But instead of returning that substring, the <code>STUFF</code> function replaces it with the value passed in as the fourth parameter and then returns the complete string.
</p>
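<p>
A quick way to see this is to run <code>STUFF</code> on a literal. The string used here is made up purely for illustration:
</p>
<pre class="brush: sql; title: ; notranslate">
-- Replace the 2 characters starting at position 1 with an empty string,
-- which removes the leading ', ' separator.
SELECT STUFF(', a, b, c', 1, 2, '');
-- returns: a, b, c
</pre>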
<h4>Putting it all together</h4>
<p>
The last step is to do all of this once per group, the way a <code>GROUP BY</code> query would. For that we have to use the <code>CROSS APPLY</code> functionality to create the column list for each group value in a correlated sub-query. Putting this all together looks like this:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT  t.name AS TblName ,
        i.name AS IdxName ,
        Columns.List AS Columns
FROM    sys.tables t
        JOIN sys.indexes i ON i.object_id = t.object_id
        CROSS APPLY ( SELECT    STUFF((SELECT   ', ' + c.name
                                                + CASE WHEN ic.is_descending_key = 1
                                                       THEN ' DESC'
                                                       ELSE ''
                                                  END
                                       FROM     sys.index_columns ic
                                                JOIN sys.columns c ON ic.object_id = c.object_id
                                                              AND ic.column_id = c.column_id
                                       WHERE    i.object_id = ic.object_id
                                                AND i.index_id = ic.index_id
                                       ORDER BY ic.index_column_id
                                       FOR   XML PATH('') ,
                                          TYPE
                                      ).value('.', 'NVARCHAR(MAX)')
                                      , 1, 2, ''
                                     )
                    ) Columns ( List ) ;
</pre>
<p>
This query returns the table name, the index name and a comma separated list of all columns for each index in the database. It also marks descending columns with the <code>DESC</code> keyword. What it does not do is separate out included columns. I'll leave that as an exercise for the reader. 
</p>
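<p>
One possible approach to that exercise, offered as a sketch rather than the definitive solution, is to build two lists and filter each on the <code>is_included_column</code> flag of <code>sys.index_columns</code>. The aliases <code>KeyCols</code> and <code>InclCols</code> are names chosen for this example, and ordering the key columns by <code>key_ordinal</code> is my assumption about the desired order:
</p>
<pre class="brush: sql; title: ; notranslate">
SELECT  t.name AS TblName ,
        i.name AS IdxName ,
        KeyCols.List AS KeyColumns ,
        InclCols.List AS IncludedColumns
FROM    sys.tables t
        JOIN sys.indexes i ON i.object_id = t.object_id
        -- key columns only, in key order
        CROSS APPLY ( SELECT STUFF((SELECT ', ' + c.name
                                           + CASE WHEN ic.is_descending_key = 1
                                                  THEN ' DESC'
                                                  ELSE ''
                                             END
                                    FROM   sys.index_columns ic
                                           JOIN sys.columns c ON ic.object_id = c.object_id
                                                             AND ic.column_id = c.column_id
                                    WHERE  i.object_id = ic.object_id
                                           AND i.index_id = ic.index_id
                                           AND ic.is_included_column = 0
                                    ORDER BY ic.key_ordinal
                                    FOR XML PATH('') , TYPE
                                   ).value('.', 'NVARCHAR(MAX)')
                                   , 1, 2, '')
                    ) KeyCols ( List )
        -- included (non-key) columns only; NULL if the index has none
        CROSS APPLY ( SELECT STUFF((SELECT ', ' + c.name
                                    FROM   sys.index_columns ic
                                           JOIN sys.columns c ON ic.object_id = c.object_id
                                                             AND ic.column_id = c.column_id
                                    WHERE  i.object_id = ic.object_id
                                           AND i.index_id = ic.index_id
                                           AND ic.is_included_column = 1
                                    ORDER BY ic.index_column_id
                                    FOR XML PATH('') , TYPE
                                   ).value('.', 'NVARCHAR(MAX)')
                                   , 1, 2, '')
                    ) InclCols ( List ) ;
</pre>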
<p style="background:#CCC;font-size:0.8em;">1) One method that relies on undocumented behavior would be to continually add to a single variable within a single multirow select.</p>
</div>]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/400/t-sql-tuesday-22-data-presentation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>T-SQL Tuesday #21 &#8211; Oh crap&#8230;</title>
		<link>http://sqlity.net/en/297/t-sql-tuesday-21-oh-crap/</link>
		<comments>http://sqlity.net/en/297/t-sql-tuesday-21-oh-crap/#comments</comments>
		<pubDate>Wed, 10 Aug 2011 07:00:23 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=297</guid>
		<description><![CDATA[This month's T-SQL Tuesday, hosted by Adam Machanic (blog, twitter) himself, is about crappy things from our past... The first thing that popped into my head when reading Adam's post was a story about a hack that, while I did not cause it, I spent a good amount of time at a previous employer trying <a href="http://sqlity.net/en/297/t-sql-tuesday-21-oh-crap/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<div style="position:relative;">
<div style="float:right;">
<a href="http://sqlblog.com/blogs/adam_machanic/archive/2011/08/03/t-sql-tuesday-21-a-day-late-and-totally-full-of-it.aspx"><strong><img style="background-image:none;border-right-width:0px;padding-left:0px;padding-right:0px;display:inline;float:right;border-top-width:0px;border-bottom-width:0px;border-left-width:0px;padding-top:0px;" title="TSQLWednesday" border="0" alt="TSQLWednesday thumb 578C7A06 T SQL Tuesday #21   Oh crap..." align="right" src="http://sqlblog.com/blogs/adam_machanic/TSQLWednesday_thumb_578C7A06.jpg" width="244" height="244" /></strong></a>
</div>
<p>
This month's T-SQL Tuesday, hosted by Adam Machanic (<a href="http://sqlblog.com/blogs/adam_machanic/">blog</a>, <a href="http://twitter.com/#!/AdamMachanic">twitter</a>) himself, is about <a href="http://sqlblog.com/blogs/adam_machanic/archive/2011/08/03/t-sql-tuesday-21-a-day-late-and-totally-full-of-it.aspx">crappy things from our past...</a>
</p>
<p>
The first thing that popped into my head when reading Adam's post was a story about a hack. I did not cause it, but I spent a good amount of time at a previous employer trying to deal with the crap that came out of it.
</p>
<p>
The story starts, as so many stories do, with a database.
This database contained a set of lookup values in a single table. These values were needed all over the application, driving business logic and display parameters. They were also used in almost all reports. Most reports needed to look up several values in this table, so queries joining to it 20 or more times were common. But I did not start this story to talk about database performance. What I want to talk about is a comment in the Java code of the application, in the part that dealt with those lookup values. The comment was: "This is a hack and should never be deployed to production!". It introduced a block of code that retrieved those lookup values from an XML file. I can hear you say now: "A what? I thought the values were in the database..."
</p>
<p>
Needless to say, this code made it into production and caused a double maintenance nightmare. I tried for several years to convince the powers controlling the money that cleaning up this mess would actually save a lot of money, but there was always something more important to be done.
</p>
<p>
By the time I left the company, that code had been in production for about 10 years, costing the company several tens if not hundreds of thousands of dollars. As far as I know it is still in there...
</p>
<p>
So, the next time you are working on a quick hack to save some time right now, I would like you to hold on. I would like you to think about the cost associated with this shortcut. Think about what happens if this code starts settling its ugly roots in your production environment. After that, turn around and implement the right thing. And if your boss comes breathing down your neck about your decision to do the right thing, send him to read this story.
</p> 
<p>
<br />
&mdash;
<br />
</p>
</div>
<div style="position:relative;">
<div style="float:left;padding:15px 5px;">
<a href="http://www.amazon.com/gp/product/0132350882/ref=as_li_ss_il?ie=UTF8&tag=sqlitynet-20&linkCode=as2&camp=217145&creative=399369&creativeASIN=0132350882"><img border="0" src="http://ws.assoc-amazon.com/widgets/q?_encoding=UTF8&Format=_SL160_&ASIN=0132350882&MarketPlace=US&ID=AsinImage&WS=1&tag=sqlitynet-20&ServiceVersion=20070822" title="T SQL Tuesday #21   Oh crap..." alt=" T SQL Tuesday #21   Oh crap..." /></a>
</div>
<p>
If you would like to read more about good coding practices, I can recommend the book <a href="http://www.amazon.com/gp/product/0132350882/ref=as_li_ss_tl?ie=UTF8&tag=sqlitynet-20&linkCode=as2&camp=217145&creative=399369&creativeASIN=0132350882">Clean Code: A Handbook of Agile Software Craftsmanship</a>.
While it focuses on Java, there is a lot of good information in there about how to write code so that it is easily readable and maintainable. A lot of it can be used in the SQL Server environment too.
</p>
<p>The spirit of this book can be found on the inside title page:
</p>
<p style="border:1px solid #AAA;border-radius:5px;padding:8px;display:inline-block;">
<em>
Writing clean code is what you must do in order to call yourself a professional.
<br />
There is no reasonable excuse for doing anything less than your best.
</em>
</p>
<p>
<br />
&mdash;
<br />
</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/297/t-sql-tuesday-21-oh-crap/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>T-SQL Tuesday #20 – T-SQL Best Practices</title>
		<link>http://sqlity.net/en/232/t-sql-tuesday-20-%e2%80%93-t-sql-best-practices/</link>
		<comments>http://sqlity.net/en/232/t-sql-tuesday-20-%e2%80%93-t-sql-best-practices/#comments</comments>
		<pubDate>Tue, 12 Jul 2011 08:51:22 +0000</pubDate>
		<dc:creator>Sebastian</dc:creator>
				<category><![CDATA[General]]></category>
		<category><![CDATA[T-SQL Tuesday]]></category>

		<guid isPermaLink="false">http://sqlity.net/en/?p=232</guid>
		<description><![CDATA[This post is my contribution to T-SQL Tuesday #20, hosted by Amit Banerjee (blog &#124; twitter). This month’s topic is “T-SQL Best Practices”. Interfaces In OOP it is a common best practice to use interfaces. Interfaces provide a layer of separation between the parts of the system. They also provide a contract layer. There are <a href="http://sqlity.net/en/232/t-sql-tuesday-20-%e2%80%93-t-sql-best-practices/#more-'" class="more-link">more »</a>]]></description>
			<content:encoded><![CDATA[<style type="text/css">
<!--
p {margin-top:8px;}
-->
</style>
<br />
<p><img height="132" border="0" hspace="9" width="131" alt="SqlTuesday T SQL Tuesday #20 – T SQL Best Practices" src="http://images.sqlity.net/SqlTuesday.png" title="T SQL Tuesday #20 – T SQL Best Practices" /></p>
<p>This post is my contribution to T-SQL Tuesday #20, hosted by Amit Banerjee (<a href="http://troubleshootingsql.com/">blog</a> | <a href="http://twitter.com/banerjeeamit">twitter</a>). This month’s topic is “<a href="http://troubleshootingsql.com/2011/07/05/invitation-for-t-sql-tuesday-19-t-sql-best-practices/">T-SQL Best Practices</a>”.</p>
<h2>Interfaces</h2>
<p>In <a href="http://en.wikipedia.org/wiki/Object-oriented_programming" title="Object Oriented Programming">OOP</a> it is a common best practice to use interfaces. Interfaces provide a layer of separation between the parts of the system. They also provide a contract layer. There are several advantages in using interfaces, most importantly the ability to hide implementation details of a single module from other modules that are using it. For a good introduction to this topic, check out <a href="http://www.amazon.com/gp/product/0976694050/ref=as_li_ss_tl?ie=UTF8&amp;tag=sqlitynet-20&amp;linkCode=as2&amp;camp=217145&amp;creative=399369&amp;creativeASIN=0976694050">Interface Oriented Design</a> by Ken Pugh.</p>
<h2>Interfaces in SQL</h2>
<p>SQL Server does not provide a language construct that allows for the use of interfaces in the sense of an OOP language. However, in some areas it allows us to get close.</p>
<p>One best practice that I strongly recommend for every project is the use of an interface layer between the data and the application. In particular, application code should never directly read from or write to a table. Instead, all access should go through views or stored procedures.</p>
<p>What do I gain by doing this? The biggest advantage is that in an implementation like this, I can change the table structure in a live system while hiding the changes from the application.</p>
<p>There is only one thing certain in software development, and that is that there will be change, so it is a good idea to be ready to implement changes.</p>
<p>While you can change out the application code in a matter of seconds, there are changes in the database that will take a substantial amount of time. Think about a simple change like moving the phone number from the Person table into a newly created PersonPhone table because the system now allows for multiple phones per person. The old version of the application expects the phone number field to be in the Person table. The new version expects it to not be there. So at some point you have to take the database offline, copy all the phones over to the new table, drop the old column and then bring the system back online. At the same time, the application upgrade needs to happen.</p>
<p>If the application has an SLA that does not allow for such downtime, you are stuck.</p>
<p>However, if the access goes through a view, you can implement the necessary changes to the table structure behind the scenes while the application still sees the old picture.</p>
<p>I am not saying that implementing a change like this becomes trivial if you use views. Instead, I am saying that an interface layer like a view makes it a lot simpler.</p>
<p>In addition, if you use a version-specific schema or schema prefix you could even have both versions live while you go through the app servers and upgrade them one by one.</p>
<p>The rest of this post goes through an example to show what an actual implementation of this pattern would look like.</p>
<p>The following code listing creates a tbl.Pers table and a view v001.Pers that selects from it. There is also a statement to fill the table with some random data using <a href="http://www.sqlmag.com/article/sql-server/virtual-auxiliary-table-of-numbers">Itzik’s GetNums function</a>, to have some data in there for testing.</p>
<div>
    <pre>
     CREATE SCHEMA tbl ;
     GO
     CREATE SCHEMA v001 ;
     GO
     CREATE TABLE tbl.Pers (
        PersNo INT IDENTITY(1, 1)
                   PRIMARY KEY CLUSTERED,
        FirstName NVARCHAR(60),
        LastName NVARCHAR(60),
        PhoneNo VARCHAR(20)
       ) ;
     GO
     CREATE VIEW v001.Pers
     AS
     SELECT  *
     FROM    tbl.Pers ;
     GO
     INSERT  INTO tbl.Pers
             (
              FirstName,
              LastName,
              PhoneNo
             )
             SELECT  CAST(NEWID() AS NVARCHAR(60)),
                     CAST(NEWID() AS NVARCHAR(60)),
                     CAST(ABS(CHECKSUM(NEWID())) AS VARCHAR(20))
             FROM    dbo.GetNums(1000)
</pre>
</div>
<p>If you now run a select against the table and one against the view you will see the same execution plan:</p>
<p><img height="182" border="0" width="230" alt="232 ExecutionPlan1 T SQL Tuesday #20 – T SQL Best Practices" src="http://images.sqlity.net/232_ExecutionPlan1.png" title="T SQL Tuesday #20 – T SQL Best Practices" /></p>
<p>The same is true for an update:</p>
<p><img height="184" border="0" width="271" alt="232 ExecutionPlan2 T SQL Tuesday #20 – T SQL Best Practices" src="http://images.sqlity.net/232_ExecutionPlan2.png" title="T SQL Tuesday #20 – T SQL Best Practices" /></p>
<p>The view gets eliminated by the optimizer and therefore there is no impact on the execution time of your queries.</p>
<p>However, there will be a small impact on the compilation time.</p>
<p>The view is actually doing a SELECT * from the table and there is a reason for that. You are probably aware of the best practice to always specify a column list. I am not going into details here as to why you should follow that best practice.</p>
<p>Also, the purpose of the view in this example is to build an interface. One of the things an interface should do is to specify how it can be used, and a * does not help with that. So there seem to be a lot of reasons to specify the column list in the view.</p>
<p>However, you are not going to change all the tables all the time, so most of the time the views are just going to reflect the table columns. By not specifying the list in this context you make the life of the optimizer (specifically the binding stage) a little easier. Every time a query using this view gets compiled, the view text needs to be parsed and included into the text of the query so that the execution plan can be built. The binding stage is responsible for linking the query to the objects and checking object and column names. The less text it has to read through, the quicker it can do its job, so the * in this case can be seen as a performance optimization.</p>
<p>The preceding paragraph also points out that compiling a query using such a view is potentially slower than a comparable query not using the view. That means you need to make sure not to use too many non-cacheable <a href="#_edn1" name="_ednref1" title="" id="_ednref1"><sup>1</sup></a> ad-hoc queries to avoid having to compile your statements too often. But that is another well-known best practice anyway. Even if you can’t get rid of all of them, the impact of the views will be comparatively small, as the output of the binding phase, the algebrizer tree, can be cached for views.</p>
<p>One last thing you need to be aware of: before you actually implement any change to any of the tables, make sure to change their views to list the column names explicitly, so that a change to the table structure does not become immediately visible to the users of the view.</p>
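<p>To make the phone number scenario from earlier concrete, here is a minimal sketch of how such a change could be staged behind the view. The tbl.PersPhone table and its columns are hypothetical names invented for this example, and the sketch ignores writes that arrive between the steps:</p>
<div>
    <pre>
     -- Step 1: pin the view to an explicit column list before touching the table.
     ALTER VIEW v001.Pers
     AS
     SELECT  PersNo, FirstName, LastName, PhoneNo
     FROM    tbl.Pers ;
     GO
     -- Step 2: create the new table (hypothetical name) and copy the phones over.
     CREATE TABLE tbl.PersPhone (
        PersPhoneNo INT IDENTITY(1, 1)
                        PRIMARY KEY CLUSTERED,
        PersNo INT NOT NULL
                   REFERENCES tbl.Pers ( PersNo ),
        PhoneNo VARCHAR(20) NOT NULL
       ) ;
     GO
     INSERT  INTO tbl.PersPhone ( PersNo, PhoneNo )
             SELECT  PersNo, PhoneNo
             FROM    tbl.Pers
             WHERE   PhoneNo IS NOT NULL ;
     GO
     -- Step 3: the v001 view keeps showing one phone per person,
     -- now read from the new table.
     ALTER VIEW v001.Pers
     AS
     SELECT  p.PersNo, p.FirstName, p.LastName,
             ( SELECT TOP ( 1 ) pp.PhoneNo
               FROM     tbl.PersPhone pp
               WHERE    pp.PersNo = p.PersNo
               ORDER BY pp.PersPhoneNo
             ) AS PhoneNo
     FROM    tbl.Pers p ;
     GO
     -- Step 4: the old column is no longer referenced and can be dropped.
     ALTER TABLE tbl.Pers DROP COLUMN PhoneNo ;
</pre>
</div>
<p>Note that after step 3 the PhoneNo column of the view is a derived value, so updates to it through the view would need additional work, for example an INSTEAD OF trigger. A new v002 schema could then expose the new structure to upgraded application servers.</p>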
<div>
    <div id="edn1" style="border-top:1px solid #000;font-size:.8em;"><br /> <a href="#_ednref1" name="_edn1" title="" id="_edn1"><sup>1</sup></a>) The right term here is really “not reusable”. All statements are cached and if the exact statement gets sent again, the cache can be used. However, if even a space character has changed in the query, the cache entry becomes useless. Using stored procedures or at least prepared statements can prevent this waste of resources.</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://sqlity.net/en/232/t-sql-tuesday-20-%e2%80%93-t-sql-best-practices/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

