I recently ran into a particularly odd behavior of SQL Server related to temporary tables. It took me a little digging to come up with an explanation for this, but I discovered something that most of us can learn from, so I decided to share it here.
Every query in SQL Server it is processed in three phases. First, the query is parsed. During that phase, the general syntax of the query is checked. The result of this phase is a parse tree.
The second phase is done by the Algebrizer. It is responsible to resolve object names and check that the query “makes sense”. For example the correct use of the GROUP BY clause is validated in this step. The output of this phase is called algebrized tree and it contains a representation of the query that is ready to be optimized. In other words, now SQL Server knows that the query can be executed (with some rare exceptions). But SQL Server does not yet know how to best execute the query.
That remaining question is answered in the third and most expensive phase, the optimization phase. During optimization SQL Server tries to find a good enough plan for the query. How query optimization works in detail is described very well by Paul White in his Query Optimizer Deep Dive.
Important for us here is, that the third phase, the one that gets the query ready to actually be executed, is resource intensive. To save resources, SQL Server caches the result of this phase for more complex queries and stored SQL modules. When the same query or stored procedure is executed the next time, the already prepared plan is found in the plan cache and reused, potentially saving a lot of CPU time.
Let us look at an example. The following is a simple stored procedure:
<br /> CREATE PROCEDURE dbo.UseATempTable<br /> AS<br /> BEGIN<br /> SET NOCOUNT ON;</p> <p> CREATE TABLE #ATempTable(Id INT);</p> <p> INSERT INTO #ATempTable(Id)<br /> VALUES(101),(123),(177);</p> <p> DECLARE @r INT<br /> SELECT @r = SUM(Id) FROM #ATempTable AS ATT;</p> <p> RETURN @r;<br /> END;<br />
When this procedure is executed the first time, an entry in the plan cache is created. To make this entry visible we need to access the sys.dm_exec_cached_plans DMV. I created the following view that returns the cache entries for just those procedures that are used in this and the following examples:
<br /> CREATE VIEW dbo.PlanCacheReuse<br /> AS<br /> SELECT<br /> OBJECT_NAME(CAST(DEPA.objectid AS INT),CAST(DEPA.dbid AS INT)) procedure_name,<br /> DECP.usecounts,<br /> DECP.cacheobjtype,<br /> DECP.objtype,<br /> DEPA.optional_spid<br /> FROM sys.dm_exec_cached_plans AS DECP<br /> CROSS APPLY (<br /> SELECT DEPAP.*<br /> FROM sys.dm_exec_plan_attributes(DECP.plan_handle) AS DEPAU<br /> PIVOT (MAX(DEPAU.value) FOR DEPAU.attribute IN (dbid,objectid,optional_spid)) AS DEPAP<br /> )DEPA<br /> WHERE DEPA.dbid = DB_ID()<br /> AND DEPA.objectid IN(<br /> OBJECT_ID(QUOTENAME(DB_NAME())+'.dbo.UseATempTable'),<br /> OBJECT_ID(QUOTENAME(DB_NAME())+'.dbo.UseAnExternalTempTable')<br /> );<br />
The output of the view looks like this, after the procedure dbo.UseATempTable has been executed for the first time:
If you execute the same procedure again (and again), the same cached plan is reused as you can see when you look at the usecount column:
Ideally, SQL Server has to compile each procedure only once and can use the cached plan for all subsequent executions. However, a few things can prevent re-use of a cached plan. The most obvious reason is that the plan had to be evicted from cache because the memory was needed for more frequently used objects. Another common cause is changing data. When more data is added to a database table, execution plans of queries accessing that table might become sub-optimal. SQL Server is relying on a few simple (and not so simple) heuristics to trigger a recompilation for cached plans that have the potential of being sub-optimal.
There is one reason preventing plan reuse that few people know about: The use of external temporary tables. When you create a temporary table in SQL Server and then call a stored procedure from the same connection, that temporary table is visible and accessible within the procedure. That pattern is sometimes used to pass data back and forth between a procedure and the calling procedure. For more details on the scope of temporary tables, check out Temp Tables – Scoping and Eclipsing.
When a stored procedure is compiled that is accessing an external temp table, SQL Server has no guarantee that the next time this stored procedure is called it is called from the same connection. However, if it is called from a different connection, the accessed temp table might contain significantly more (or less) data making a different execution plan preferable.
A simple way to deal with this situation is to force a recompilation every time a procedure that works with external temporary tables is executed. SQL Server is not going that route. Instead, SQL Server caches the procedure once for each connection. That can safe a significant amount of CPU resources when the procedure in question is called within a loop. Let us take a look at this procedure:
<br /> CREATE PROCEDURE dbo.UseAnExternalTempTable<br /> AS<br /> BEGIN<br /> SET NOCOUNT ON;</p> <p> INSERT INTO #ATempTable(Id)<br /> VALUES(101),(123),(177);</p> <p> DECLARE @r INT<br /> SELECT @r = SUM(Id) FROM #ATempTable AS ATT;</p> <p> RETURN @r;<br /> END;<br />
This procedure looks similar to the one we looked at before, but it expects the temp table #ATempTable to exist when it is called. When you execute it a few times you see the same behavior that we have encountered before: A plan is created and cached after the first execution,
and then reused in subsequent executions.
But you might have noticed that the SPID of the current connection is stored with the plan in the optional_spid column. So, if you now execute the same procedure from a different connection, a new plan is created and cached
and then reused as long as you are calling again from that SPID.
The behavior described above can save a lot of resources in some situations, but it can also cause resources to be wasted. This is due to the eclipsing behavior described in Temp Tables – Scoping and Eclipsing.
When a procedure is accessing a temp table and a temp table with the same name already exists in the calling context, SQL Server does not know at optimization time whether that temp table will be the one accessed or if it will have been eclipsed at the time. Therefore, SQL Server assumes that the existing external temp table is the one that is accessed and optimizes the procedure for that one. This could cause a suboptimal plan to be chosen. But more importantly, that plan will be cached only for the connection it is called from. Let us look at this procedure:
<br /> CREATE PROCEDURE dbo.AlsoUseATempTable<br /> AS<br /> BEGIN<br /> SET NOCOUNT ON;</p> <p> CREATE TABLE #ATempTable(Id INT);</p> <p> EXEC dbo.UseATempTable;<br /> END;<br />
It creates a temp table and then calls our first example procedure dbo.UseATempTable. Even though that one is not going to access the pre-existing temp table, its execution plan is tied to the executing SPID:
So when you call the outer procedure again from another connection, a new plan for the inner procedure will be cached:
On busy machines, SPIDs above 500 are not uncommon. So, if you have an often executed procedure pair like this you can end up with 500 or more copies of the execution plan. While a single execution plan is not huge, it might still take up more than a megabyte in the cache. Having to store 500 of them instead of just one can cause significant memory pressure on the cache, which will cause other queries to be evicted from the cache early. This will in turn cause additional work for SQL Server as it has to recompile those evicted queries again. That additional work can have a measurable impact on your SQL Server's performance.
The above scenario is more common than you might think. I often encounter temporary table that have a - let's say - common name. A typical example is the name #tmp. Single letters like #o are also often used. If you use temp table names like this, and you have a procedure that calls another one, you might be in trouble.
However, there is a simple way to prevent this from happening (in most cases): Use good names. There are many reasons why an object name like #OrdersInDateRange is better than #tmp. Making the code easier to read and understand and therefore reducing maintenance cost is just one of them. Reducing the probability of unexpected plan caching issues due to name collisions is another one.
If you use descriptive names for your temporary tables will make life simpler, yours and SQL Server's. Using not so descriptive names on the other hand can hurt you in expected and in unexpected ways.