Conditional branching and OPTION(Recompile)

Conditional branching, OPTION(Recompile) and procedure plan


Summary

There are many cases when programmers use conditional branching in tsql code to execute different queries or similar queries with different predicates based on a certain condition. In this post I’ll try to explain how Query optimiser handle queries in different conditional branches and how it relates to the option(recompile) hint and the procedure plan.

Conditional branching in stored procedures

Our TSQL code may implement logic which use conditional branching to decide what business rule to apply. Take for example a simple, non-production process that selects all orders and their details associated with a productId. If the product is not included in any of the  sales Orders, the code returns nothing or a warning message.

Create test data

The script below creates a sample table with the following ProductId data distribution.


Figure 1, ProductId data distribution 

The figure above reads as follows i.e
ProductId = 0 participates in 100 Orders. The number of orders makes 0.1% of all orders. The same applies for ProductId 100,200,300 …4900, or 50 different ProductIds.
ProductId=40000 participates in 20,000 orders. The number of orders makes 20% of all orders. The same applies for ProductId 60000 and 80000, or 3 different ProductIds.

/*SQL Server 2019 CTP3, Compatibility level 150 */

--create a test table
DROP TABLE IF EXISTS dbo.TestBranchPlans
GO

CREATE TABLE dbo.TestBranchPlans(  
			OrderDetailId INT  IDENTITY(1,1)
				PRIMARY KEY CLUSTERED
            ,ProductId INT 
				INDEX NCI_ProductId
			,filler CHAR(10)							
			DEFAULT('abcdefghij')							   
)
GO

--insert test data
;WITH getNumbers AS
(
	SELECT TOP(100000) 
		rn = ROW_NUMBER() OVER(ORDER BY (c.[object_id])) -1 -- generate 100K records starting from 0
	FROM sys.all_columns c, sys.all_columns c1

),setDistribution AS 
(
	SELECT  rn
			,dstrb = CASE 
			          WHEN rn < 5000 THEN rn%100      -- selects values between 0 and 100 for the first 5000 rows. (0.1% per value)
					  WHEN rn < 10000 THEN rn % 500   --  -||-   ..  0 and 500 for the rows between 5000 and 10000 (0.5% per value)  
					  WHEN rn < 50000 THEN rn % 10000 --  -||-   ..  0 and 10K for the rows between 10K and 50K (10% per value)  
					  ELSE rn %20000                  --  -||-   ..  0 and 20K for the rows between 10K and 100K (20% per value)  
				  END    	   
	FROM  getNumbers
)

INSERT INTO dbo.TestBranchPlans WITH(TABLOCKX) (ProductId)
	SELECT  ProductId = CASE  --consolidate distributed values i.e make ProdId=200 100 times ..etc
							WHEN (dstrb = 0) THEN  rn 
							ELSE rn - dstrb 
						END
	FROM setDistribution 
GO

The script used to check data distribution …

;WITH x AS
(
		SELECT  NoOfProducts = COUNT(*)
			   ,ProductId
		FROM TestBranchPlans
		GROUP BY ProductId
)
SELECT [ProductId Distribution] = x.NoOfProducts
      ,[ProdId from] = MIN(ProductId)
	  ,[ProdId To] = MAX(ProductId)
	  ,NoOfDiffProducts = COUNT(x.ProductId)
	  ,[Percentage] = CAST((NoOfProducts/100000.00 ) * 100.00 AS DECIMAL(3,1))
FROM x
GROUP BY NoOfProducts

Test stored procedure

CREATE OR ALTER PROCEDURE dbo.TestCodeBranching
    @ProductId INT = NULL

AS
BEGIN
	SET XACT_ABORT ON;
	SET NOCOUNT ON;
     
	IF NOT EXISTS (SELECT ProductId
	               FROM  dbo.TestBranchPlans
				   WHERE ProductId = @ProductId
				   ) 
	BEGIN 
		--RAISERROR('There are no orders for the ProductID',10,1) WITH NOWAIT;
		RETURN;
	END 
		
	ELSE 
		SELECT OrderDetailId
              ,ProductId
			  ,filler
		FROM dbo.TestBranchPlans
		WHERE ProductId = @ProductId
    RETURN;
END
GO

Experiment 1
Proc. plan is generated for all branch paths

The first experiment shows that QO (query optimser) builds query plans for all code branches regardless of which one is executed on the very first sproc call. This is expected since procedure plan( a set of query execution plans) is generated and cached without knowing which code path will be chosen.

Execute stored proc without passing @ProductID parameter value. The parameter is an optional param and will have default value NULL.

EXEC dbo.TestCodeBranching 
GO 100

--check the plan
SELECT   DatabaseName = DB_NAME(x.dbid)
	    ,ObjectName = OBJECT_NAME(x.objectid)
		,c.cacheobjtype
		,c.objtype
        ,NoOfExecutions = c.usecounts
        ,x.query_plan
		,y.text
FROM    sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
	CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
	AND y.[text] LIKE '%TestCodeBranching%'
GO

Notes:
GO 100 is to ensure that the plan stays in memory for some time. This is not needed for server level environments.
The 2nd query selects the procedure’s cashed plan(this is a set of estimated plans – no run-time values 🙂 ). In this example, the complete proc plan has two main query plans branched from T-SQL(COND WITH QUERY) operator.


Figure 2, Proc plan contains query plans for all code branches

The cached procedure plan shows that the second branch query plan is optimised by using the same parameter value (ProductId = NULL).

Note: The estimated number of rows is 1. Because the initial, NULL value is not represented by a RANGE_HI_KEY, Sql Server will use AVG_RANGE_ROWS value (stats histogram) instead. If we used i.e ProductId =2008 instead of NULL, the estimated number of rows would be 100 – use DBCC SHOW_STATISTICS(‘dbo.TestBranchPlans’ ,’NCI_ProductId’) to observe this behavior. 

The stored procedure call did not return any rows and the execution plan was built for @ProductionId = NULL. All subsequent calls may have sub-optimal plans as presented below.

DBCC FREEPROCCACHE
GO
EXEC dbo.TestCodeBranching -- 0 rows
GO
EXEC dbo.TestCodeBranching --0.1% 100 rows
	@ProductId = 200;
GO
EXEC dbo.TestCodeBranching  --0.5% 500 rows
	@ProductId = 5500;
GO
EXEC dbo.TestCodeBranching --10% 10000 rows
	@ProductId = 20000;
GO
EXEC dbo.TestCodeBranching --20% 20000 rows
	@ProductId = 60000;
GO

FIgure 2, Plan stability problem (Parameter sniffing problem)

*Accuracy[%] = (No of Actual Rows /  No. Of estimated rows ) * 100
This feature is available in SSMS 18.x+

Accurracy = 100%  – Ideal case, The estimated number of rows was spot on
Accurracy <100% – Overestimate. The estimated number of rows is higher than the actual no. of rows
Accurracy >100% – Underestimate . The estimated number of rows is lower than the actual number of rows.

Figure 2 shows negative effect of the cached, sub-optimal procedure plan, on the subsequent procedure calls.

Unstable procedure plan

Previous experiment showed how Sql Server builds query plans for all code paths without knowing which one will be executed. Query optimiser use the value passed into @ProductID parameter to create query plans for all queries in the batch that references it. In the test we called stored procedure without passing @ProductId, The absence of the value instructed the code to use parameter’s optional value, @ParameterId = NULL. As a consequence, QO used an atypical value to build and cache a sub-optimal query plan, which then had a negative impact on all subsequent procedure calls.
But the effect could be the same even if we passed any value to @ProductId.
Figure1 shows that the values in ProductId column are not evenly distributed (which is usually true for the production systems 🙂 ). However, most of the ProductIds, 76% (50 out of 66 different ProductIds) returns the same number of rows(100 rows). There is 15% ProductIds (10 out of 66) that returns 500rows and only 3% ProductIds (3 out of 66) that returns 10,000 and 20,000 rows.

Lets say that our procedure call pattern assumes similar probability of passing “small”, more selective*(returns only 100 rows)  and “big”, less selective(returns 20,000 rows) ProductId values.

*Selectivity represents uniqueness of values in a column. High selectivity = high uniqueness = low number of matching values. Selectivity = (rows that pass the predicate / total rows in the table). Selectivity[ProductId=200] = 100 / 100,000 =0.001(high selectivity)  and [ProductId = 40000] = 20000/100000 = 0.2 (low selectivity)

A cached procedure plan compiled for a “small” ProductId has a negative impact on the procedure executions with a “big” ProductId and vice versa.

DBCC FREEPROCCACHE
GO
SET STATISTICS XML ON 
	EXEC dbo.TestCodeBranching
		@ProductId = 200
SET STATISTICS XML OFF 
GO 
--check the plan
SELECT   DatabaseName = DB_NAME(x.dbid)
		,ObjectName = OBJECT_NAME(x.objectid)
		,c.cacheobjtype
		,c.objtype
		,NoOfExecutions = c.usecounts
		,x.query_plan
		,y.text
FROM    sys.dm_exec_cached_plans c
	CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
	CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE   c.objtype = 'Proc'
	   AND y.[text] LIKE '%TestCodeBranching%'
GO
SET STATISTICS XML ON 
	EXEC dbo.TestCodeBranching
		@ProductId = 40000
SET STATISTICS XML OFF


Figure 3, Cached plan for a small ProductId

We would get the similar results if we passed a “big” ProductId first, and then made a procedure call passing a “small” value to the parameter, only this time the cached procedure plan would work in a favor of the “big” parameter.
This situation is known as “parameter sniffing” problem. The result is an unstable procedure plan.
One of a several different ways to resolve the problem is to instruct query processor to recompile the statement in question, on every procedure execution.

OPTION(RECOMPILE)

OPTION(RECOMPILE) is a statement level command that instructs query processor to pause batch execution, discard any stored query plans for the query, build a new plan, only now using the run-time values (parameters, local variables..), perform “constant folding” with passed in parameters.

Experiment 2
Conditional branching and OPTION(RECOMPILE)

In this experiment I’ll use OPTION(RECOMPILE) to stabilise the procedure plan. Lets repeat the last test, but this time we instruct query processor to recompile statement in question

...	ELSE 
		SELECT OrderDetailId
              ,ProductId
			  ,filler
		FROM dbo.TestBranchPlans
		WHERE ProductId = @ProductId
		OPTION(RECOMPILE)
..


Figure 4, Stable procedure plan with Option(Recompile)

Note: Using OPTION(recompile) to stabilise the plan comes with a certain cost. It adds a small overhead to the query compile time, can have some impact on CPU. It is a good idea, if possible, to re-write/decouple stored proc in order to prevent high variations in the procedure plans.
Personally, I’ve witnessed great results with the option(recompile) in the frequently executed stored procedures with the plan stability problem.

Where is my procedure plan?

In the next test we’ll run our stored procedure with the option(recompile), with a parameter value that does not match any existing ProductId value. This call will execute the first code branch and exit the batch.

DBCC FREEPROCCACHE
GO
	EXEC dbo.TestCodeBranching
		@ProductId = 721
GO 
--check the plan
SELECT   DatabaseName = DB_NAME(x.dbid)
		,ObjectName = OBJECT_NAME(x.objectid)
		,c.cacheobjtype
		,c.objtype
		,NoOfExecutions = c.usecounts
		,x.query_plan
		,y.text
FROM    sys.dm_exec_cached_plans c
	CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
	CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
	AND y.[text] LIKE '%TestCodeBranching%'
GO


Figure 5, Incomplete procedure plan 

So, now we need to answer question “why we are not getting our cached procedure plan (cached_plan is NULL)”.    🙂

Deferred Query compilation

When a client app executes stored procedure for the first time, query processor may decide not to create query plans for each individual query. This behavior is called Deferred Query Compilation. Some of the possible reasons are related to the conditional branching.
If the first call does not execute a code branch that contains at least one option(recompile) statement – there may be more than one statement in a code branch, the query plans for the branch will not be generated and cached. This makes procedure plan, as a set of individual query plans, incomplete.
Dynamic management function sys.dm_exec_query_plan(plan_handle) returns all individual query plans for the entire batch. If one or more query plans are not generated, the function returns NULL. This makes sense since we do not have complete procedure plan.
The following test demonstrate this behavior.

1. Create a new test stored proc dbo.TestCodeBranching1

CREATE OR ALTER PROCEDURE dbo.TestCodeBranching1
    @ProductId INT = NULL
AS
BEGIN
	SET XACT_ABORT ON;
	SET NOCOUNT ON;
     
	IF NOT EXISTS (SELECT ProductId
	               FROM  dbo.TestBranchPlans
				   WHERE ProductId = @ProductId
				   ) 
		RETURN;
	ELSE 
	BEGIN 
		SELECT OrderDetailId
              ,ProductId
			  ,filler
		FROM dbo.TestBranchPlans
		WHERE ProductId = @ProductId;

		SELECT 'hello there'
		OPTION(RECOMPILE);
/*
Test the same scenario using a temp table (replace the two queries above with the commented code
	    CREATE  TABLE #TEMP(IOrderDetailId INT
              ,ProductId INT
			  ,filler CHAR(10))

		INSERT INTO #TEMP 	
			SELECT OrderDetailId
				  ,ProductId
				  ,filler
			FROM dbo.TestBranchPlans
			WHERE ProductId = @ProductId
*/

			
		SELECT TOP(100) column_id, [name]
		FROM sys.columns 
		ORDER BY column_id DESC;	
	END 
    RETURN;
END
GO

2. Run the script below.

DBCC FREEPROCCACHE
GO
	EXEC dbo.TestCodeBranching1
		@ProductId = 40001
GO 

--check the plan
SELECT   DatabaseName = DB_NAME(x.dbid)
		,ObjectName = OBJECT_NAME(x.objectid)
		,c.cacheobjtype
		,c.objtype
		,NoOfExecutions = c.usecounts
		,x.query_plan
		,y.text
FROM    sys.dm_exec_cached_plans c
	CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
	CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
	AND y.[text] LIKE '%TestCodeBranching1%'
GO

--EXEC sp_describe_first_result_set 
SELECT  DatabaseName  = DB_NAME(s2.[dbid])
       ,ProcName      = OBJECT_NAME(s2.objectid, s2.[dbid])
	   ,sql_statement = SUBSTRING( s2.[text]
								   ,(s1.statement_start_offset / 2) + 1 --/2 2byutes per character S2.text is of type nvarchar(max). Start position is 0byte
			                       ,( 
										(CASE -- check for the end of a batch
											WHEN s1.statement_end_offset = -1 THEN  DATALENGTH(s2.[text]) --end of the last query in the batch  -1 represents the end of a batch
											ELSE s1.statement_end_offset
										 END) 
										- (statement_start_offset) / 2
									 ) + 1
						 )
		,query_plan = CAST(s3.query_plan AS XML) 
FROM    sys.dm_exec_query_stats AS s1
	CROSS APPLY sys.dm_exec_sql_text(s1.[sql_handle]) AS s2
		CROSS APPLY sys.dm_exec_text_query_plan(s1.plan_handle, s1.statement_start_offset, s1.statement_end_offset) s3 -- get the query plans
WHERE OBJECT_NAME(s2.objectid, s2.dbid) = 'TestCodeBranching1'
ORDER BY  s1.[sql_handle]
         ,s1.statement_start_offset
         ,s1.statement_end_offset

Results
Figure 6, Individual query plans

The sequence of events as follows:

  • The first branch got executed during the very first stored procedure call.
  • Query processor finds an Option(recompile) statement within  the second code branch and decides not to create execution plans for any of the queries in the code path.
  • Dynamic management fn, sys.dm_exec_query_plan(plan_handle) did not return cached procedure plan because the plan was incomplete.
  • Dynamic management function sys.dm_exec_text_query_plan(plan_handle, statement_start_offset, statement_end_offset) returned all available single query plans within the batch. In our case we have only one cached query plan. The plan belongs to the code path that was executed on the first call.

How the sys.dm_exec_text_query_plan query works?

The query collects data from the following objects:
sys.dm_exec_query_stats – gets various statistical information for cached query plans. We use sql_handle column (uniquely identifies the batch or stored procedure that the query is part of), plan_handle (uniquely identifies query execution plan for the queries that belongs to the batch), statement_start_offset, statement_end_offset ( define, in bytes, the starting and and ending position of a query within batch)
sys.dm_exec_sql_text – gets the text of the batch of the queries identified by sql_handle.  It also provides info about proc name, database etc..
-sys.dm_exec_text_query_plan  – returns execution plan for a batch of statements or for specific statement(s) in the batch. statement_start_offset(0 beginning of the batch) and statement_end_offset (-1 end of the batch) define the start and end position of the query within the batch defined by plan_handle.

Conclusion

Conditional branching as an imperative construct in TSQL has a specific treatment by Sql Server’s query processor. A procedure batch with conditional branching may be optimised and cached for all code paths regardless of which branch is executed on the first procedure call. This may produce sub-optimal plans for a number of queries within the non-executed code branches. It is important to be aware of the behavior in order to prevent potential sub-optimal query executions.
Sql Server may choose not to compile statements(deferred query compilation) within the non-executed code paths if at least one of the queries within the code paths has the OPTION(recompile) hint – this is also true for temp tables. This will make the procedure plan (as a set of query plans)  incomplete, hence sys.dm_exec_query_plan function returns NULL for the plan handle. However, queries from the executed code branch will be cached and the query plans will be available through sys.dm_exec_text_query_plan.

Thanks for reading.

Dean Mincic

Leave a Reply

Your email address will not be published. Required fields are marked *