Conditional branching, OPTION(Recompile) and procedure plan
There are many cases when programmers use conditional branching in tsql code to execute different queries or similar queries with different predicates based on a certain condition. In this post I’ll try to explain how Query optimiser handle queries in different conditional branches and how it relates to the option(recompile) hint and the procedure plan.
Conditional branching in stored procedures
Our TSQL code may implement logic which use conditional branching to decide what business rule to apply. Take for example a simple, non-production process that selects all orders and their details associated with a productId. If the product is not included in any of the sales Orders, the code returns nothing or a warning message.
Create test data
The script below creates a sample table with the following ProductId data distribution.
The figure above reads as follows i.e
ProductId = 0 participates in 100 Orders. The number of orders makes 0.1% of all orders. The same applies for ProductId 100,200,300 …4900, or 50 different ProductIds.
ProductId=40000 participates in 20,000 orders. The number of orders makes 20% of all orders. The same applies for ProductId 60000 and 80000, or 3 different ProductIds.
/*SQL Server 2019 CTP3, Compatibility level 150 */ --create a test table DROP TABLE IF EXISTS dbo.TestBranchPlans GO CREATE TABLE dbo.TestBranchPlans( OrderDetailId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED ,ProductId INT INDEX NCI_ProductId ,filler CHAR(10) DEFAULT('abcdefghij') ) GO --insert test data ;WITH getNumbers AS ( SELECT TOP(100000) rn = ROW_NUMBER() OVER(ORDER BY (c.[object_id])) -1 -- generate 100K records starting from 0 FROM sys.all_columns c, sys.all_columns c1 ),setDistribution AS ( SELECT rn ,dstrb = CASE WHEN rn < 5000 THEN rn%100 -- selects values between 0 and 100 for the first 5000 rows. (0.1% per value) WHEN rn < 10000 THEN rn % 500 -- -||- .. 0 and 500 for the rows between 5000 and 10000 (0.5% per value) WHEN rn < 50000 THEN rn % 10000 -- -||- .. 0 and 10K for the rows between 10K and 50K (10% per value) ELSE rn %20000 -- -||- .. 0 and 20K for the rows between 10K and 100K (20% per value) END FROM getNumbers ) INSERT INTO dbo.TestBranchPlans WITH(TABLOCKX) (ProductId) SELECT ProductId = CASE --consolidate distributed values i.e make ProdId=200 100 times ..etc WHEN (dstrb = 0) THEN rn ELSE rn - dstrb END FROM setDistribution GO
The script used to check data distribution …
;WITH x AS ( SELECT NoOfProducts = COUNT(*) ,ProductId FROM TestBranchPlans GROUP BY ProductId ) SELECT [ProductId Distribution] = x.NoOfProducts ,[ProdId from] = MIN(ProductId) ,[ProdId To] = MAX(ProductId) ,NoOfDiffProducts = COUNT(x.ProductId) ,[Percentage] = CAST((NoOfProducts/100000.00 ) * 100.00 AS DECIMAL(3,1)) FROM x GROUP BY NoOfProducts
Test stored procedure
CREATE OR ALTER PROCEDURE dbo.TestCodeBranching @ProductId INT = NULL AS BEGIN SET XACT_ABORT ON; SET NOCOUNT ON; IF NOT EXISTS (SELECT ProductId FROM dbo.TestBranchPlans WHERE ProductId = @ProductId ) BEGIN --RAISERROR('There are no orders for the ProductID',10,1) WITH NOWAIT; RETURN; END ELSE SELECT OrderDetailId ,ProductId ,filler FROM dbo.TestBranchPlans WHERE ProductId = @ProductId RETURN; END GO
Proc. plan is generated for all branch paths
The first experiment shows that QO (query optimser) builds query plans for all code branches regardless of which one is executed on the very first sproc call. This is expected since procedure plan( a set of query execution plans) is generated and cached without knowing which code path will be chosen.
Execute stored proc without passing @ProductID parameter value. The parameter is an optional param and will have default value NULL.
EXEC dbo.TestCodeBranching GO 100 --check the plan SELECT DatabaseName = DB_NAME(x.dbid) ,ObjectName = OBJECT_NAME(x.objectid) ,c.cacheobjtype ,c.objtype ,NoOfExecutions = c.usecounts ,x.query_plan ,y.text FROM sys.dm_exec_cached_plans c CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y WHERE c.objtype = 'Proc' AND y.[text] LIKE '%TestCodeBranching%' GO
GO 100 is to ensure that the plan stays in memory for some time. This is not needed for server level environments.
The 2nd query selects the procedure’s cashed plan(this is a set of estimated plans – no run-time values 🙂 ). In this example, the complete proc plan has two main query plans branched from T-SQL(COND WITH QUERY) operator.
The cached procedure plan shows that the second branch query plan is optimised by using the same parameter value (ProductId = NULL).
Note: The estimated number of rows is 1. Because the initial, NULL value is not represented by a RANGE_HI_KEY, Sql Server will use AVG_RANGE_ROWS value (stats histogram) instead. If we used i.e ProductId =2008 instead of NULL, the estimated number of rows would be 100 – use DBCC SHOW_STATISTICS(‘dbo.TestBranchPlans’ ,’NCI_ProductId’) to observe this behavior.
The stored procedure call did not return any rows and the execution plan was built for @ProductionId = NULL. All subsequent calls may have sub-optimal plans as presented below.
DBCC FREEPROCCACHE GO EXEC dbo.TestCodeBranching -- 0 rows GO EXEC dbo.TestCodeBranching --0.1% 100 rows @ProductId = 200; GO EXEC dbo.TestCodeBranching --0.5% 500 rows @ProductId = 5500; GO EXEC dbo.TestCodeBranching --10% 10000 rows @ProductId = 20000; GO EXEC dbo.TestCodeBranching --20% 20000 rows @ProductId = 60000; GO
*Accuracy[%] = (No of Actual Rows / No. Of estimated rows ) * 100
This feature is available in SSMS 18.x+
Accurracy = 100% – Ideal case, The estimated number of rows was spot on
Accurracy <100% – Overestimate. The estimated number of rows is higher than the actual no. of rows
Accurracy >100% – Underestimate . The estimated number of rows is lower than the actual number of rows.
Figure 2 shows negative effect of the cached, sub-optimal procedure plan, on the subsequent procedure calls.
Unstable procedure plan
Previous experiment showed how Sql Server builds query plans for all code paths without knowing which one will be executed. Query optimiser use the value passed into @ProductID parameter to create query plans for all queries in the batch that references it. In the test we called stored procedure without passing @ProductId, The absence of the value instructed the code to use parameter’s optional value, @ParameterId = NULL. As a consequence, QO used an atypical value to build and cache a sub-optimal query plan, which then had a negative impact on all subsequent procedure calls.
But the effect could be the same even if we passed any value to @ProductId.
Figure1 shows that the values in ProductId column are not evenly distributed (which is usually true for the production systems 🙂 ). However, most of the ProductIds, 76% (50 out of 66 different ProductIds) returns the same number of rows(100 rows). There is 15% ProductIds (10 out of 66) that returns 500rows and only 3% ProductIds (3 out of 66) that returns 10,000 and 20,000 rows.
Lets say that our procedure call pattern assumes similar probability of passing “small”, more selective*(returns only 100 rows) and “big”, less selective(returns 20,000 rows) ProductId values.
*Selectivity represents uniqueness of values in a column. High selectivity = high uniqueness = low number of matching values. Selectivity = (rows that pass the predicate / total rows in the table). Selectivity[ProductId=200] = 100 / 100,000 =0.001(high selectivity) and [ProductId = 40000] = 20000/100000 = 0.2 (low selectivity)
A cached procedure plan compiled for a “small” ProductId has a negative impact on the procedure executions with a “big” ProductId and vice versa.
DBCC FREEPROCCACHE GO SET STATISTICS XML ON EXEC dbo.TestCodeBranching @ProductId = 200 SET STATISTICS XML OFF GO --check the plan SELECT DatabaseName = DB_NAME(x.dbid) ,ObjectName = OBJECT_NAME(x.objectid) ,c.cacheobjtype ,c.objtype ,NoOfExecutions = c.usecounts ,x.query_plan ,y.text FROM sys.dm_exec_cached_plans c CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y WHERE c.objtype = 'Proc' AND y.[text] LIKE '%TestCodeBranching%' GO SET STATISTICS XML ON EXEC dbo.TestCodeBranching @ProductId = 40000 SET STATISTICS XML OFF
We would get the similar results if we passed a “big” ProductId first, and then made a procedure call passing a “small” value to the parameter, only this time the cached procedure plan would work in a favor of the “big” parameter.
This situation is known as “parameter sniffing” problem. The result is an unstable procedure plan.
One of a several different ways to resolve the problem is to instruct query processor to recompile the statement in question, on every procedure execution.
OPTION(RECOMPILE) is a statement level command that instructs query processor to pause batch execution, discard any stored query plans for the query, build a new plan, only now using the run-time values (parameters, local variables..), perform “constant folding” with passed in parameters.
Conditional branching and OPTION(RECOMPILE)
In this experiment I’ll use OPTION(RECOMPILE) to stabilise the procedure plan. Lets repeat the last test, but this time we instruct query processor to recompile statement in question
... ELSE SELECT OrderDetailId ,ProductId ,filler FROM dbo.TestBranchPlans WHERE ProductId = @ProductId OPTION(RECOMPILE) ..
Figure 4, Stable procedure plan with Option(Recompile)
Note: Using OPTION(recompile) to stabilise the plan comes with a certain cost. It adds a small overhead to the query compile time, can have some impact on CPU. It is a good idea, if possible, to re-write/decouple stored proc in order to prevent high variations in the procedure plans.
Personally, I’ve witnessed great results with the option(recompile) in the frequently executed stored procedures with the plan stability problem.
Where is my procedure plan?
In the next test we’ll run our stored procedure with the option(recompile), with a parameter value that does not match any existing ProductId value. This call will execute the first code branch and exit the batch.
DBCC FREEPROCCACHE GO EXEC dbo.TestCodeBranching @ProductId = 721 GO --check the plan SELECT DatabaseName = DB_NAME(x.dbid) ,ObjectName = OBJECT_NAME(x.objectid) ,c.cacheobjtype ,c.objtype ,NoOfExecutions = c.usecounts ,x.query_plan ,y.text FROM sys.dm_exec_cached_plans c CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y WHERE c.objtype = 'Proc' AND y.[text] LIKE '%TestCodeBranching%' GO
So, now we need to answer question “why we are not getting our cached procedure plan (cached_plan is NULL)”. 🙂
Deferred Query compilation
When a client app executes stored procedure for the first time, query processor may decide not to create query plans for each individual query. This behavior is called Deferred Query Compilation. Some of the possible reasons are related to the conditional branching.
If the first call does not execute a code branch that contains at least one option(recompile) statement – there may be more than one statement in a code branch, the query plans for the branch will not be generated and cached. This makes procedure plan, as a set of individual query plans, incomplete.
Dynamic management function sys.dm_exec_query_plan(plan_handle) returns all individual query plans for the entire batch. If one or more query plans are not generated, the function returns NULL. This makes sense since we do not have complete procedure plan.
The following test demonstrate this behavior.
1. Create a new test stored proc dbo.TestCodeBranching1
CREATE OR ALTER PROCEDURE dbo.TestCodeBranching1 @ProductId INT = NULL AS BEGIN SET XACT_ABORT ON; SET NOCOUNT ON; IF NOT EXISTS (SELECT ProductId FROM dbo.TestBranchPlans WHERE ProductId = @ProductId ) RETURN; ELSE BEGIN SELECT OrderDetailId ,ProductId ,filler FROM dbo.TestBranchPlans WHERE ProductId = @ProductId; SELECT 'hello there' OPTION(RECOMPILE); /* Test the same scenario using a temp table (replace the two queries above with the commented code CREATE TABLE #TEMP(IOrderDetailId INT ,ProductId INT ,filler CHAR(10)) INSERT INTO #TEMP SELECT OrderDetailId ,ProductId ,filler FROM dbo.TestBranchPlans WHERE ProductId = @ProductId */ SELECT TOP(100) column_id, [name] FROM sys.columns ORDER BY column_id DESC; END RETURN; END GO
2. Run the script below.
DBCC FREEPROCCACHE GO EXEC dbo.TestCodeBranching1 @ProductId = 40001 GO --check the plan SELECT DatabaseName = DB_NAME(x.dbid) ,ObjectName = OBJECT_NAME(x.objectid) ,c.cacheobjtype ,c.objtype ,NoOfExecutions = c.usecounts ,x.query_plan ,y.text FROM sys.dm_exec_cached_plans c CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y WHERE c.objtype = 'Proc' AND y.[text] LIKE '%TestCodeBranching1%' GO --EXEC sp_describe_first_result_set SELECT DatabaseName = DB_NAME(s2.[dbid]) ,ProcName = OBJECT_NAME(s2.objectid, s2.[dbid]) ,sql_statement = SUBSTRING( s2.[text] ,(s1.statement_start_offset / 2) + 1 --/2 2byutes per character S2.text is of type nvarchar(max). Start position is 0byte ,( (CASE -- check for the end of a batch WHEN s1.statement_end_offset = -1 THEN DATALENGTH(s2.[text]) --end of the last query in the batch -1 represents the end of a batch ELSE s1.statement_end_offset END) - (statement_start_offset) / 2 ) + 1 ) ,query_plan = CAST(s3.query_plan AS XML) FROM sys.dm_exec_query_stats AS s1 CROSS APPLY sys.dm_exec_sql_text(s1.[sql_handle]) AS s2 CROSS APPLY sys.dm_exec_text_query_plan(s1.plan_handle, s1.statement_start_offset, s1.statement_end_offset) s3 -- get the query plans WHERE OBJECT_NAME(s2.objectid, s2.dbid) = 'TestCodeBranching1' ORDER BY s1.[sql_handle] ,s1.statement_start_offset ,s1.statement_end_offset
Figure 6, Individual query plans
The sequence of events as follows:
- The first branch got executed during the very first stored procedure call.
- Query processor finds an Option(recompile) statement within the second code branch and decides not to create execution plans for any of the queries in the code path.
- Dynamic management fn, sys.dm_exec_query_plan(plan_handle) did not return cached procedure plan because the plan was incomplete.
- Dynamic management function sys.dm_exec_text_query_plan(plan_handle, statement_start_offset, statement_end_offset) returned all available single query plans within the batch. In our case we have only one cached query plan. The plan belongs to the code path that was executed on the first call.
How the sys.dm_exec_text_query_plan query works?
The query collects data from the following objects:
– sys.dm_exec_query_stats – gets various statistical information for cached query plans. We use sql_handle column (uniquely identifies the batch or stored procedure that the query is part of), plan_handle (uniquely identifies query execution plan for the queries that belongs to the batch), statement_start_offset, statement_end_offset ( define, in bytes, the starting and and ending position of a query within batch)
– sys.dm_exec_sql_text – gets the text of the batch of the queries identified by sql_handle. It also provides info about proc name, database etc..
-sys.dm_exec_text_query_plan – returns execution plan for a batch of statements or for specific statement(s) in the batch. statement_start_offset(0 beginning of the batch) and statement_end_offset (-1 end of the batch) define the start and end position of the query within the batch defined by plan_handle.
Conditional branching as an imperative construct in TSQL has a specific treatment by Sql Server’s query processor. A procedure batch with conditional branching may be optimised and cached for all code paths regardless of which branch is executed on the first procedure call. This may produce sub-optimal plans for a number of queries within the non-executed code branches. It is important to be aware of the behavior in order to prevent potential sub-optimal query executions.
Sql Server may choose not to compile statements(deferred query compilation) within the non-executed code paths if at least one of the queries within the code paths has the OPTION(recompile) hint – this is also true for temp tables. This will make the procedure plan (as a set of query plans) incomplete, hence sys.dm_exec_query_plan function returns NULL for the plan handle. However, queries from the executed code branch will be cached and the query plans will be available through sys.dm_exec_text_query_plan.
Thanks for reading.