Conditional branching, OPTION(Recompile), and procedure plan
Summary
There are many cases where programmers use conditional branching in T-SQL code to execute different queries, or similar queries with different predicates, based on a certain condition. In this post, I'll try to explain how the Query Optimizer handles queries in different conditional branches and how this relates to the OPTION(RECOMPILE) hint and the procedure plan.
Conditional branching in stored procedures
Our T-SQL code may implement logic that uses conditional branching to decide which business rule to apply. Take, for example, a simple, non-production process that selects all orders and their details associated with a ProductId. If the product is not included in any of the sales orders, the code returns nothing or a warning message.
Create test data
The script below creates a sample table with the following ProductId data distribution.
Figure 1, ProductId data distribution
The figure above reads as follows:
ProductId = 0 participates in 100 orders. That number of orders makes up 0.1% of all orders. The same applies to ProductId 100, 200, 300 … 4900, i.e. 50 different ProductIds.
ProductId = 40000 participates in 20,000 orders. That number of orders makes up 20% of all orders. The same applies to ProductId 60000 and 80000, i.e. 3 different ProductIds.
/* SQL Server 2019 CTP3, Compatibility level 150 */

--create a test table
DROP TABLE IF EXISTS dbo.TestBranchPlans
GO
CREATE TABLE dbo.TestBranchPlans(
     OrderDetailId INT IDENTITY(1,1) PRIMARY KEY CLUSTERED
    ,ProductId INT INDEX NCI_ProductId
    ,filler CHAR(10) DEFAULT('abcdefghij')
)
GO

--insert test data
;WITH getNumbers AS
(
    SELECT TOP(100000) -- generate 100K records starting from 0
        rn = ROW_NUMBER() OVER(ORDER BY (c.[object_id])) - 1
    FROM sys.all_columns c, sys.all_columns c1
), setDistribution AS
(
    SELECT rn
          ,CASE -- values 0 to 100 for the first 5000 rows (0.1% per value)
                WHEN rn < 5000  THEN rn % 100
                -- values 0 to 500 for the rows between 5000 and 10000 (0.5% per value)
                WHEN rn < 10000 THEN rn % 500
                -- values 0 to 10K for the rows between 10K and 50K (10% per value)
                WHEN rn < 50000 THEN rn % 10000
                -- values 0 to 20K for the rows between 50K and 100K (20% per value)
                ELSE rn % 20000
           END AS dstrb
    FROM getNumbers
)
INSERT INTO dbo.TestBranchPlans WITH(TABLOCKX) (ProductId)
    SELECT CASE --consolidate distributed values i.e make ProdId=200 100 times ..etc
               WHEN (dstrb = 0) THEN rn
               ELSE rn - dstrb
           END AS ProductId
    FROM setDistribution
GO
The script used to check data distribution …
;WITH x AS
(
    SELECT NoOfProducts = COUNT(*)
          ,ProductId
    FROM dbo.TestBranchPlans
    GROUP BY ProductId
)
SELECT  [ProductId Distribution] = x.NoOfProducts
       ,[ProdId from] = MIN(ProductId)
       ,[ProdId To] = MAX(ProductId)
       ,NoOfDiffProducts = COUNT(x.ProductId)
       ,[Percentage] = CAST((NoOfProducts / 100000.00) * 100.00 AS DECIMAL(3,1))
FROM x
GROUP BY NoOfProducts
Test stored procedure
CREATE OR ALTER PROCEDURE dbo.TestCodeBranching
     @ProductId INT = NULL
AS
BEGIN
    SET XACT_ABORT ON;
    SET NOCOUNT ON;

    IF NOT EXISTS (SELECT ProductId
                   FROM dbo.TestBranchPlans
                   WHERE ProductId = @ProductId)
    BEGIN
        --RAISERROR('There are no orders for the ProductID',10,1) WITH NOWAIT;
        RETURN;
    END
    ELSE
        SELECT OrderDetailId
              ,ProductId
              ,filler
        FROM dbo.TestBranchPlans
        WHERE ProductId = @ProductId

    RETURN;
END
GO
Experiment 1
Proc. plan is generated for all branch paths
The first experiment shows that the QO (Query Optimizer) builds query plans for all code branches, regardless of which one is executed on the very first procedure call. This is expected, since the procedure plan (a set of query execution plans) is generated and cached without knowing which code path will be chosen.
Execute the stored procedure without passing a value for the @ProductId parameter. The parameter is optional and defaults to NULL.
EXEC dbo.TestCodeBranching
GO 100

--check the plan
SELECT  DatabaseName = DB_NAME(x.dbid)
       ,ObjectName = OBJECT_NAME(x.objectid)
       ,c.cacheobjtype
       ,c.objtype
       ,NoOfExecutions = c.usecounts
       ,x.query_plan
       ,y.[text]
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
    AND y.[text] LIKE '%TestCodeBranching%'
GO
Notes:
GO 100 executes the batch 100 times to ensure that the plan stays in the plan cache for some time. This is generally not needed in full, server-level environments.
The second query selects the procedure's cached plan (this is a set of estimated plans – no run-time values 🙂 ). In this example, the complete procedure plan has two main query plans branched from a conditional (COND) T-SQL operator.
Figure 2, Proc plan contains query plans for all code branches
The cached procedure plan shows that the query plan for the second branch is also optimized using the same parameter value (@ProductId = NULL).
Note: The estimated number of rows is 1. Because the initial NULL value is not represented by a RANGE_HI_KEY, SQL Server uses the AVG_RANGE_ROWS value (from the statistics histogram) instead. If we used e.g. ProductId = 2008 instead of NULL, the estimated number of rows would be 100 – use DBCC SHOW_STATISTICS('dbo.TestBranchPlans', 'NCI_ProductId') to observe this behavior.
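For reference, the histogram mentioned above can be inspected with the command below (the statistics carry the name of the NCI_ProductId index created in the test script; the output layout may differ slightly between versions):

-- Inspect the statistics histogram on the ProductId index.
-- A value that is not a RANGE_HI_KEY step (e.g. NULL or a non-existing ProductId)
-- is estimated using the AVG_RANGE_ROWS of the step it falls into.
DBCC SHOW_STATISTICS('dbo.TestBranchPlans', 'NCI_ProductId') WITH HISTOGRAM;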
The stored procedure call did not return any rows, and the execution plan was built for @ProductId = NULL. All subsequent calls may get sub-optimal plans, as presented below.
DBCC FREEPROCCACHE
GO
EXEC dbo.TestCodeBranching          -- 0 rows
GO
EXEC dbo.TestCodeBranching          -- 0.1%, 100 rows
    @ProductId = 200;
GO
EXEC dbo.TestCodeBranching          -- 0.5%, 500 rows
    @ProductId = 5500;
GO
EXEC dbo.TestCodeBranching          -- 10%, 10000 rows
    @ProductId = 20000;
GO
EXEC dbo.TestCodeBranching          -- 20%, 20000 rows
    @ProductId = 60000;
GO
Figure 2, Plan stability problem (Parameter sniffing problem)
*Accuracy [%] = (No. of actual rows / No. of estimated rows) * 100
(The estimated vs. actual row count comparison is available in SSMS 18.x+.)
Accuracy = 100% – ideal case; the estimated number of rows was spot on.
Accuracy < 100% – overestimate; the estimated number of rows is higher than the actual number of rows.
Accuracy > 100% – underestimate; the estimated number of rows is lower than the actual number of rows.
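As a quick worked example based on the numbers above: if the cached plan estimates 1 row (compiled for @ProductId = NULL) and the call with @ProductId = 60000 returns 20,000 rows, the accuracy is (20,000 / 1) * 100 = 2,000,000% – a severe underestimate. Conversely, an estimate of 20,000 rows against 100 actual rows gives (100 / 20,000) * 100 = 0.5% – a large overestimate.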
Figure 2 shows the negative effect of the cached, sub-optimal procedure plan on the subsequent procedure calls.
Unstable procedure plan
The previous experiment showed how SQL Server builds query plans for all code paths without knowing which one will be executed. The Query Optimizer uses the value passed into the @ProductId parameter to create query plans for all queries in the batch that reference it. In the test, we called the stored procedure without passing @ProductId. The absence of the value caused the code to use the parameter's default value, @ProductId = NULL. As a consequence, the QO used an atypical value to build and cache a sub-optimal query plan, which then had a negative impact on all subsequent procedure calls.
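One way to confirm which value a plan was compiled for is to read the ParameterCompiledValue attribute from the cached plan XML. The sketch below is an assumption-based illustration (it relies on the standard showplan XML schema and the ParameterList element; adjust the procedure name as needed):

-- Sketch: list the "sniffed" (compiled) parameter values stored in the cached plan XML.
;WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT  ProcName      = OBJECT_NAME(qp.objectid, qp.dbid)
       ,ParameterName = n.value('@Column', 'nvarchar(128)')
       ,CompiledValue = n.value('@ParameterCompiledValue', 'nvarchar(100)')
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(c.plan_handle) qp
    CROSS APPLY qp.query_plan.nodes('//ParameterList/ColumnReference') AS t(n)
WHERE c.objtype = 'Proc'
    AND OBJECT_NAME(qp.objectid, qp.dbid) = 'TestCodeBranching';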
But the effect could be similar even if we passed an actual value to @ProductId.
Figure 1 shows that the values in the ProductId column are not evenly distributed (which is usually true for production systems 🙂 ). However, most of the ProductIds, 76% (50 out of 66 different ProductIds), return the same number of rows (100 rows). Another 15% (10 out of 66) return 500 rows each, and only a few, 6 out of 66 (3 ProductIds each), return 10,000 and 20,000 rows.
Let’s say that our procedure call pattern assumes a similar probability of passing “small”, more selective* (returning only 100 rows) and “big”, less selective (returning 20,000 rows) ProductId values.
*Selectivity represents the uniqueness of values in a column. High selectivity = high uniqueness = a low number of matching values. Selectivity = (rows that pass the predicate) / (total rows in the table). Selectivity[ProductId = 200] = 100 / 100,000 = 0.001 (high selectivity); Selectivity[ProductId = 40000] = 20,000 / 100,000 = 0.2 (low selectivity).
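If it helps to verify these numbers against the test table, here is a small sketch (the local variable is only illustrative):

-- Sketch: compute the selectivity of an equality predicate on ProductId.
DECLARE @ProductId INT = 200;

SELECT Selectivity = CAST(SUM(IIF(ProductId = @ProductId, 1, 0)) AS DECIMAL(10,4))
                     / COUNT(*)
FROM dbo.TestBranchPlans;
-- @ProductId = 200   -> 0.001 (high selectivity)
-- @ProductId = 40000 -> 0.2   (low selectivity)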
A cached procedure plan compiled for a “small” ProductId has a negative impact on the procedure executions with a “big” ProductId and vice versa.
DBCC FREEPROCCACHE
GO
SET STATISTICS XML ON
    EXEC dbo.TestCodeBranching
        @ProductId = 200
SET STATISTICS XML OFF
GO
--check the plan
SELECT  DatabaseName = DB_NAME(x.dbid)
       ,ObjectName = OBJECT_NAME(x.objectid)
       ,c.cacheobjtype
       ,c.objtype
       ,NoOfExecutions = c.usecounts
       ,x.query_plan
       ,y.[text]
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
    AND y.[text] LIKE '%TestCodeBranching%'
GO
SET STATISTICS XML ON
    EXEC dbo.TestCodeBranching
        @ProductId = 40000
SET STATISTICS XML OFF
Figure 3, Cached plan for a small ProductId
We would get similar results if we passed a “big” ProductId first, and then made a procedure call passing a “small” value to the parameter, only this time the cached procedure plan would work in favor of the “big” parameter.
This situation is known as the “parameter sniffing” problem. The result is an unstable procedure plan.
One of several ways to resolve the problem is to instruct the query processor to recompile the statement in question on every procedure execution.
OPTION(RECOMPILE)
OPTION(RECOMPILE) is a statement-level hint that instructs the query processor to pause batch execution, discard any cached query plan for the statement, build a new plan using the run-time values (parameters, local variables, …), and perform “constant folding” with the passed-in parameter values.
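As a small, ad-hoc illustration of the “constant folding” (parameter embedding) effect – a sketch only, using a local variable rather than a procedure parameter – with OPTION(RECOMPILE) the optimizer can treat @ProductId as a known literal for this execution and simplify the predicate accordingly:

-- Sketch: with OPTION(RECOMPILE), the run-time value of @ProductId is embedded
-- into the plan as a literal, so the optimizer can simplify (fold) the OR predicate.
DECLARE @ProductId INT = 200;

SELECT OrderDetailId, ProductId, filler
FROM dbo.TestBranchPlans
WHERE (@ProductId IS NULL OR ProductId = @ProductId)
OPTION(RECOMPILE);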
Experiment 2
Conditional branching and OPTION(RECOMPILE)
In this experiment, I’ll use OPTION(RECOMPILE) to stabilize the procedure plan. Let’s repeat the last test, but this time we instruct the query processor to recompile the statement in question.
...
ELSE
    SELECT OrderDetailId
          ,ProductId
          ,filler
    FROM dbo.TestBranchPlans
    WHERE ProductId = @ProductId
    OPTION(RECOMPILE)
...
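For completeness, here is a sketch of the full modified procedure (the original definition from above, with the hint added to the second branch):

CREATE OR ALTER PROCEDURE dbo.TestCodeBranching
     @ProductId INT = NULL
AS
BEGIN
    SET XACT_ABORT ON;
    SET NOCOUNT ON;

    IF NOT EXISTS (SELECT ProductId
                   FROM dbo.TestBranchPlans
                   WHERE ProductId = @ProductId)
    BEGIN
        RETURN;
    END
    ELSE
        SELECT OrderDetailId
              ,ProductId
              ,filler
        FROM dbo.TestBranchPlans
        WHERE ProductId = @ProductId
        OPTION(RECOMPILE); -- recompile this statement on every execution

    RETURN;
END
GO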
Figure 4, Stable procedure plan with Option(Recompile)
Note: Using OPTION(RECOMPILE) to stabilize the plan comes with a certain cost. It adds a small compile-time overhead to every execution and can have some impact on CPU. If possible, it is a good idea to rewrite/decouple the stored procedure in order to prevent high variations in the procedure plans.
Personally, I’ve seen great results with OPTION(RECOMPILE) in frequently executed stored procedures that suffer from the plan stability problem.
Where is my procedure plan?
In the next test, we’ll run our stored procedure (the version with OPTION(RECOMPILE)) with a parameter value that does not match any existing ProductId value. This call executes the first code branch and exits the batch.
DBCC FREEPROCCACHE
GO
EXEC dbo.TestCodeBranching
    @ProductId = 721
GO
--check the plan
SELECT  DatabaseName = DB_NAME(x.dbid)
       ,ObjectName = OBJECT_NAME(x.objectid)
       ,c.cacheobjtype
       ,c.objtype
       ,NoOfExecutions = c.usecounts
       ,x.query_plan
       ,y.[text]
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
    AND y.[text] LIKE '%TestCodeBranching%'
GO
Figure 5, Incomplete procedure plan
So, now we need to answer the question: “Why are we not getting our cached procedure plan (query_plan is NULL)?” 🙂
Deferred Query compilation
When a client application executes a stored procedure for the first time, the query processor may decide not to create query plans for every individual query in the batch. This behavior is called deferred query compilation. Some of the possible reasons are related to conditional branching.
If the first call does not execute a code branch that contains at least one statement with OPTION(RECOMPILE) (a code branch may contain more than one statement), the query plans for that branch will not be generated and cached. This makes the procedure plan, as a set of individual query plans, incomplete.
The dynamic management function sys.dm_exec_query_plan(plan_handle) returns the individual query plans for the entire batch. If one or more query plans have not been generated, the function returns NULL. This makes sense, since we do not have a complete procedure plan.
The following test demonstrates this behavior.
1. Create a new test stored proc dbo.TestCodeBranching1
CREATE OR ALTER PROCEDURE dbo.TestCodeBranching1
     @ProductId INT = NULL
AS
BEGIN
    SET XACT_ABORT ON;
    SET NOCOUNT ON;

    IF NOT EXISTS (SELECT ProductId
                   FROM dbo.TestBranchPlans
                   WHERE ProductId = @ProductId)
        RETURN;
    ELSE
    BEGIN
        SELECT OrderDetailId
              ,ProductId
              ,filler
        FROM dbo.TestBranchPlans
        WHERE ProductId = @ProductId;

        SELECT 'hello there'
        OPTION(RECOMPILE);

        /* Test the same scenario using a temp table
           (replace the two queries above with the commented code)

        CREATE TABLE #TEMP(OrderDetailId INT
                          ,ProductId INT
                          ,filler CHAR(10))
        INSERT INTO #TEMP
            SELECT OrderDetailId
                  ,ProductId
                  ,filler
            FROM dbo.TestBranchPlans
            WHERE ProductId = @ProductId
        */

        SELECT TOP(100) column_id, [name]
        FROM sys.columns
        ORDER BY column_id DESC;
    END

    RETURN;
END
GO
2. Run the script below.
DBCC FREEPROCCACHE
GO
EXEC dbo.TestCodeBranching1
    @ProductId = 40001
GO
--check the plan
SELECT  DatabaseName = DB_NAME(x.dbid)
       ,ObjectName = OBJECT_NAME(x.objectid)
       ,c.cacheobjtype
       ,c.objtype
       ,NoOfExecutions = c.usecounts
       ,x.query_plan
       ,y.[text]
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_query_plan(plan_handle) x -- xml plan for the entire cache
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) y
WHERE c.objtype = 'Proc'
    AND y.[text] LIKE '%TestCodeBranching1%'
GO

--EXEC sp_describe_first_result_set
SELECT  DatabaseName = DB_NAME(s2.[dbid])
       ,ProcName = OBJECT_NAME(s2.objectid, s2.[dbid])
       ,sql_statement = SUBSTRING( s2.[text]
                                  -- offsets are in bytes; s2.[text] is nvarchar(max), i.e. 2 bytes per character, and the batch starts at byte 0
                                  ,(s1.statement_start_offset / 2) + 1
                                  ,((CASE -- check for the end of the batch
                                         WHEN s1.statement_end_offset = -1
                                             THEN DATALENGTH(s2.[text]) -- -1 represents the end of the batch (the last query)
                                         ELSE s1.statement_end_offset
                                     END - s1.statement_start_offset) / 2) + 1
                                 )
       ,query_plan = CAST(s3.query_plan AS XML)
FROM sys.dm_exec_query_stats AS s1
    CROSS APPLY sys.dm_exec_sql_text(s1.[sql_handle]) AS s2
    CROSS APPLY sys.dm_exec_text_query_plan(s1.plan_handle, s1.statement_start_offset, s1.statement_end_offset) s3 -- get the individual query plans
WHERE OBJECT_NAME(s2.objectid, s2.[dbid]) = 'TestCodeBranching1'
ORDER BY s1.[sql_handle]
        ,s1.statement_start_offset
        ,s1.statement_end_offset
Results
Figure 6, Individual query plans
The sequence of events is as follows:
- The first branch was executed during the very first stored procedure call.
- The query processor found an OPTION(RECOMPILE) statement within the second code branch and decided not to create execution plans for any of the queries in that code path.
- The dynamic management function sys.dm_exec_query_plan(plan_handle) did not return the cached procedure plan because the plan is incomplete.
- The dynamic management function sys.dm_exec_text_query_plan(plan_handle, statement_start_offset, statement_end_offset) returned all available individual query plans within the batch. In our case, there is only one cached query plan, and it belongs to the code path that was executed on the first call.
How does the sys.dm_exec_text_query_plan query work?
The query collects data from the following objects:
– sys.dm_exec_query_stats – returns various statistical information about cached query plans. We use the sql_handle column (uniquely identifies the batch or stored procedure a query is part of), plan_handle (uniquely identifies the query execution plan for the queries that belong to the batch), and statement_start_offset / statement_end_offset (define, in bytes, the starting and ending position of a query within the batch).
– sys.dm_exec_sql_text – returns the text of the batch identified by sql_handle. It also provides information about the procedure name, database, etc.
– sys.dm_exec_text_query_plan – returns the execution plan for a batch of statements or for specific statement(s) in the batch. statement_start_offset (0 = beginning of the batch) and statement_end_offset (-1 = end of the batch) define the start and end position of the query within the batch identified by plan_handle.
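As a minimal illustration of the 0 / -1 convention (a sketch, assuming the procedure plan is still in the cache), passing 0 and -1 as offsets asks sys.dm_exec_text_query_plan for the plan text of the whole batch rather than a single statement:

-- Sketch: 0 / -1 offsets request the plan for the entire batch identified by plan_handle.
SELECT  batch_text = t.[text]
       ,batch_plan = TRY_CAST(tp.query_plan AS XML) -- NULL if the batch plan cannot be returned as XML
FROM sys.dm_exec_cached_plans c
    CROSS APPLY sys.dm_exec_sql_text(c.plan_handle) t
    CROSS APPLY sys.dm_exec_text_query_plan(c.plan_handle, 0, -1) tp
WHERE c.objtype = 'Proc'
    AND t.[text] LIKE '%TestCodeBranching1%';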
Conclusion
Conditional branching, as an imperative construct in T-SQL, gets specific treatment from SQL Server’s query processor. A procedure batch with conditional branching may be optimized and cached for all code paths, regardless of which branch is executed on the first procedure call. This may produce sub-optimal plans for a number of queries within the non-executed code branches. It is important to be aware of this behavior in order to prevent potentially sub-optimal query executions.
SQL Server may choose not to compile statements within the non-executed code paths (deferred query compilation) if at least one of the queries within those code paths has the OPTION(RECOMPILE) hint – this is also true for temp tables. This makes the procedure plan (as a set of query plans) incomplete, and hence the sys.dm_exec_query_plan function returns NULL for the plan handle. However, queries from the executed code branch will be cached, and their query plans will be available through sys.dm_exec_text_query_plan.
Thanks for reading.
Dean Mincic