Statistics used in the cached execution plans – Stored Procedures
Summary
The query optimization process sometimes requires an understanding of how the SQL Server’s Query engine compiles, re-compiles, and executes SQL batches. Some of the most important elements used by the Query optimizer when constructing a good plan are the “Interesting statistics”. These are statistical information used by the Query optimizer when constructing a good enough query execution plan.
This blog attempts to explain what are the “interesting statistics”, when they are updated and how the statistical information relates to the query recompilation process. The topic is related to Temporary tables statistics when used in stored procedures.
Batch compilation and recompilation
To begin with, let’s analyze the batch compilation/recompilation diagram (By Arun Marathe, Jul 2004, Batch Compilation, Recompilation and Plan Caching Issues in SQL Server 2005). The idea is to create a set of experiments that will capture the behavior of a stored procedure through the different phases of the query compilation/recompilation process, particularly those related to the statistics that are used to generate the execution plan.
Figure 1, Batch Compilation/Recompilation diagram
I’ve used the AdventureWorks database to set up the test environment and MS Profiler to capture various Events relevant to the experiments.
MS Profiler events
-
- Attention (Errors and Warnings)
- Auto Stats (Performance)
- SP:CacheHit (Stored Procedures)
- SP:CacheInsert (Stored Procedures)
- SP:CacheMiss (Stored Procedures)
- SP:CacheRemove (Stored Procedures)
- SP:Completed (Stored Procedures)
- SP:Recompile (Stored Procedures)
- SP:Starting (Stored Procedures)
- RPC: Starting (Stored Procedures)*
- RPC:Completed (Stored Procedures)*
- SP:StmtStarting (Stored Procedures)
- SQL:StmtRecompile (TSQL)
- SQL:StmtStarting (TSQL)
Database Objects
Set AdventureWorks DB compatibility level to 140 – SQL Server 2017. The version provides easy access to the information about the interesting statistics saved with the query plan (SSMS – SELECT Plan Operator, Properties,OptimizerStatsUsage).
Below is the set of SQL Server object definitions used for the testing.
USE master GO ALTER DATABASE AdventureWorks SET COMPATIBILITY_LEVEL = 140; GO USE AdventureWorks GO /* dbo.SalesOrderDetail table */ DROP TABLE IF EXISTS dbo.SalesOrderDetail GO SELECT * INTO dbo.SalesOrderDetail FROM Sales.SalesOrderDetail; GO --add primary, clustered key ALTER TABLE dbo.SalesOrderDetail ADD CONSTRAINT PK_SalesOrderDetailID PRIMARY KEY CLUSTERED (SalesOrderDetailID); GO --NCI on ProductID CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID ON dbo.SalesOrderDetail (ProductID); GO /* dbo.Products table */ DROP TABLE IF EXISTS dbo.Products GO SELECT * INTO dbo.Products FROM Production.Product GO --add primary, clustered key ALTER TABLE dbo.Products ADD CONSTRAINT PK_ProductID PRIMARY KEY CLUSTERED(ProductID) GO --NCI on ListPrice CREATE NONCLUSTERED INDEX NCI_Products_ListPrice ON dbo.Products (ListPrice) INCLUDE([Name],ProductNumber) GO /* dbo.TestQueryExectuion stored proc*/ DROP PROCEDURE IF EXISTS dbo.TestQueryExecution; GO CREATE PROCEDURE dbo.TestQueryExecution @ProductID INT ,@ProdName NVARCHAR(50) ,@MinLinePrice MONEY = $100 AS BEGIN SET XACT_ABORT ON; SET NOCOUNT ON --query1 SELECT d.CarrierTrackingNumber, d.OrderQty, d.UnitPrice, d.ProductID, p.[Name] FROM dbo.SalesOrderDetail d INNER JOIN dbo.Products p ON d.ProductID = p.ProductID WHERE d.ProductID = @ProductID --query2 SELECT [Name] ,ProductNumber ,ListPrice ,Color FROM dbo.Products WHERE ListPrice >= @MinLinePrice AND [Name] LIKE (@ProdName +'%') RETURN; END
Information about the statistics/indexes on the tables can be retrieved using the queries below.
USE AdventureWorks go --sys.stats SELECT TableName = OBJECT_NAME(sp.object_id) ,[Statistic ID] = sp.stats_id -- If statistics correspond to an index, the stats_id value is the same as the index_id value in the sys.indexes. ,[Statistics] = s.[name] ,[Last Updated] = sp.last_updated ,sp.[rows] ,sp.rows_sampled ,sp.unfiltered_rows ,Modifications = sp.modification_counter --- ,s.object_id FROM sys.stats AS s OUTER APPLY sys.dm_db_stats_properties (s.object_id,s.stats_id) AS sp WHERE s.object_id IN (OBJECT_ID(N'dbo.SalesOrderDetail'), (OBJECT_ID(N'dbo.Products'))); --sys.sysindexes SELECT TableName = OBJECT_NAME(i.id) ,IndexName = i.[name] ,i.indid --If statistics correspond to an index, the stats_id value in the sys.stats is the same as the index_id ,IndexDesc = CASE WHEN i.indid = 0 THEN 'HEAP' WHEN i.indid = 1 THEN 'CLUSTERED' WHEN i.indid > 1 THEN 'NONCLUSTERED' END ,i.rowcnt ,i.dpages ,i.rowmodctr FROM sys.sysindexes i WHERE i.id IN (OBJECT_ID(N'dbo.SalesOrderDetail'), (OBJECT_ID(N'dbo.Products')));
The following examples assume the default settings for the Sql Server’s options related to the statistics:
– AUTO_CREATE_STATISTICS ON
– AUTO_UPDATE_STATISTICS ON
– AUTO_UPDATE_STATISTICS_ASYNC OFF
A bit of theory first before proceeding with the tests. : )
colmodctr
colmodctr is an ever-increasing counter that tracks the changes made on tables (a counter per column excluding the non-persistent computed columns). colmodctr is not transactionally consistent which means that is not affected by the rolled-back changes i.e if a transaction inserts 10 rows in a table and then roll-back, the counter will still report 10 changes.
SQL Server Statistics (automatically/manually created/updated) on a column(s) will store the snapshot value of the colmodctr for the leftmost column in the stats blob.
The counter is very important since it’s one of the elements that drive the query recompilation decisions related to the statistics changed reasons.
colmodctr counter can be accessed through the following system views.
Figure 2, colmodctr, system views – standard and hidden
One way to access the hidden tables is to; Open a separate SSMS instance, close the object explorer, and create a single connection using the Server name: i.e ADMIN:(local)
NOTE: The structure of the hidden tables and the tables’ accessibility are not documented and may be changed in future versions.
Recompile thresholds (RT)
RT concept defines the number of changes on a table column needed to be done in order to indicate the statistical information of that column as stale.
The changes include the column values changes through the DML operations such as INSERT, UPDATE, DELETE… i.e Inserting 10 new rows in a table is considered as 10 changes(identified by the colmodctr counters mentioned before).
If the table does not have statistical information i. e HEAP table with no indexes and no manually created statistics, and the query plans that reference the table does not load/automatically create interesting statistics, the only relevant change when performing the RT crossing test will be the change in the number of rows inserted and/or deleted.
| colmodctr(current) – colmodctr(snapshot) | >= RT
or
| cardinality(current) – cardinality(snapshot) | >= RT
current – refers to the current value of the modification counter
snapshot – refers to the value of the mod. counter captured during the last plan compilation(re-compilation).
cardinality* – the number of rows in the table.
Cardinality
In mathematics, the cardinality of a set is defined as the number of elements in a SET. A SET is an unordered collection of elements in which each element is unique.
In RDBMS – see RDBMS fundamentals), data is presented in a form of a Table. An RDBMS table has its roots in a structure called Relation (attributes ~ columns, tuple ~ row). The number of tuples is represented by the cardinality of the relation – a Relation does not have duplicate tuples.
However, the table structure deviates from the strict rules and allows, e.g., duplicate rows. This means, that we use cardinality to represent the number of rows in a table, only if the table has no duplicates.
Sometimes in the literature, we find that the cardinality of a table represents the number of unique rows out of the total number of rows. So, cardinality in this context represents uniqueness.
Cardinality may represent the uniqueness of data values in a particular column – the lower the cardinality the selectivity of a value decreases, and the more duplicated values in the column. It is a measure that defines the density of a column – column density is a reciprocal value of the column’s cardinality. So the more selective values the less density. Then there is a metric called Density Vector that consists of the densities of the individual columns… super interesting stuff, but not for this post 🙂
In a different context, Cardinality is a way to define the relationship between two entities in a data model. It is also known as the degree of relationship i 1-1, 1-m, m-n.
The Threshold Crossing Test evaluates to TRUE if the number of changes is greater than the predefined RT value (see Figure 3)
Recompilation thresholds(RT) for all the tables referenced in the query are stored along with the query plan.
RT depends on the table type(permanent vs temporary) and the number of rows in the table.
Figure 3, Recompile thresholds
Special case. RT = 1 if the table has 0 rows (with or without statistics)
NOTE: Starting from SQL Server 2008 R2 SP1, Microsoft introduced TF2371. The trace flag activates the dynamic recompile threshold calculation. The higher number of rows in a table, the lower the RT. The functionality is implemented to allow automatic statistics updates to kick off more frequently for the big tables. i.e RT for a 10,000-row table is 500 + 0.20*10,000 = 2,500 – the number of changes required to trigger query recompile. For a table with 100M rows, the RT is 20,000,500. For some applications the RT may be too high, resulting in sub-optimal plans due to the lack of query recompilation. Hence the TF2371.
Starting from SQL Server 2016, the TF2371 is turned on by default.
Here are a couple of examples to illustrate Figure 3.
If there is a table A that contains 230 rows, RT for the table will be set to 500. This means that if we i.e insert 500 rows, the total number of rows (c) will change to 730 (c>=230+500) which is enough changes to make the table’s statistics stale.
The change itself does not mean much if there are no queries that reference the table : )
The query plans may or may not initiate the auto-statistics creation on the specific table columns. Also, the referenced tables may not have any statistical information i.e HEAP table with no non-clustered indexes.
Experiments
Experiment 1 (stats change before query execution)
In this experiment, we will make “enough” changes to the ListPrice column (dbo.Products table) BEFORE running the stored procedure for the first time,
The column is a key column in NCI_Products_ListPrice, the non-clustered index, and has statistical information attached to it (the stats object name is the same as the NCI)
Let’s begin the experiment by creating the test objects and checking the statistical information on the tables.
Step 1, Check the initial stats/rowmodctr information
Figure 4, Initial rowmodctr information
Step 2, Check stats BLOB and make changes on dbo.Products table
Run the DBCC command below before and after the UPDATE to confirm that there were no changes in the stats BLOB information.
DBCC SHOW_STATISTICS ('dbo.Products',NCI_Products_ListPrice) WITH STAT_HEADER
BEGIN TRANSACTION UPDATE dbo.Products --504 changes SET ListPrice +=ListPrice *0.10 --add 10% to the price UPDATE TOP(106) dbo.Products --106 changes SET ListPrice +=$10.00 -- add $10 dollars to the prices ROLLBACK TRANSACTION
NOTE: rowmodctr is not transactionally consistent.
Figure 5, stats BLOB information
Figure 6, rowmodctr after the initial dbo.Products update
The changes are detected and available through SQL Server’s metadata.
Step 3, Run the stored procedure and observe the captured events by the Profiler.
EXEC dbo.TestQueryExecution @ProductID =897 ,@ProdName = N'G' ,@MinLinePrice = $0
Following the batch compilation diagram, we can identify the following steps.
- The Cache Lookup step resulted in the SP:CasheMiss event. dbo.TestQueryExecution stored proc. does not exist in the cache.
- Query Compilation Begins. SQL Server engine is about to load all of the “interesting statistics”. The loaded statistics can be retrieved from the Actual Execution Plan, the SELECT physical operator – OptimiserStatsUsage property.
- The Query engine checks if any of the loaded interesting statistics are stale. If yes, the system stops the batch compilation process and refreshes the statistics. In our case the system has
- Identified the number of changes made on the ListPrice column. From the stats/index information gathered after the initial update, the number of changes (rowmodctr/Modifications) is 610.
- Performed RT crossing test. The test passed since the number of changes(610) exceeded the RT for tables with a number of rows greater than 500. RT = 500 + 0.20 * 504 ~ 601, 601 < 610
- Executed StatMan, an internal process that automatically maintains statistics. The process updated the stale statistics NCI_Products_ListPrice on dbo.Product table
SELECT StatMan([SC0], [SC1]) --from MS Profiler FROM (SELECT TOP 100 PERCENT [ListPrice] AS [SC0] ,[ProductID] AS [SC1] FROM [dbo].[Products] WITH (READUNCOMMITTED) ORDER BY [SC0],[SC1]) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 16);
If we check the stats blob from Step 2, we will see that the Updated column changed its value to the current date – the stats blob has been updated.
The AutoStats event reported the UPDATE of the statistics with EventSubClass = 1 – Other. More on the event can be found here.
- Query Optimiser starts to generate the query plan – a plan for each query statement.
- The second query in the batch has a predicate on the Name column of the dbo.Products table. In an attempt to make better cardinality estimates on the rows that need to be processed, Query Optimiser decided to automatically create statistical information on the column.
The system stops the batch compilation process and again executes the StatsMan process to create the new statistics.
SELECT StatMan([SC0]) FROM ( SELECT TOP 100 PERCENT [Name] AS [SC0] FROM [dbo].[Products] WITH (READUNCOMMITTED) ORDER BY [SC0]) AS _MS_UPDSTATS_TBL OPTION (MAXDOP 16);
After creating the stats, QO decided not to use it : (
As a result of the updated statistics on the ListPrice column , the rowmodctr for the column was reset.
Below is the list of the “interesting statistics” loaded during the Query compilation process. The list does not include automatically created stats on the Name column. - QO sets the new recompilation thresholds(RT) for all tables used in the queries.
- RT(dbo. SalesOrderDetail) = 500 + 0.20(121317) =24763.4 (~24764)
- RT(dbo.Products) = 500 + 0.20(504)= 600.8(~601)
This means that QO will initiate query recompile due to the “Statistics changed” reason if- dbo. SalesOrderDetail
- 24764 or more inserted/deleted rows
- 24764 or more changes on SalesOrderDetailID, ProductID columns
- dbo.Products
- 601 or more inserted rows
- 601 or more changes on ProductID, ListPrice, and Name columns
- dbo. SalesOrderDetail
- The second query in the batch has a predicate on the Name column of the dbo.Products table. In an attempt to make better cardinality estimates on the rows that need to be processed, Query Optimiser decided to automatically create statistical information on the column.
- The query execution starts. The query plans are constructed and cached. SP:CacheInsert event reported that the stored procedure has been cached.
Experiment 2 (stats change during the query execution)
In this experiment, we will make “enough” changes to the Name column (dbo.Products table) HALFWAY THROUGH the stored procedure execution.
Step 1 Set up the environment
- Run the script to reset the test environment
- Add a WAITFOR statement between the two queries in the stored procedure
... SELECT d.CarrierTrackingNumber,--query1 d.OrderQty, d.UnitPrice, d.ProductID, p.[Name] FROM dbo.SalesOrderDetail d INNER JOIN dbo.Products p ON d.ProductID = p.ProductID WHERE d.ProductID = @ProductID WAITFOR DELAY '00:00:06' -- buy some time to change statistisc SELECT [Name]--query2 ,ProductNumber ,ListPrice ,Color FROM dbo.Products WHERE ListPrice >= @MinLinePrice AND [Name] LIKE (@ProdName +'%') ...
- Use PowerShell to execute the stored procedure. Add HostName property. Use the HostName to capture only the events related to the PS call. This will prevent MS Profiler from capturing events related to the UPDATE statement that will run in parallel.
PS C:\Users\dean.mincic> Invoke-Sqlcmd -ServerInstance "." ` -Database "AdventureWorks" ` -HostName "experiment" ` -Query "EXEC dbo.TestQueryExecution ` @ProductID =897 ` ,@ProdName = N'G' ` ,@MinLinePrice = 0.00;"
- Add an ApplicationName filter to the Profiler trace (ApplicationName LIKE experiment)
Step 2, Run the PowerShell cmdlet, switch to SSMS, and run the UPDATE query below. The queries will generate enough changes to make the automatically created statistics on the Name column stale.
BEGIN TRANSACTION UPDATE dbo.Products --504 changes (all rows) SET [Name] +='_ABC' UPDATE TOP(106) dbo.Products --106 changes (random rows) SET [Name] +='_ABC' COMMIT TRANSACTION
Step 3. Analyze the captured MS Profiler trace.
Figure 8, Query recompile
- The first thing that is different from the last run is the SP:CacheHit event. The event shows that our stored procedure was found in the Plan cache. The previously set RTs and the interesting statistics are part of the cached information.
NOTE: Auto-created statistic on the Name column was not used during the initial query compilation – the stats are not part of the interesting stats. - This time there were no changes on the columns that would initiate statistics updates, no new auto-created stats and the existing cached query plan does not need to be recompiled due to “statistic changed” reasons. The process proceeds with the query execution.
- The first query is successfully executed followed by the WAITFOR statement. During the statement execution (6 seconds delay) a separate query has made enough changes on the Name column(dbo.Products) to pass the RT crossing test for the table and flag the auto-created statistics on the column as stale. Even if not used by QO during the plan generation, the stats are marked as stale.
- (1) The query execution stops at the “Any stats stale?” step. The System initiates the query recompile process – SP: Recompile due to 2 – Statistics changed reason. The event is followed by the statement level SQL:StmtRecompile event which indicates that only the second query needs to be recompiled.
- (2) Again, the StatsMan process kicks in and updates the stale statistics. The RTs are set (in this case the number of rows has not changed, hence the RTs stayed the same). Rowmodctr value for the Name column is reset. to 0
- (3) The AutoStats event reported Statistics Update having EventSubClass = 1 – Other
- (4) The SP:StmtStarting event reports that the second query has been recompiled and the batch execution continues.
Experiment 3 (tables with no stats on columns)
The experiment demonstrates how queries get recompiled when referencing tables with no statistics. The recompiles due to the “statistics changed” reasons are initiated by the RT-table cardinality crossing test results only.
As previously mentioned, the cardinality-based RT crossing test is defined as
| cardinality(current) – cardinality(snapshot) | >= RT
Let’s create a test table and a stored procedure to perform the above experiment.
Step 1, set up the test environment
--a heap table - no statistics DROP TABLE IF EXISTS dbo.testRecomp; GO CREATE TABLE dbo.testRecomp(id INT ,filler CHAR(100) ); GO --test sp DROP PROCEDURE IF EXISTS dbo.testRecompile; GO CREATE PROCEDURE dbo.testRecompile AS BEGIN SET XACT_ABORT ON; SET NOCOUNT ON; SELECT TOP(100) Average = AVG(t.id) ,NoOfRows = COUNT(t1.id) FROM dbo.testRecomp t CROSS JOIN dbo.testRecomp t1 RETURN; END GO
.. and insert some data into the table…
--add 230 rows, RT will be set to 500 INSERT INTO dbo.testRecomp(id,filler) SELECT TOP(230) c.column_id,c.[name] FROM master.sys.columns c1 ,master.sys.columns c; GO
The initial statistical information looks like this (find how to retrieve the metadata related to the statistical information at the beginning of the post)
Figure 9, rowmodctr with no statistical information
Step 2 Run the stored proc for the first time. The RT is set to 500.
EXEC dbo.testRecompile;
Step 3 Make enough changes to the table to pass the cardinality crossing test. Insert 500 rows. Do not use explicit transaction yet.
--BEGIN TRANSACTION INSERT INTO dbo.testRecomp SELECT TOP(500)ABS(CHECKSUM(NEWID())) % 500, b.[name] + '_' + cast (ROW_NUMBER() OVER(ORDER BY (SELECT NULL )) AS varchar(5)) FROM sys.columns a, sys.columns b --COMMIT TRANSACTION; --ROLLBACK TRANSACTION;
Step 3 Run the stored procedure again and observe the query execution behavior in Profiler.
Figure 10, Query recompile, table cardinality change – no stats
- The new rowmodctr information looks like
The new number of rows (rowcnt) is recorded along with the number of changes, rowmodctr=730. In this case, the rowmodctr value is not relevant since the RT crossing test depends only on changes in the table cardinality. This will be more visible if we ROLLBACK the row insertion operation which will be covered later. - The second execution followed the “Cashe lookup = Success” path (see the batch compilation diagram) and failed to pass the very last step “Any stats stale?“.
- At this stage, the system has detected that the RT cardinality crossing test has passed due to the number of changes(new rows) inserted in the table.
- The system stopped the execution process and initiated the stored proc/statement recompile – SP:Recompile, SQL:StmtRecompile. As in the previous examples, the reason for the recompile was 2 – Statistics changed.
NOTE: The recompile process is not followed by the StatMan process since the query does not have any statsBlob information to be refreshed/created.
Experiment 3.1 (rowmodcnt not in use)
The next example shows that the RT cardinality crossing test is not related to rowmodctr as it may seem from the previous example where the number of changes followed table cardinality changes.
- Follow the steps from the previous example.
- Execute the INSERT query from Step 3 within an explicit transaction
BEGIN TRANSACTION INSERT INTO dbo.testRecomp SELECT TOP(500)ABS(CHECKSUM(NEWID())) % 500, b.[name] + '_' + cast (ROW_NUMBER() OVER(ORDER BY (SELECT NULL )) AS varchar(5)) FROM sys.columns a, sys.columns b ROLLBACK TRANSACTION;
- Observe that there are no query recompiles due to “statistic change“ since there were no table cardinality changes – the ROLLBACK “canceled” row insertions.
- The statistical information shows that the rowmodctr= 720.
Conclusion
Query compilation, execution, and recompilation sequence among other steps include; loading interesting statistics – the statistical information on different table columns that Query Optimiser may find useful when creating a good plan and auto-creating statistical information on the columns that participate in i.e WHERE filter, GROUP BY ..etc.
SQL Server query engine also checks the validity of the loaded statistical information during the initial stored procedure compilation and again during the stored procedure execution phase. If the loaded statistics are found to be stale, the former pauses stored procedure compilation, refreshes(re-samples/refreshes) the loaded statistical information, and continues the compilation process. If the Query engine detects stale loaded statistics during the execution phase, the process stops refreshes(re-samples/updates) statistics and restarts the compilation process – query recompilation. The re-compiles are done per query, not per batch.
The examples in this blog showed that the statistical information can be automatically maintained by the queries that use them. Statistics can be also maintained manually.
To mark statistics as “Stale”, QO uses the Recompile Threshold(RT) crossing test. The test tracks the number of changes on the significant(leftmost) columns within the statistic BLOBs. The information is stored in an ever-increasing, non-transactionally consistent counter – “rowmodctr”. The RTs are stored per table and within the compiled query.
The RT crossing test can be based only on the changes in the number of rows in a table.
Thanks for reading.
Dean Mincic