Client Requests & Sql events in Sql Server
Summary
Very often we, the sequel guys, use MS Profiler/Sql Server’s Extended events to intercept and analyse the traffic between client applications and our precious Sql Server. Sometimes we capture an hour/day of traffic, and sometimes we just want to capture a slow running stored procedure’s input param values. Also, we are involved in the data layer code reviews in order to keep Client data requests as optimal as possible. This post is a sort of an internal notes/documentation about different types of requests that can be sent from Clients to Sql Server and the corresponding events visible in MS Profiler. The article also gives a high overview on the space between ADO.NET and Sql Server
MDAC – Microsoft Data Access Components
To start off, lets first take a brief look at the MDAC framework and where ADO.NET sits in the stack. MDAC technology allows applications to communicate with almost any data-store.
MDAC architecture implements three logical layers:
- Programming interface layer (ADO, ADO.NET)
- DB access layer (ODBC, OLEDB, .NET Managed providers)
- Data store
All of the layers are accessible through the MDAC API. (+ MS Sql Server network library)*
NOTE: Microsoft Network Library(Net-lib) is also part of MDAC and sits between TDS(Tabular data stream) Formatter and the network protocol. The DLLs are specifically used by SQL Server to communicate with the Clients on the network level. It supports the network protocols: Named Pipes, TCP/IP, Shared Memory, VIA).
ADO.NET
ADO.NET provides a generic interface(set of classes that expose data access services) to a number of different data-stores (SQL Server,CSV, Excel..) from .NET applications. It is a set of libraries included in MS .NET Framework.
The main objectives of ADO.NET is to;
- connect to a data source
- submit queries
- process results
- provide a powerful disconnected data structure to perform data operations offline.
There are two main components of ADO.NET for accessing and manipulating data
-
- .NET Framework data providers – a set of class libraries that understand how to interact with specific data platform e.g Sql Server, or with a common data layer i.e OLE DB. Data providers transport data between proprietary data platforms and the generic ADO.NET data layer. ADO.NET framework includes three providers
- The MS Sql Server Provider – exposed through the System.Data.SqlClient namespace
- The OLE DB provider – exposed through the System.Data.OleDb namespace.
- The ODBC provider – exposed through the System.Data.Odbc namespace.
- DataSet – a class(System.Data namespace) that provides a disconnected representation of result sets from the Data Source. The object provides a consistent relational programming model regardless of the data source. It includes objects like: DataTable, DataRow, DataView, DataColumn, PrimaryKey ..etc
- .NET Framework data providers – a set of class libraries that understand how to interact with specific data platform e.g Sql Server, or with a common data layer i.e OLE DB. Data providers transport data between proprietary data platforms and the generic ADO.NET data layer. ADO.NET framework includes three providers
The key classes within each provider(each provider has its own specific class names for the same objects) include:
- Command (SqlCommand),
- Connection (SqlConnection),
- DataAdapter (SqlDataAdapter)*
- DataReader (SqlDataReader).
Figure 2, ADO.NET components, high overview
TDS (Tabular data stream)
TDS is an application layer protocol used to transfer tsql command between client and server. The protocol includes facilities for; authentication and identification, encrypted negotiation, issuing of Sql batch(language events) and stored procedures(RPC) calls, managing and distributing the outcome of atomic transactions. TDS describe the names, types and optional descriptions of the rows being returned. It was developed by Sysbase in 1984 and adopted by Microsoft in 1990. MS-TDS(Microsoft’s version of the protocol) specification can be found here.
Figure 3,Communication flow in TDS
More on how the Client talks to Sql Server
Client communicates with Sql Server in a layered fashion. Each layer exchanges information with its neighboring layer. Sometimes the layers can be bypassed. Only the Transport protocol layer enables physical communication between Client and Sql Server. For all other layers the exchange of information is done locally – in RAM memory and between different dlls.
The layers are:
- Application Layer (C# code, etc)
- High level data access API (ADO,ADO.NET…)
- Client data interface(OLEDB,ODBC, db-lib ..)
- Client TDS Formatter. The tsql commands sent to Sql server has to be in TDS format.
- Client net protocol (Shared memory, TCP/IP sockets, net-lib)
- Client transport protocol (TCP/IP ..)
- Server transport protocol (TCP/IP ..)
- Server net protocol (Shared memory, TCP/IP sockets, net-lib)
- Server data interface
- SQL Server
NOTES: App level(A) can bypass ADO.NET level(B) if we directly access teh Client data interface (direct API calls to OLE-DB, ODBC..etc).
Client protocol(E) and Client transport protocol(F) can be bypassed if the Client and Sql Server communicate through the Shared memory protocol. In this scenario Client and SQL Server run on the same machine. The components exchange information through the RAM memory space. The network elements are not required.
Client data requests
The only way clients can interact with Sql Server is by sending requests that contain TSQL commands for the database engine. The requests are sent over MS-TDS protocol(Microsoft’s version of TDS).
There are three main forms of the requests
- Batch requests
- RPC (Remote Procedure Requests)
- Bulk Load Request
In the following examples I’ll use ADO.NET components to present the different types of Client requests. I’ll use MS Profiler to capture and analyse Events which corresponds to the requests.
Events
When Sql Server receives a message from a client(through TDS), it can be thought of as an event occurring – see Figure 3, First client sends a connection request(a login event) and gets back success or failure response. Once connected, client requests typically fall into one of two categories:
- Language events
- RPC events
Language Events
A language event is a tsql batch sent from the client to the server. From ADO.NET perspective a tsql batch* is a set of tsql commands defined in the CommandText property of the SqlCommand Class. The command can be executed using one of the commands; ExecuteReader, ExecuteScalar, ExecuteNonQuery and ExectueXMLReader.
The commandType enum property of the SqlCommand Class specifies how the command string is interpreted. The Field values can be:
- CommandType.StoredProcedure (contains the name of a stored procedure or a scalar function)
- CommandType.Text (contains an tsql batch – one or more tsql statements, no GO directives)
- CommandType.TableDirect (contains a table name. Only supported by the .NET Framework Data Provider for OLD DB)
NOTE: A tsql batch can be defined as one or more tsql statements before the GO* directive(end of batch signal). This is specific to Microsoft SQL stack tools i.e SSMS, sqlcmd, osql. isql- Sybase It is not part of tSql language and may not be recognised by other similar tools. It is not permitted to use GO directive within CommandText property.
After a TDS request reaches SQL Server, the db engine will create a TASK to handle the request. One task handles only one batch(a task may spawn a number of sub-tasks to support parallel executions). When ADO.NET “executes” a batch(one or more tsql commands), one TASK will handle the request. When SSMS executes i.e two batches(the same tsql command twice – GO 2, or two tsql statements divided by the GO terminator), Sql Server engine(SQLOS) allocates two separate TASKS for the job.
Non-parameterised batch request
This type of request does not have parameters. It can contain local variable declarations. The request is treated by Sql Server engine as an ad-hoc query.
Adhoc queries can be cached and reused only if a subsequent batch matches exactly – exact textual match. i.e
SELECT * FROM dbo.Products WHERE [name]='bread'; --and SELECT * FROM dbo.Products WHERE [name]='butter';
The query plans above are compiled and cashed separately. This behavior can lead to the problem known as “the cache pollution”.
Sql server optimiser may decide to automatically parameterise an ad-hoc query, but it’s very conservative when making those decisions. The query template must be considered to be safe, meaning that the same query plan doesn’t change even if the actual parameter values change – the paramterisation must not degrade query performances. Also, query engine decides on the auto-parameter data type (smallint in the example below). The datatype may be changed due to change in the input parameter size(PurshaseOrderId = 3420 can fit in 2bytes-smallint, but PurshaseOrderId = 210 fits into 1byte-tinyint). In that case the optimiser cashes another copy of the plan that includes the new data type.
In addition, many query elements disallow automatic parameterisation, such as; JOIN, IN, INTO, DISTINCT, TOP.. just to name a few.
--this query may be autoparameterised since it uses the same plan regardless of the --predicate's value (PurchaseOrderID is a clustered primary key) USE AdventureWorks2012 GO SELECT PurchaseOrderID ,[Status] ,OrderDate ,SubTotal FROM Purchasing.PurchaseOrderHeader WHERE PurchaseOrderID = 3420; --843, --1 .. --the plan template looks like /* (@1smallint)SELECT[PurchaseOrderID],[Status],[OrderDate],[SubTotal] FROM [Purchasing].[PurchaseOrderHeader] WHERE [PurchaseOrderID]=@1 */ -- the query below uses a predicate with an uneven distribution of values. This means that the -- plan shape depends on the predicate value and therefore the auto parameterisation would degrade -- the query's performance USE AdventureWorks2012 GO SELECT ProductID ,PurchaseOrderID ,OrderQty FROM Purchasing.PurchaseOrderDetail WHERE ProductID = 707 --319 --391
Let’s execute the second query through a simple console application. The predicate value will be passed as a part of the batch. Pass 707 as a command line argument; (ProductId = 707)
NOTE: We can also use SSMS to run the batch.
using System; using System.Data; using System.Data.SqlClient; namespace TestClientRequest_batch { class Program { static void Main(string[] args) { int ProductId = 0; //get arguments if (args.Length != 0) { ProductId = Convert.ToInt16(args[0]); } using (SqlConnection conn = new SqlConnection()) { conn.ConnectionString = @"Data Source=tcp:(local); Initial Catalog=AdventureWorks2012; Integrated Security=True; Application Name=TestClientRequests"; using (SqlCommand cmd = new SqlCommand()) { cmd.Connection = conn; cmd.CommandType = CommandType.Text; cmd.CommandText = @"SELECT ProductID ,PurchaseOrderID ,OrderQty FROM Purchasing.PurchaseOrderDetail WHERE ProductID= " + ProductId.ToString(); conn.Open(); //use sql reader to execute the query using (SqlDataReader dr = cmd.ExecuteReader()) { Console.WriteLine("ProductID PurchaseOrderID OrderQty\t"); while (dr.Read()) { Console.WriteLine(dr[0].ToString() + "\t " + dr[1].ToString() + "\t\t\t" + dr[2].ToString()); } } } Console.ReadKey(); //prevent auto close } } } }
Figure 4, MS Profiler – tsql batch execution events
The Client request is executed as a tsql batch. SQL:BatchStarting signals that the batch execution is starting. SQL:StmtStarting event was fired for each statement in the batch(once in this example).
RPC Events
There are two basic ADO.NET client request types which invoke RPC(Remote Procedure Call) events
- Parameterised batch request – uses sys.sp_executesql Sql Server’s system extended stored proc
- Stored procedure and scalar user defined function call
NOTE: RPC(Remote Procedure Call) name came from Sybase and in context of the Client request represents a type of formatting of the query inside a TDS packet before it is sent to Sql Server. The requests are pre-formated by the driver/provider i.e OLEDB, ODBC,JDBC. ..etc
Windows RPC(Remote Procedure Call) is completely different technology which implements the interprocess communication(IPC) that allows applications to talk to each other locally or over the network. More on the RPC here.
Parameterised batch request
Parameterised type of request does have parameter values passed to tSQL statements. Parameter input is treated as a literal value, not as an executable code.
This type of request significantly improves the way Sql Server process the tsql commands. The sql batch separates the “body” of the query from the data values managed by that query.
Lets change SqlCommand object from the previous example to use a parameterised tsql statement and see what happens.
... using (SqlCommand cmd = new SqlCommand()) { cmd.Connection = conn; cmd.CommandType = CommandType.Text; cmd.CommandText = @"SELECT ProductID ,PurchaseOrderID ,OrderQty FROM Purchasing.PurchaseOrderDetail WHERE ProductID= @ProductId"; SqlParameter ProdId = new SqlParameter("@ProductId", SqlDbType.Int); cmd.Parameters.Add(ProdId).Value = ProductId; conn.Open(); ...
Figure 5, MS Profiler – parameterised batch RPC event
This time ADO.NET constructs the request as a RPC(Remote Procedure Call) type. The client sends an RPC message data stream(TDS protocol level) to Sql server. The message contains: Procedure name( or in this case the ProcID) and the Parameters. ProcID identifies Sql Server’s internal sp to be executed. In the example above ProcID = 10. The id value maps sys.sp_exeuctesql extended stored procedure in Sql Server.
Figure 6, sys.sp_executesql extended stored procedure
NOTE: We cannot use SSMS to invoke RPC event types since it’s not possible to send Parameters as a part of the TDS binary stream.
The parameterised batch request results with a cached plan, just like a regular stored procedure call.
SELECT cp.usecounts ,cp.cacheobjtype ,cp.objtype ,txt.[text] ,qp.query_plan FROM sys.dm_exec_cached_plans cp CROSS APPLY sys.dm_exec_sql_text(plan_handle) txt CROSS APPLY sys.dm_exec_query_plan(plan_handle) qp WHERE cacheobjtype = 'Compiled Plan' AND [text] NOT LIKE '%dm_exec_cached_plans%' AND [text] LIKE '%FROM Purchasing.PurchaseOrderDetail%'
Figure 7, parameterised batch request – cached plan
Stored procedure call
This type of RPC request is similar to the parameterised batch request. The main difference is that the commandType enum property of the SqlCommand Class is set to “StoredProcedure“, and the CommandText property value represents the name* of the stored procedure.
Quick note: In order to avoid cache misses/compile locks, use the two part naming convention when referencing sql objects – dbo.uspGetPurchaseOrderDetails. More about sp caching can be found here.
Let’s encapsulate previous tsql batch in a stored proc…
USE AdventureWorks2012; GO CREATE PROCEDURE dbo.uspGetPurchaseOrderDetails @ProductId int AS BEGIN SET NOCOUNT ON; SELECT ProductID ,PurchaseOrderID ,OrderQty FROM Purchasing.PurchaseOrderDetail WHERE ProductID= @ProductId; RETURN; END
and then modify CommandType and CommandText properties as follows…
..... using (SqlCommand cmd = new SqlCommand()) { cmd.Connection = conn; cmd.CommandType = CommandType.StoredProcedure; cmd.CommandText = @"dbo.uspGetPurchaseOrderDetails"; SqlParameter ProdId = new SqlParameter("@ProductId", SqlDbType.Int); cmd.Parameters.Add(ProdId).Value = ProductId; conn.Open(); ...
Figure 8, MS Profiler – RPC Stored proc call
Using RPC: Starting and/or SP:Starting events we can find out the value of the stored procedure’s input parameter(s). It is interesting to notice the absence of the stored procedure’s ObjectId within the RPC call. The initial request doesn’t know whatID (object_id) is used by Sql Server to identify the stored procedure.
SELECT o.[name] ,o.[object_id] ,spdef.[definition] FROM sys.objects o INNER JOIN sys.all_sql_modules spdef ON spdef.[object_id] = o.[object_id] WHERE o.[name] = N'uspGetPurchaseOrderDetails' /* --result name ObjectId Definition ----------------------------------------------------- uspGetPurchaseOrderDetails 1364199910 CREATE PROC.. */
Not directly related to the topic but nevertheless interesting observation is the SP:CacheMiss event. The event describes the failed cache lookup for the stored procedure on the very first run as the plan was not generated and stored in the cache.
Now, let’s execute a stored procedure that has a nested sp.
--------------------- top level stored proc -------------- USE AdventureWorks2012 GO ALTER PROCEDURE dbo.uspGetPurchaseOrderDetails @ProductId int ,@ProdName dbo.[Name] OUTPUT --NVARCHAR(50) AS BEGIN SET NOCOUNT ON; SELECT ProductID ,PurchaseOrderID ,OrderQty FROM Purchasing.PurchaseOrderDetail WHERE ProductID= @ProductId --get the product name EXEC dbo.uspGetProductName @ProductId ,@ProductName = @ProdName OUTPUT RETURN; END go --------------------- nested stored proc -------------- CREATE PROCEDURE dbo.uspGetProductName @ProductId INT ,@ProductName dbo.[Name] OUTPUT --NVARCHAR(50) AS BEGIN SET NOCOUNT ON; IF @ProductId IS NULL RETURN; SELECT @ProductName = [Name] FROM Production.Product WHERE ProductID = @ProductId; RETURN; END -----------------------------------------------------
… add an output parameter to the code ..
... using (SqlCommand cmd = new SqlCommand()) { cmd.Connection = conn; cmd.CommandType = CommandType.StoredProcedure; cmd.CommandText = @"dbo.uspGetPurchaseOrderDetails"; SqlParameter ProdId = new SqlParameter("@ProductId", SqlDbType.Int); SqlParameter ProdName = new SqlParameter("@ProdName", SqlDbType.NVarChar,-1); // -1 varchar max ProdName.Direction = ParameterDirection.Output; cmd.Parameters.Add(ProdId).Value = ProductId; cmd.Parameters.Add(ProdName); conn.Open(); //use sql reader to execute the query using (SqlDataReader dr = cmd.ExecuteReader()) { Console.WriteLine("ProductID PurchaseOrderID OrderQty\t"); while (dr.Read()) { Console.WriteLine(dr[0].ToString() + "\t " + dr[1].ToString() + "\t\t" + dr[2].ToString()); } } //data reader object must be closed before retrieving the output parameter Console.WriteLine(ProdName.Value.ToString()); } ....
From the Profiler’s trace below we can see that the only one RPC request was initiated for the top level stored procedure.
Figure 9, RPC request and nested stored procedures
From the trace above we can observe a few interesting things
- Only the first, top level stored procedure call is initiated through the RPC event. This is the request initiated by ADO.NET.
- The nested stored procedure call was captured through SP:Starting and SP:StmtStarting events. The input parameter values are not visible(more about this later).
- RPC Output parameter* event is triggered AFTER the RPC:Completed event. This explains why we need to close Data reader object before retrieving the output parameter(see the c# code above)
NOTE: When the RPC is invoked, some or all of its parameters are designated as output parameters. All output parameters will have values returned from the server. For each output parameter, there is a corresponding return value, sent via the RETURNVALUE token. More about this can be found here.
How to capture nested stored procedure parameter values
In the example below, the nested stored procedure call was captured through the SP:Starting and SP:StmtStarting events. it is not possible to capture nested procedure(s) input parameter values only by examining this two events. Sql Profiler is not a debugging tool and it shows only what has been executed. This means that it can show only the parameters values that are initially passed from the data layer, not during the procedure execution.
So how can we capture a nested stored procedure input parameter values? Well, we can get the values if
- The top level stored proc parameter(which we can get through the RPC event) value is passed unchanged to the nested sp.
- The nested stored proc parameters are passed as literals i.e .. EXECUTE dbo.MyStoredProc @name=’John Doe’, @Age=42 ..
However, there is another, more interesting way to capture the parameter values.
We can include the Showplan XML Statistic Profile event class to our trace. The event class triggers when Sql Server executes an SQL statement and shows complete, compile-time data, including the parameter values we are interested in.
Figure 10, Showplan XML Statistic Profile event class
We can export the plan(XML format) and then search for the ParameterList element.
<ColumnReference Column="@ProductName" ParameterDataType="nvarchar(50)" ParameterCompiledValue="NULL" ParameterRuntimeValue="N'HL Crankarm'"> </ColumnReference> <ColumnReference Column="@ProductId" ParameterDataType="int" ParameterCompiledValue="(319)" ParameterRuntimeValue="(319)"> </ColumnReference>
Now we can analyse the parameters’ runtime/compiled values and the hunt for the parameter sniffing issues can begin 🙂
NOTE: Showplan XML Statistic Profile is an expensive event to track. It may be a good idea to use the server side tracking instead or the Profiler UI. ..
Conclusion
In my opinion, It is beneficial for the Sql database developers to be familiar with the database access patterns from the client end. Knowing how different components i.e ADO.NET, OLEDB, ODBC, TDS etc. fit together can help when making design decisions. The way Client constructs its data requests dictates the database engine behavior i.e Sql Server treats non-parametersied queries as ad-hoc queries whereas it caches the execution plans for the parameterised queries the same way as it does for stored procedures. Understanding how different events( captured by Profiler or Extended events) relate to the client requests can help us find and understand the root causes for the slow running queries, excessive locking etc. that may not be on the Sql Server side but on the data layer.
Thanks for reading
Dean Mincic