High Availability and Disaster Recovery | SQL CHIT CHAT … Blog about Sql Server & Snowflake

In this blog, I wanted to share a recent experience of implementing Read Only Routing functionality on the existing HA/AG Sql Server Cluster. The technology significantly offloaded the primary production server and enhanced the read queries performances.

The Sql Server HA/AG environment

Windows Failover Cluster

The main business application uses Sql Server set of databases hosted by a WSFC(Windows Failover Cluster). The cluster has two nodes, NODE1 and NODE2. Both nodes have Windows 2012 R2 Server OS and Sql Server 2014 SP1 database platform. The failover cluster has VNN(Virtual Network Name) CLUST1.

Availability group

The system has an Availability Group AG1 (and consequently a WSFC resource group). The AG1 group members are the main business application’s backend databases.
The two availability replicas host the AG1 availability group databases. Each replica exists on a different node, primary on the NODE1, and secondary on the NODE2.
The primary replica is read/write. The secondary replica, in addition to providing the HA/DR for AG1, is set to allow Read Only access for the clients. The readable secondary replica is set to “YES” which allows read-only workloads even without ApplicationIntent parameter in the connection string definition.

The AG1 availability mode is set to : Synchronous-commit mode
The AG1 failover type is : Automatic failover

The business databases usage pattern

Because the way the cluster was set up the clients were connecting to the production databases through WSFC – CLUST1. All the requests were directed to the current primary replica on the NODE1. The secondary replica on the NODE2, even if set up to allow read-only connections, was never used for that purpose. It was possible to connect to the current secondary replica directly through NODE2 though.
The clients who use the databases are:

~200 concurrent business application users. They perform read/write queries through the app’s UI
~10 scheduled jobs that runs regularly during the business hours. The tasks mostly extract data from the business databases (read only operations) in order to provide the required business information for a number of different applications on a number of different platforms i.e Windows Active Directory, AWS, AZURE, Linux etc.
An “unknown”, or should I say, NULL 🙂 number of users who regularly run the ad-hoc, read-only queries through a number of different applications e.g Excel Power Queries, Tableau, Crystal Reports and who knows what..
A number of custom, in-house build, LAMP model applications (only in this case M stands for Ms Sql Server..) that also perform read only operations.

Even if the high-end hardware keeps the cluster’s performance on the acceptable level, there is a lot of room for a number of improvements. One of these improvements, the Read-Only Routing is described in this blog post.

Read Only Routing

Simply speaking, with Read Only Routing in place, within the connection string, we can specify the client app. connection intention. The intention will later decide on which server the request will be routed. The intention can be:

Read only – Client connection needs only to read data
Read and write (Default value) – Client connection needs to read and write data.

This means that if we know that an application needs only to read data, we can use a connection string property ApplicationIntent with the value of READONLY to “tell the cluster” to redirect the requests to the secondary, read-only replica. This will offload the NODE1 resources(assuming that the current server role is the primary replica) and will “employ” the NODE2 resources in order to execute the queries.
The connection string property ApplicatoinIntent is supported by all SQL Server 2014 Clients such as ADO.NET(SqlClient), Sql Server Native Client 11.0 ODBC driver, SQLNCLI11 OLEDB Driver.
To make the cluster be able to recognise and redirect such a connections, we need to set up:

Availability group Listener for the same subnet ( in this case both replicas are on the same subnet)
Read-Only routing list

Availability group listener

The availability group listener is a WSFC resource that is logically bound to the AG through the corresponding WSFC resource group . It’s a VNN(virtual network name) with one or many IP addresses attached to it. The Listener runs on a single node(always on the current primary replica) and just handles the connection forwarding. The listener endpoint is always the primary replica.

It is possible to create multiple listeners for an availability group. Only the first listener can be created through tsql. The others can be created using WSFC Failover Cluster Manager or PowerShell script.
Since I was setting up the first listener for AG1, I was able to use tsql.

To set up an AG Listener we need to:

Set up a DNS entry, preferably using a static IP address. In this case the DNS name is AG1-Listener. *
Add the listener to the existing availability group, AG1.

*NOTE: If you decide to use SSMS to create a new listener, SSMS (through WSFC) will create a DNS entry for you. In this case you’ll need to provide the required AD credentials to the WSFC to be able to create the entry – Create Computer Objects permission.

Add the listener to the existing availability group

USE [master]
GO
ALTER AVAILABILITY GROUP AG1
    ADD LISTENER 'AG1-Listener' (
        WITH IP(('10.XX.XX.XX','255.XX.XX.XX'))
       ,PORT = 1433)

Check the listener’s metadata:

SELECT gl.dns_name
      ,gl.port
      --,gl.is_conformant
      ,gl.ip_configuration_string_from_cluster
      ,lip.ip_address
      ,lip.ip_subnet_mask
      ,lip.is_dhcp
      ,lip.network_subnet_ip
      ,lip.network_subnet_prefix_length
      ,lip.network_subnet_ipv4_mask
      ,lip.state_desc
FROM sys.availability_group_listeners gl
INNER JOIN sys.availability_group_listener_ip_addresses lip
    ON gl.listener_id = lip.listener_id

Set up Read-only routing lists

Read-Only routing URL is a pointer that determines which replica will response to the client driver(provider) Read-Only request.

Read-Only routing mechanism performs the following sequence of events when allocating a secondary replica to route the clients requests to.

Client requests a connection to AG1-listener specifying ApplicationIntent=ReadOnly, Server=’AG1-Listener’, Database=’prod_db’. The request is routed to the NODE1(primary replica).
NODE1 checks the incoming connection, ApplicationIntent parameter value and checks if the specified database is in the availability group.
The server enumerates and checks the replicas defined in the Read-Only routing list.
The routing target is the first readable secondary replica that accepts ALL or Read-Only connections from the list.
The server sends the secondary replica’s Read-only routing URL to the client.
The client then redirects the connection request to the secondary replica following the provided URL.

To set up Read-only routing list we need to assign the Read-only routing URLs for both of the replicas since the replicas may switch roles during the failover. The URLs is used to tell the client where to redirect the Read-only requests.
The URL contains network protocol, system address and the port number.

/*
Define Read-Only URL on the current primary replica.
The URL will be used when the replica switch role and become secondary rep.
*/
ALTER AVAILABILITY GROUP AG1
	MODIFY REPLICA ON N'NODE1'
	WITH (SECONDARY_ROLE(READ_ONLY_ROUTING_URL=N'tcp://NODE1.myDomain.au:1433'))
GO

/*
Define Read-Only URL on the current secondary readable replica.
*/
ALTER AVAILABILITY GROUP AG1
	MODIFY REPLICA ON N'NODE2'
	WITH (SECONDARY_ROLE(READ_ONLY_ROUTING_URL=N'tcp://NODE2.myDomain.au:1433'))
GO

Check the Read-Only URLs

SELECT replica_server_name 
      ,[endpoint_url]           AS WSFC_NodeEndpointURL
      ,availability_mode_desc   AS AvailabilityMode
      ,failover_mode_desc       AS FailoverType
      ,primary_role_allow_connections_desc AS AllowedConnectionsWhenPrimary
      ,secondary_role_allow_connections_desc AS AllowedConnectonsWhenSecondary
      ,read_only_routing_url 
FROM sys.availability_replicas
WHERE read_only_routing_url IS NOT NULL

..and finally create the Read-Only routing list

--when a client read-only request hits the primary replica, the server will response with 
--the secondary replica (NODE2) URL since the replica is the first on the list
ALTER AVAILABILITY GROUP AG1
    MODIFY REPLICA ON N'NODE1'
    WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST=(N'NODE2',N'NODE1')))
GO

-- and if the replicas swith the places, the system will response with the 
-- current secondary replica( now NODE1) URL
ALTER AVAILABILITY GROUP AG1
    MODIFY REPLICA ON N'NODE2'
    WITH (PRIMARY_ROLE(READ_ONLY_ROUTING_LIST=(N'NODE1',N'NODE2')))
GO

Check the read-only routing lists

-- https://blogs.msdn.microsoft.com/alwaysonpro/2014/01/22/modifying-alwayson-read-only-routing-lists/
SELECT  ag.name AS [Availability Group]
       ,ar.replica_server_name AS [When Primary Replica Is]
       ,rl.routing_priority AS [Routing Priority]
       ,CASE 
            WHEN ar.replica_server_name = ar2.replica_server_name 
                THEN 'Read-Only on primary' + '('+ ar.replica_server_name + ')'
            ELSE ar2.replica_server_name
        END AS [Read-Only Routed To]
       ,ar.secondary_role_allow_connections_desc
       ,ar2.read_only_routing_url
FROM    sys.availability_read_only_routing_lists rl
        INNER JOIN sys.availability_replicas ar 
            ON rl.replica_id = ar.replica_id
        INNER JOIN sys.availability_replicas ar2 
            ON rl.read_only_replica_id = ar2.replica_id
        INNER JOIN sys.availability_groups ag 
            ON ar.group_id = ag.group_id
ORDER BY ag.name
       ,ar.replica_server_name
       ,rl.routing_priority

A few more things about the readable, secondary replicas

There is a certain data latency between primary and secondary replica. The data movement process usually takes a few seconds and follows the pseudo-sequence below.

User make a change on AG databases (Begin transaction)
Primary replica sends a log information which describes the actions to be performed on the secondary replica(s).
The secondary replica(s) runs a dedicated “Redo” process that commits the changes on the secondary replica. At this point in time, the changes are not visible by the read-only queries that runs on the secondary. The readers are not blocked by the redo actions* as they read the previous version of the data that’s being changed.
The change is committed on the primary replica (Commit transaction ). The changes are visible on both replicas.

*All read-only queries that runs on the secondary databases are automatically executed within the SNAPSHOT TRANSACTION ISOLATION LEVEL. All locking hints e.g (tablock,xlock,..etc) are ignored.

The DDL actions on the primary replica(applied through the log records redo actions) and conversely, the READ actions on the secondary replica can block each other. Secondary replica’s “SELECT” queries require Sch-S locks that guarantee the schema stability of the objects in use e.g During the execution of.. Select * from tab1, the lock manager has to prevent other sessions from changing the tab1 table structure during the select operation. On the other hand, the redo DDL actions require Sch-M (Schema modification locks) that prevent concurrent access to the locked resource e.g (TRUNCATE TABLE tab1 ). Schema locks are acquired regardless of transaction isolation level. Sch-S is compatible with all other locks except Sch-M (and Sch-M is incompatible with all lock modes). It’s probably not a good idea to perform read operations on the secondary replica if the main system performs frequent DDLs.

Conclusion

Always On Availability group active, secondary replicas support the Read-Only access. The client requests for read-only access can be automatically redirected to the secondary replica. This will help to conserve primary replica’s resources for the mission critical workloads. The read-only queries will run faster since the workloads ,by default, use row versioning (snapshot isolation level) to remove read/write blocking. The read-only access will have some data latency and that needs to be considered when deciding to go with this technology.

Thanks for reading.

Dean Mincic

SQL CHIT CHAT … Blog about Sql Server & Snowflake

Category Archives: High Availability and Disaster Recovery

Read Only Routing in Sql Server