Availability Groups
Multi-Site AG setup |
|
Creating Availbility Groups is pretty much just like setting up mirroring. I will create a page specifically for creating AGs and mirroring. This post is for creating multi-subnet AGs.
If the availability group spans multiple subnets, you should add a static IP address for each subnet to the listener.
Your server team, if you have one, will need to setup the requirements for a multi-subnet cluser. I am no server admin so I don't want to lead you astray. There are multiple powershell cmdlets to work with AGs. I don't set up AGs enough to justify using them. The GUI works just fine. My former senior, would kill me for saying that, but this is faster.
Check out this MSDN article for more info.
If the availability group spans multiple subnets, you should add a static IP address for each subnet to the listener.
Your server team, if you have one, will need to setup the requirements for a multi-subnet cluser. I am no server admin so I don't want to lead you astray. There are multiple powershell cmdlets to work with AGs. I don't set up AGs enough to justify using them. The GUI works just fine. My former senior, would kill me for saying that, but this is faster.
Check out this MSDN article for more info.
Availability Group stuck in Resolving State |
|
An AG stuck in a Resolving State happens for various issues involving AG health. It also happens when you do a manual failover of an Asynchronous node. After manual failover, you will need to run the following command on the other nodes of the AG to resume data synchronization.
ALTER DATABASE <DBNAME> SET HADR RESUME
In 2014 users can query nodes in a Resolving state, FYI.
ALTER DATABASE <DBNAME> SET HADR RESUME
In 2014 users can query nodes in a Resolving state, FYI.
Availability Group Do not Do list!!! |
|
- I was creating an AG on a DR node. So I was having some issues creating the AG. I had some errors. So I thought I would delete the AG from the DR node. So I deleted the AG. This also deleted the AG on the other two nodes, bringing down production. NOT GOOD!!! This put all the DBs in restoring state. Lesson learned. Fortunately downtime was only 55 minutes. I had to get 70 DBs recovered and then recreate the AG and join those DBs to the AG. Lets just say I was freaking out!!!!
- Turn off TLog backup jobs and then forget to turn them back on. That will bite you in the ass in a few hours.
AG Unknown!!! |
|
SAN and Server team were doing Storage Virtualization , EMC vPlex. We had to stop SQL on a node while they did their thing. Brought SQL backup and AG indicated UNKNOWN!! Kind freaked out until the server engineer said he had paused the node. Just FYI.
You setup a Multi-Subnet AG but you are getting connection timeouts!!! |
|
Was setting up a Multi-SubNet Availability Group. Were adding the last couple of steps. We thought everything was cool. Then an hour or two later started getting calls that customer connections were timing out. What we found that when adding both IPs to the cluster, even with the "OR" dependency that the cluster will submit all IPs to be registered in DNS. What can happen is that SQL could try and connect to either of the IPs, including the IP that is offline in the cluster. So how do we fix that? We can do that two ways.
- Add this parameter to your connection string. MultiSubnetFailover = True. It might be impossible to do this. Check out if your app can support this parameter
- On the cluster set RegisterAllProvidersIP = 0. This is a server configuration setting, you can not do this via SQL. This will only register the online IP. There is an additional pararmeter to change. HostRecordTTL needs to be changed. This parameter determines how long before cached DNS entries will expire.
- This link will explain everything in much more detail http://blogs.msdn.com/b/alwaysonpro/archive/2014/06/03/connection-timeouts-in-multi-subnet-availability-group.aspx
ReadOnly Routing Lists |
|
If you are using Availability Groups a cool part of this is ReadOnly Routing. You can send all Read traffic to a specific server. You can also then define a routing list that will manage moving the ReadOnly Routing if a server goes down. Here is the syntax on how to do that. Below the code sample is great video fully explaining it. Skip to 35:14 to hear about the ReadOnly Routing List.
ALTER AVAILABILITY GROUP [AG-Name]
MODIFY REPLICA ON
N'<servername > WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('<Server1>','<Server2>','DR_Server')));
ALTER AVAILABILITY GROUP [AG-Name]
MODIFY REPLICA ON
N'<servername > WITH
(PRIMARY_ROLE (READ_ONLY_ROUTING_LIST=('<Server1>','<Server2>','DR_Server')));
Availability Group not synchronizing |
|
If your AG databases are all online and for some reason your replica DBs are in a NOT SYCHRONIZING STATE. It could be as simple as Resuming the synchronization process. Here is the syntax
ALTER DATABASE database_name SET HADR RESUME
ALTER DATABASE database_name SET HADR RESUME
Failover AG with Powershell |
|
Pretty straight-ahead process.
cd c:\ps
$server = "name of secondary server"
cd sqlserver:\sql\$server\default\AvailabilityGroups
dir | ?{$_.name -eq "name of your AG"}| %{$_.failover();$_.alter()}
cd c:\ps
$server = "name of secondary server"
cd sqlserver:\sql\$server\default\AvailabilityGroups
dir | ?{$_.name -eq "name of your AG"}| %{$_.failover();$_.alter()}
Why Can't I add a Replica to the AG?? |
|
So I was trying to add a Replica to an AG via the wizard and it wouldn't connect. I had enabled AlwaysOn for the SQL Service prior to the node being added to the cluster.
So the fix, was uncheck AlwaysOn for the SQL Server in Configuration Manager and then bounce SQL. Once SQL is back up check AlwaysOn for the SQL Service and you should be good to go.
So the fix, was uncheck AlwaysOn for the SQL Server in Configuration Manager and then bounce SQL. Once SQL is back up check AlwaysOn for the SQL Service and you should be good to go.
Replica couldn't connect. SSPI Handshake
I was getting this error in the log. In order to fix we had to modify the endpoint from Windows Kerberos to Windows Negotiate. This connections using either NTLM or Kerberos.
Database Mirroring login attempt failed with error: 'Connection handshake failed. An OS call failed: (80090303) 0x80090303(The specified target is unknown or unreachable)
Once again Aaron Bertrand and the folks at Sentry One to the rescue.
https://sqlperformance.com/2013/04/system-configuration/troubleshooting-alwayson
CREATE ENDPOINT [HADR_Endpoint]
STATE=STARTED
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE
, ENCRYPTION = REQUIRED ALGORITHM AES)
Database Mirroring login attempt failed with error: 'Connection handshake failed. An OS call failed: (80090303) 0x80090303(The specified target is unknown or unreachable)
Once again Aaron Bertrand and the folks at Sentry One to the rescue.
https://sqlperformance.com/2013/04/system-configuration/troubleshooting-alwayson
CREATE ENDPOINT [HADR_Endpoint]
STATE=STARTED
AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
FOR DATA_MIRRORING (ROLE = ALL, AUTHENTICATION = WINDOWS NEGOTIATE
, ENCRYPTION = REQUIRED ALGORITHM AES)