Planning HADR in SQL Server

The estimated reading time for this post is 8 minutes

To successfully implement high availability for on-premises implementations of SQL Server, you need to understand the difference between high availability and disaster recovery, and how to plan solutions that can successfully protect data and services.

High availability and disaster recovery are related, though distinct, concepts. Both involve protecting and maintaining access to data and services. However, the purpose of a high availability solution is to prevent service outages from occurring in the first place. The purpose of a disaster recovery solution is to enable you to make services available again after an outage has occurred.

One type of disaster recovery solution that most organizations use in some form is data backups. The problems with relying solely on database backups for disaster recovery include:

  • Backups only preserve data
  • Recovery by using backups can be relatively slow

By using SQL Server AlwaysOn FCIs and AlwaysOn Availability Groups, you can create disaster recovery solutions that can rapidly bring services back online after an outage. However, these technologies do not eliminate the need for backups entirely. Even if you implement an AlwaysOn disaster recovery solution, you should still carry out a backup schedule to enable recovery from scenarios such as:

  • Recovery from logical data corruption
  • Widespread disaster

The process of planning high availability and disaster recovery typically includes calculating RTO and RPO. The RTO is the period of time within which the service needs to be restored, and the RPO is the period during which data loss is deemed acceptable in the event of a failure. RPO and RTO are targets dictated by the needs of the business, and planners of high availability and disaster recovery planning solutions can use these targets to guide them in their decision-making.

The following table provides general guidelines that you can use to incorporate RTO and RPO into high availability and disaster recovery plans:

SolutionPotential data loss period (RPO)Potential recovery time (RTO) unit of measurementAtomic failover
AlwaysOn Failover Cluster InstanceDepends on storage solutionSeconds or minutes Yes
AlwaysOn Availability Group with Synchronous-Commit None SecondsYes
AlwaysOn Availability Group with Synchronous-CommitSecondsMinutesNo
Database Mirroring in High-Safety modeNoneMinutesYes
Database Mirroring in High-Performance modeSecondsMinutesNo
Log ShippingMinutesMinutes or hoursNo
Backup and RestoreHours or daysHours or daysNo

Considerations for Implementing High Availability and Disaster Recovery

In addition to RPO and RTO, there are various additional factors to consider when planning a high availability and disaster recovery solution.

  • The number of sites
  • The cost of maintaining idle servers for failover
  • Quorum considerations
  • Store tempdb locally to reduce I/O on shared storage
  • Use AlwaysOn technologies to implement high availability and disaster recovery solutions:
    • Multi-site failover cluster instance
    • AlwaysOn availability group
    • Failover cluster instance and availability group in combination

You can use the following combinations of SQL Server technologies to create high availability and disaster recovery solutions:

HA and DR ConfigurationBenefitsEquivalent legacy solution
Multi-site AlwaysOn Failover Cluster Instance provides high availability and disaster recovery• Provides high availability for the SQL Server instance
• Enables disaster recovery for the SQL Server instance
SQL Server multi-subnet failover cluster
AlwaysOn Availability Group provides high availability and disaster recovery• Provides high availability for a database or group of databases
• Enables disaster recovery for a database or group of databases
Database Mirroring for high availability, log shipping for disaster recovery
AlwaysOn Failover Cluster Instance provides high availability, AlwaysOn Availability Group provides disaster recovery• Provides high availability for the SQL Server instance
• Enables disaster recovery for a database or group of databases
SQL Server failover cluster for high availability, database mirroring for disaster recovery
  • Degraded availability
  • Test high availability and disaster recovery plans
  • Redundancy solutions for network components such as network cards, switches, and routers.
  • Standardization of permissions and file paths across sites where appropriate.
  • The availability of security certificates in all sites.
  • Implement identical SQL Server logins on each SQL Server instance. Using Active Directory domain logins reduces complexity in this area because you do not need to ensure that passwords are the same on all instances.
  • A mechanism for ensuring that SQL Agent jobs can run on all instances.

There are several common architectures that database architects working with SQL Server use to implement high availability and disaster recovery.

 

Multi-Site Failover Cluster Instance Solution

You can use a multi-site AlwaysOn FCI to provide local high availability and a remote disaster recovery site. A multi-site FCI protects the entire SQL Server instance, both in terms of high availability and disaster recovery.

Consider the following points for high availability and disaster recovery solutions that use a multi-site FCI:

  • To ensure that data remains available after a disaster, a multi-site FCI requires a storage solution in each site, and relies on hardware-based replication to copy data from the shared storage in the primary site to the shared storage in the secondary site. For this reason, the recommended shared storage solution for a multi-site FCI is two storage area networks (SANs). When you create a multi-site FCI, you should skip the storage validation step during configuration, and when prompted confirm that you do not require support from Microsoft for the cluster. In fact, selecting this option does not disqualify you from obtaining support from Microsoft because a multi-site FCI is a supported configuration.
  • You can use a multi-site FCI regardless of the recovery model that your databases use, unlike AlwaysOn Availability Groups, which require databases to use the FULL recovery model.
  • When you configure a multi-site FCI, the IP address resource dependency is automatically set to use an OR dependency.

AlwaysOn Availability Group Solution

You can use an AlwaysOn Availability Group to create a high availability and disaster recovery solution.

You should consider the following points when planning to use an AlwaysOn Availability Group for high availability and disaster recovery:

  • All servers in the Availability Group need to be in the same Active Directory domain.
  • Like the AlwaysOn FCI solution, using an AlwaysOn Availability Group for high availability and disaster recovery requires a WSFC to support it. However, a key difference is that an AlwaysOn Availability Group does not require shared storage, which means that the hardware and implementation costs are usually lower.
  • An AlwaysOn Availability Group delivers high availability and disaster recovery for a single database or group of databases, not for the entire SQL Server instance.
  • You can use the Availability Group secondary replicas, including the secondary replica at the disaster recovery site, as active secondaries. This enables organizations to make more efficient use of server resources. 
  • All databases that participate in an AlwaysOn Availability Group must use the FULL recovery model.
  • You can use up to eight secondaries if required, enabling you to scale out read-only workloads, improve response times by adding active secondaries to local sites such as branch offices, and add resiliency by placing replicas in more sites.
  • Removing the quorum vote from the secondary replica in the disaster recovery site is recommended practice. If the primary site does not contain an odd number of WSFC nodes, you can use the Node and Fileshare Majority quorum model and add a file share witness to ensure that the cluster has an odd number of quorum votes.
  • Failover in the primary site is automatic, but if a disaster occurs and you need to fail over to the disaster recovery site, you must perform a manual failover operation. To do this, you need to carry out the following actions in the disaster recovery site:
    • Force quorum on the cluster node that hosts the secondary replica in the disaster recovery site and start the Cluster Service. 
    • Force failover of the availability group. You can use the Transact-SQL ALTER AVAILABILIITY GROUP statement with the FORCE_FAILOVER_ALLOW_DATA_LOSS option to do this. You can also use SQL Server Management Studio (SSMS) or PowerShell.
    • Change the quorum voting configuration by removing votes from the nodes in the primary site and giving a vote to the node in the disaster recovery site.
    • Because an Availability Group only protects databases, you will need to transfer logins and SQL Server Agent jobs separately. 
    • After failover to the disaster recovery site, the service is not highly available. To make it highly available, you could add a secondary replica in the disaster recovery site.

Combined AlwaysOn Failover Cluster Instance and AlwaysOn Availability Group Solution

You can use a combined AlwaysOn FCI and AlwaysOn Availability Group high availability and disaster recovery solution. The FCI provides local high availability for the SQL Server instances, and the AlwaysOn Availability Group enables disaster recovery for the database availability set.

When planning a combined AlwaysOn FCI and AlwaysOn Availability Group solution, you should consider the following points:

  • The solution requires a single WSFC and a SQL Server AlwaysOn FCI at the primary site. You can use either a second SQL Server AlwaysOn FCI or a stand-alone SQL Server instance installed on a WSFC node in the disaster recovery site. The advantage to using a SQL Server AlwaysOn FCI at the disaster recovery site is that you can ensure continued high availability at the instance level after a disaster. 
  • Each site has its own storage, which is not visible to the nodes in the other site. When shared cluster storage is only shared between some of the nodes in a cluster, this is referred to as asymmetric storage. Windows Server 2012 supports asymmetric storage, as does Windows Server 2008 R2 with Service Pack one. 
  • You can use the secondary replica in the disaster recovery site as an active secondary if required. 
  • All databases that participate in the Availability Group availability set must use the FULL recovery model.
  • As with the AlwaysOn Availability Group solution described in the previous topic, you should remove quorum votes from the cluster nodes in the disaster recovery site. If there is an even number of nodes in the primary site, you can use one of these two configurations:
    • Node and fileshare majority quorum model with a fileshare witness.
    • Node and disk majority, using the asymmetric storage as a disk witness. 
  • Failover between FCI nodes is automatic, but when you install an AlwaysOn Availability Group on an FCI, failover between the primary replica and the secondary replica in the Availability Group is a manual operation. To fail over to the disaster recovery site, perform the following steps:
    • Force quorum on the cluster node that hosts the secondary replica in the disaster recovery site and start the Cluster Service. 
    • Force failover of the availability group. You can use the Transact-SQL ALTER AVAILABILIITY GROUP statement with the FORCE_FAILOVER_ALLOW_DATA_LOSS option to do this. You can also use SSMS or PowerShell.
    • Change the quorum voting configuration by removing votes from the nodes in the primary site and giving a vote to the node in the disaster recovery site.
    • Because an Availability Group only protects databases, you will need to transfer logins and SQL Server Agent jobs separately.

 

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

avatar
1000
  Subscribe  
Notify of