High Availability Options for SQL Server on Azure VMs

SIOS DataKeeper
Reading Time: 5 minutes

Microsoft Azure infrastructure is designed to provide high availability for your applications and data. Azure offers a variety of infrastructure options for achieving high availability, including Availability Zones, Paired Regions, redundant storage, and high-speed, low-latency network connectivity. All of these services are backed by Service Level Agreements (SLAs) to ensure the availability of your business-critical applications. This blog post will focus on high availability options when running SQL Server in Azure Virtual Machines.

Azure Infrastructure

Before we jump into the high availability options for SQL Server, let’s discuss the vital infrastructure that must be in place. Availability Zones, Regions, and Paired Regions are key concepts in Azure infrastructure that are important to understand when planning for the high availability of your applications and data.

Availability Zones are physically separate locations within a region that provides redundant power, cooling, and networking. Each Availability Zone consists of one or more data centers. By placing your resources in different Availability Zones, you can protect your applications and data from outages caused by planned or unplanned maintenance, hardware failures, or natural disasters. When leveraging Availability Zones for your SQL Server deployment, you qualify for the 99.99% availability SLA for Virtual Machines.

Regions are geographic locations where Azure services are available. Azure currently has more than 60 regions worldwide, each with multiple Availability Zones. By placing your resources in different regions, you can provide even greater protection against outages caused by natural disasters or other significant events.

Paired Regions are pre-defined region pairs that have unique relationships. Most notably, paired Regions replicate data to each other when geo-redundant storage is in use. The other benefits of paired regions are region recovery sequence, sequential updating, physical isolation, and data residency. When designing your disaster recovery plan, it is advisable to use Paired Regions for your primary and disaster recovery locations.

Using Availability Zones and Paired Regions in conjunction with high availability options such as Availability Groups and Failover Cluster Instances, you can create highly available, resilient SQL Server deployments that can withstand a wide range of failures, minimizing downtime.

SQL Server Availability Groups and Failover Cluster Instances

SQL Server Availability Groups (AGs) and SQL Server Failover Cluster Instances (FCIs) are both high availability (HA) and disaster recovery (DR) solutions for SQL Server, but they work in different ways.

An AG is a feature of SQL Server Enterprise edition that provides an HA solution by replicating a database across multiple servers (called replicas) to ensure that the database is always available in case of failure. AGs can be used to provide HA for both a single database and multiple databases.

SQL Server Standard Edition supports something called a Basic AG. There are some limitations to Basic AGs in SQL Server. Firstly, a Basic AG only supports a single database. You need an AG for each database and the associated IP address and load balancer if you have more than one database. Additionally, Basic AGs do not support read-only replicas. While Basic AGs provide a simple way to implement HA for a single database, they may not be suitable for more complex scenarios.

On the other hand, a SQL Server FCI is a Windows Server Failover Cluster (WSFC) that provides an HA solution by creating a cluster of multiple servers (called nodes) that use shared storage. In the event of a failure, the SQL Server instance running on one node can fail over to another.

In SQL Server 2022 Enterprise Edition, the new Contained Availability Groups (CAG) address some of the AG limitations by allowing users to create system databases to CAG, which can then be replicated. CAG eliminates the need to synchronize things like SQL logins and SQL Agent jobs manually.

Availability Groups and Failover Cluster Instances have their own pros and cons. AGs have advanced features like readable secondaries and synchronous and asynchronous replication. However, AGs require the Enterprise Edition of SQL Server, which can be cost-prohibitive, particularly if you don’t need any other Enterprise Edition features. 

FCIs protect the entire SQL Server instance, including all user-defined databases and system databases. FCIs make management easier since all changes, including those made to SQL Server Agent jobs, user accounts and passwords, and database additions and deletions, are automatically reconciled on all versions of SQL Server, not just SQL 2022 with CAG. FCIs are available with SQL Server Standard Edition, which makes it more cost-effective. However, FCIs require shared storage, which presents challenges when deploying in environments that span Availability Zones, Regions, or hybrid cloud configurations. Read more about how SIOS software enables high availability for SQL servers

Storage Options for SQL Server Failover Cluster Instances

Regarding storage options for SQL Server Failover Cluster Instances that span Availability Zones, there are three options: Azure File Share, Azure Shared Disk with Zone Redundant Storage, and SIOS DataKeeper Cluster Edition. There is a fourth option, Storage Spaces Direct (S2D), but that is limited to single AZ deployments, so clusters based on S2D would not qualify for the 99.99% SLA and would be susceptible to failures that impact and entire AZ.

Azure File Share

Azure File Share with zonal redundancy (ZRS) is a feature that allows you to store multiple copies of your data across different availability zones in an Azure region, providing increased durability and availability. This data can then be shared as a CIFS file share, and the cluster connects to it using the SMB 3 protocol.

Azure Shared Disk

Azure Shared Disk with Zone Redundant Storage (ZRS) is a shared disk that can store SQL Server data for use in a cluster. SCSI persistent reservations ensure that only the active cluster node can access the data. If a primary Availability Zone fails, the data in the standby availability zone becomes active. Shared Disk with ZRS is only available in the West US 2, West Europe, North Europe, and France Central regions.

SIOS DataKeeper Cluster Edition

SIOS DataKeeper Cluster Edition is a storage HA solution that supports SQL Server Failover Clusters in Azure. It is available in all regions and is the only FCI storage option that supports cross Availability Zone failover and cross Region failover. It also enables hybrid cloud configurations that span on-prem to cloud configurations. DataKeeper is a software solution that keeps locally attached storage in sync across all the cluster nodes. It integrates with WSFC as a third-party storage class cluster resource called a DataKeeper volume. Failover Cluster controls all the management of the DataKeeper volume, making the experience seamless for the end user. Learn more about SIOS DataKeeper. 

Summary

In conclusion, Azure provides various infrastructure options for achieving high availability for SQL Server deployments, such as Availability Zones, Regions, and Paired Regions. By leveraging these options, in conjunction with high availability solutions like Availability Groups and Failover Cluster Instances, you can create a highly available, resilient SQL Server deployment that can withstand a wide range of failures and minimize downtime. Understanding the infrastructure required and the pros and cons of each option is essential before choosing the best solution for your specific needs. It’s advisable to consult with a SQL and Azure expert to guide you through the process and also review the Azure documentation and best practices. With the proper planning and implementation, you can ensure that your SQL Server deployments on Azure are always available to support your business-critical applications.

Contact us for more information about our high availability solutions.


Recent Posts

Strategies for Optimizing IT Systems for High Availability

Maintaining high availability (HA) of IT systems is essential for organizational success. From critical database management to ensuring seamless customer experiences, achieving uninterrupted […]

Read More

Choosing Between GenApp and QSP: Tailoring High Availability for Your Critical Applications

GenApp or QSP? Both solutions are supported by LifeKeeper and help protect against downtime for critical applications, but understanding the nuances between these […]

Read More

What Causes Failovers to Happen?

Working in support, one of the most common questions we get from customers is “What prompted the failover from my primary node to […]

Read More