New Options for High Availability Clusters: SIOS Cements Its Support for Microsoft Azure Shared Disk


Microsoft introduced Azure Shared Disk in Q1 of 2022. Shared Disk allows you to attach a managed disk to more than one host. Effectively, this means that Azure now has the equivalent of SAN storage, enabling highly available clusters to use shared disk in the cloud!

A major advantage of using Azure Shared Disk with a SIOS LifeKeeper cluster hierarchy is that you no longer need a storage quorum or witness node to avoid so-called split-brain, which occurs when communication between nodes is lost and several nodes could change data simultaneously. Fewer nodes mean less cost and complexity.

SIOS has introduced an Application Recovery Kit (ARK) for our LifeKeeper for Linux product, called the LifeKeeper SCSI-3 Persistent Reservations (SCSI3) Recovery Kit, that allows Azure Shared Disks to be used in conjunction with SCSI-3 reservations. This ARK guarantees that a shared disk is writable only from the node that currently holds the SCSI-3 reservations on that disk.
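To make the write-fencing idea concrete, here is a minimal sketch in Python. It is a toy in-memory model, not the Recovery Kit's implementation and not a real SCSI interface: the `SharedDisk` class, its method names, and the node keys are all hypothetical. It also simplifies the real protocol by rejecting every write from a node that does not hold the reservation, including when no reservation exists.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


class ReservationConflict(Exception):
    """Raised when a node that lacks the reservation attempts a fenced operation."""


@dataclass
class SharedDisk:
    """Toy model (hypothetical) of a shared disk with one exclusive-writer reservation."""
    registered_keys: Set[str] = field(default_factory=set)
    holder: Optional[str] = None
    blocks: Dict[int, bytes] = field(default_factory=dict)

    def register(self, key: str) -> None:
        # Every node registers its key before it may ask for the reservation.
        self.registered_keys.add(key)

    def reserve(self, key: str) -> None:
        # Only a registered node can reserve, and only if nobody else holds it.
        if key not in self.registered_keys:
            raise ReservationConflict(f"{key} is not registered")
        if self.holder not in (None, key):
            raise ReservationConflict(f"disk is reserved by {self.holder}")
        self.holder = key

    def release(self, key: str) -> None:
        # Releasing is a no-op unless the caller is the current holder.
        if self.holder == key:
            self.holder = None

    def write(self, key: str, lba: int, data: bytes) -> None:
        # Simplification: writes are accepted only from the reservation holder.
        if self.holder != key:
            raise ReservationConflict(f"{key} does not hold the reservation")
        self.blocks[lba] = data


disk = SharedDisk()
disk.register("node-a")
disk.register("node-b")
disk.reserve("node-a")
disk.write("node-a", 0, b"payload")

try:
    disk.write("node-b", 0, b"stale write")
    rejected = False
except ReservationConflict:
    rejected = True
```

In this sketch the second write is refused because `node-b` never acquired the reservation, so the block written by `node-a` is left untouched.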

When installing SIOS LifeKeeper, the installer will detect that it’s running in Microsoft Azure and automatically install the LifeKeeper SCSI-3 Persistent Reservations (SCSI3) Recovery Kit to enable support for Azure Shared Disk.

Resource creation within LifeKeeper is straightforward (Figure 1). Once locally mounted, the Azure Shared Disk is simply added into LifeKeeper as a file-system type resource. LifeKeeper will assign it an ID (Figure 2) and manage the SCSI-3 locking automatically.

Figure 1. Creating an SAP Instance (sapinst) in LifeKeeper
Figure 2. Resource created and extended to both nodes.

SCSI-3 reservations guarantee that an Azure Shared Disk is writable only on the node that holds the reservations (Figure 3). In a scenario where cluster nodes lose communication with each other, the standby server will come online, causing a potential split-brain situation. However, because of the SCSI-3 reservations, only one node can access the disk at a time, which prevents an actual split-brain scenario. Only one system will hold the reservation, and it will either become the new active node (in which case the other will reboot) or remain the active node. Nodes that do not hold the Azure Shared Disk reservation will simply end up with the resource in a “Standby State” because they cannot acquire the reservation.

Figure 3. Output from LifeKeeper logs when trying to mount a disk that is already reserved.
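The outcome described above, where both nodes try to bring the resource in service after losing contact and only one succeeds, can be sketched as a small race in Python. This is an illustration under stated assumptions, not LifeKeeper code: `ReservationArbiter`, `bring_in_service`, and the node names are hypothetical, a thread lock stands in for the disk granting the reservation atomically, and the reboot of a node that loses the reservation is not modeled.

```python
import threading
from typing import Dict, Optional


class ReservationArbiter:
    """Hypothetical stand-in for the disk: grants the reservation to one node only."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.holder: Optional[str] = None

    def try_reserve(self, node: str) -> bool:
        # The check-and-set is atomic, so two racing nodes can never both win.
        with self._lock:
            if self.holder is None:
                self.holder = node
            return self.holder == node


def bring_in_service(node: str, arbiter: ReservationArbiter,
                     states: Dict[str, str]) -> None:
    # A node becomes active only if it obtained the reservation.
    states[node] = "active" if arbiter.try_reserve(node) else "standby"


arbiter = ReservationArbiter()
states: Dict[str, str] = {}

# Both nodes believe the other is gone and try to take over at the same time.
threads = [
    threading.Thread(target=bring_in_service, args=(name, arbiter, states))
    for name in ("node-a", "node-b")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever thread runs first, `states` ends up with exactly one `"active"` node and one `"standby"` node, which is the property that keeps a potential split-brain from becoming an actual one.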

Microsoft’s definition of Azure Shared Disks: https://docs.microsoft.com/en-us/azure/virtual-machines/disks-shared

At present, SIOS supports Locally-redundant Storage (LRS), and we’re working with Microsoft to test and support Zone-Redundant Storage (ZRS). Ideally, we’d like to know when there is a ZRS failure so that we can fail over the resource hierarchy to the node closest to the active storage. SIOS currently expects Azure Shared Disk support to arrive in the next release, LifeKeeper 9.6.2 for Linux.

