Generally, a quorum is the minimum number of members of a group who must be present for the group to make decisions.
In LifeKeeper, Quorum establishes a consensus on the status of the nodes in a cluster and uses it to decide the next step in handling a node failure. LifeKeeper Quorum can operate in three modes: Storage, Majority, and TCP Remote (TCP Remote is only available with LifeKeeper for Linux).
- Storage Quorum uses a shared storage device to keep track of updates provided by the other systems in the cluster. If a system provides no update, Quorum marks that system as failed.
- Majority Quorum relies on an odd number of nodes in the cluster, where one node serves as a witness to determine whether one or more nodes in the cluster can’t communicate.
- TCP Remote Quorum connects to a TCP/IP service on a specified port to verify whether nodes in the cluster can communicate with each other.
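As a rough illustration of the Storage mode idea, the sketch below flags systems whose last recorded update is older than a timeout. It is not LifeKeeper's implementation: the function name, the timestamp dictionary, and the 10-second timeout are all assumptions made for the example.

```python
from typing import Dict, List


def stale_nodes(last_update: Dict[str, float], now: float, timeout: float) -> List[str]:
    """Return the nodes whose most recent update is older than `timeout` seconds.

    `last_update` maps a (hypothetical) node name to the time of its last
    update on the shared storage device, in seconds.
    """
    return sorted(name for name, ts in last_update.items() if now - ts > timeout)


# Hypothetical update times read from the shared storage device.
updates = {"node-a": 100.0, "node-b": 97.0, "node-c": 80.0}

# node-c last updated 21 seconds ago, which exceeds the 10-second timeout.
print(stale_nodes(updates, now=101.0, timeout=10.0))
```

A real cluster would read these timestamps from the shared device itself and would need to account for clock differences between systems.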
Understanding the Importance of Quorum in Clusters
Quorum’s purpose is to keep applications available by taking remedial action when something unplanned happens. It accomplishes this by lessening the risk of split-brain situations and reducing downtime, keeping track of communication between all the nodes in the cluster.
The Risks of Operating Without Quorum in Your Cluster
Running a cluster configured without Quorum carries risk. The following scenarios show the effect of not having a quorum and why implementing one matters.
Scenario 1: Reducing downtime
Unintentional downtime can happen when one or more systems are not available for use as a result of an unavoidable action, for example, a crash or a temporary failure in network communication.
With a quorum mode such as Storage or TCP Remote configured, access to storage devices and/or ports can be used to keep track of the status of communication in the cluster. This additional check can prevent an unnecessary failover that could cause significant downtime. In other cases, Quorum will shut down or reboot the server to restore it to a healthy state and avoid longer downtime.
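The kind of port check described above can be sketched in a few lines of Python. This is only an illustration, not LifeKeeper's actual logic: the function names are hypothetical, and the rule that a node may take over only while it can still reach the quorum service is one plausible simplification.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def may_take_over(peer_heartbeat_lost: bool, quorum_host: str, quorum_port: int) -> bool:
    """Assumed rule: fail over only if the peer is silent AND the quorum
    service is still reachable from this node."""
    return peer_heartbeat_lost and port_reachable(quorum_host, quorum_port)
```

Under this assumed rule, a node that loses contact with both its peer and the quorum service treats itself as the isolated one and does not start a failover.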
Scenario 2: Split Brain
Split-brain occurs when multiple systems in the cluster believe they are the primary server. This can happen when a primary server loses communication with its secondary server, and the secondary server, believing the primary system went down, takes over. This leads to two active primary systems in the cluster.
With Majority Quorum configured, an additional system is provisioned as a witness whose vote decides which system should serve as the primary, preventing the split-brain from happening.
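The witness vote comes down to simple arithmetic, shown here as a minimal sketch. It assumes a three-vote cluster (primary, secondary, witness) and a strict-majority rule. The function name is hypothetical and this is not LifeKeeper's code.

```python
def has_majority(reachable_votes: int, total_votes: int) -> bool:
    """Return True if a node can see strictly more than half of all votes.

    `reachable_votes` counts the node itself plus every node it can reach.
    """
    return reachable_votes > total_votes // 2


TOTAL_VOTES = 3  # primary + secondary + witness (assumed layout)

# The network splits: the primary is cut off and sees only itself,
# while the secondary can still reach the witness.
primary_keeps_role = has_majority(1, TOTAL_VOTES)
secondary_takes_over = has_majority(2, TOTAL_VOTES)

print(primary_keeps_role, secondary_takes_over)
```

Because only one side of a partition can hold more than half of an odd number of votes, at most one system ends up acting as primary.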
Why Proper Quorum Configuration Matters
Operating a cluster without Storage or Majority Quorum is dangerous because it increases the risk of data loss or prolonged downtime as a result of a split-brain and/or a network outage. Quorum provides countermeasures by checking the health of the cluster and making sure that any unhealthy system is handled appropriately.
Contact SIOS today to learn how our high availability solutions can help you configure quorum the right way and keep your clusters protected.
Author: Alexus Gore, Customer Experience Software Engineer at SIOS Technology Corp.