How to Purchase the Best High Availability Solution

High availability is critical to protecting IT infrastructure from downtime, but how do you find the best HA solution to purchase? We outline why it’s hard to figure out what to purchase and the steps you can take to get management on board with investing in high availability.

Why is it so hard to purchase a great HA solution?

Barbara Joan (not her real name) leads a team that is responsible for a large part of the company’s IT infrastructure, and she continuously struggles to convince her management to invest in high availability. Whenever she recommends implementing high availability (HA) protection, colleagues express reservations, raise objections, suggest alternatives, and even downplay the criticality of several past outages of their own enterprise applications.

She is always left asking herself the same question: If the cost of downtime is estimated to be between $45,000 and $500,000 (USD), depending on the organization, industry, and impacted applications, then why is it so hard to align on purchasing a great (cost-effective) HA solution?

Seven ways to convince your management that HA is a great investment

1. Consider the ROI of Cost Avoidance

The ROI of HA is more accurately calculated as cost avoidance. That is, you compare the cost of taking action (adding high availability) with the expected cost of taking no action (downtime).

Without HA protection, downtime is inevitable because IT systems are subject to any number of downtime factors, from mechanical server failures to human errors, software incompatibility, and many more. The cost also varies depending on the industry and the company’s size. A large manufacturer, with stringent SLAs and millions of transactions per day, will experience a larger penalty for unplanned downtime than a smaller company with a local or regional footprint. In addition, if your business is highly regulated or a critical services provider, it may incur additional penalty costs for downtime beyond the lost sales of products and goods. When IT evaluators place the wrong cost on downtime, the resulting equation can make it harder to justify purchasing a robust, commercial solution.
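The cost-avoidance comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a pricing model: the function name `cost_avoidance`, its parameters, and every figure in the usage example are hypothetical placeholders that you would replace with your own organization's estimates.

```python
def cost_avoidance(incidents_without_ha, incidents_with_ha,
                   cost_per_incident, annual_ha_cost):
    """Return (net_savings, roi) for one year.

    All inputs are estimates supplied by the evaluator (assumptions):
      incidents_without_ha -- expected downtime incidents per year with no HA
      incidents_with_ha    -- expected incidents per year that HA does not prevent
      cost_per_incident    -- estimated cost of a single downtime incident
      annual_ha_cost       -- yearly cost of the HA solution
    """
    # Cost of taking no action minus cost of the downtime that remains with HA
    avoided_cost = (incidents_without_ha - incidents_with_ha) * cost_per_incident
    # Net benefit after paying for the HA solution itself
    net_savings = avoided_cost - annual_ha_cost
    # ROI expressed as net benefit per unit of HA spend
    roi = net_savings / annual_ha_cost
    return net_savings, roi


# Hypothetical inputs: 4 incidents a year without HA, 1 with HA,
# 45,000 per incident, 60,000 per year for the HA solution.
net, roi = cost_avoidance(4, 1, 45_000, 60_000)
print(net, roi)
```

The sketch is sensitive to `cost_per_incident`: if that estimate is set too low, `net_savings` shrinks or turns negative, which is exactly how a wrong downtime cost undermines the purchase case.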

2. Consider the cost of the total solution

Downtime itself takes a toll on the company that is almost incalculable – damage to reputation, customer dissatisfaction, and IT staff frustration to name a few. Barbara Joan is tired of interrupting her productive work to respond to stressful, sometimes chaotic downtime incidents. How do you calculate the contribution that downtime makes to very expensive staff turnover?

3. The right HA pays for itself

Some companies believe that the cost of an HA solution is simply the cost of the software and servers required. They assume they can create their own using in-house resources or the cloud. However, these companies forget to consider the many different aspects of the solution and their individual hidden costs. For example, homegrown solutions may be cheaper to implement in the short term, but they often include hidden costs such as maintenance, ongoing support, team training, documentation, technical debt, and break-fix development. In addition, many “we can do it cheaper in house” estimates do not account for the other work the team will have to set aside to build and maintain the solution. Like any DIY project, there are some things better left to the experts.
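One way to make the hidden costs listed above visible is to total them per option over the same number of years. The sketch below assumes a hypothetical `HaOption` class; every cost figure is an illustrative placeholder, not real pricing for any homegrown or commercial product.

```python
from dataclasses import dataclass, field


@dataclass
class HaOption:
    """One candidate HA approach with its upfront and recurring costs."""
    name: str
    upfront_cost: int
    annual_costs: dict = field(default_factory=dict)  # label -> yearly cost

    def total_cost(self, years: int) -> int:
        # Upfront spend plus every recurring line item for each year
        return self.upfront_cost + years * sum(self.annual_costs.values())


# Placeholder figures only: substitute your own estimates.
homegrown = HaOption(
    name="homegrown",
    upfront_cost=20_000,
    annual_costs={
        "maintenance": 30_000,
        "ongoing_support": 15_000,
        "team_training": 5_000,
        "documentation": 5_000,
        "break_fix_development": 10_000,
        "displaced_project_work": 25_000,  # work the team sets aside
    },
)
commercial = HaOption(
    name="commercial",
    upfront_cost=50_000,
    annual_costs={"license_and_support": 40_000, "team_training": 3_000},
)

for option in (homegrown, commercial):
    print(option.name, option.total_cost(years=3))
```

Listing each recurring item by name, including the displaced project work, keeps the comparison honest in both directions: an option only looks cheaper if its full set of line items really adds up to less.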

4. Clearly define what you mean by downtime

Downtime comes in two broad forms: planned and unplanned. It includes issues caused by platform unavailability, application crashes, hardware failures, network outages, physical datacenter issues, breaches, and human error. In some evaluations, customers and IT evaluators become focused on the platform’s availability and lose sight of other downtime causes. For example, a project manager at a large manufacturing company noted that while cloud platforms provide more resiliency, reliability, and redundancy, they do not cover all the issues impacting availability. He went on to describe root causes of downtime that many evaluators forget to include.

5. Clarify related terms

Recently I joined an industry panel discussing the typical customer’s needs with respect to application availability. Within the first five minutes, several panel participants had run through a dozen or more acronyms and abbreviations for different terms. While some were easily decipherable, others were niche or depended on the background and experience of the IT professional. For example: HA+DR. Is that High Availability + Disaster Recovery, or High Availability with Data Replication? The use of acronyms, combined with terminology that varies between people with different levels of industry knowledge and experience, can also create confusion and friction in the purchasing process. As VP of Customer Experience, I saw one customer evaluation encounter severe friction within the purchasing team because one approver believed the company only needed a solution for HA, while the other mentioned HA+DR. In the end, the two realized that HA for one meant two nodes, while HA for the other meant two nodes plus DR.

6. Clearly define the role of the HA solution

Expectations are another area that often hampers the purchase of an HA solution. As VP of Customer Experience, I worked with a customer who was dealing with platform and application instability that led to repeated downtime. During the evaluation process, the customer lamented that the HA solution failed to address the platform instability. Under load, CPU and memory were maxed out, and the network became unstable and nearly unusable. Instead of addressing the underlying platform through adequately sized systems or reliable infrastructure, the customer attributed the failures to the HA solution and went in a different direction. IT admins sometimes struggle to set expectations with their management as to what HA can and cannot do. An HA solution is not a magic pill that cures all IT infrastructure ills; rather, it is an essential and critical component of a sound architecture. When the expectations or requirements for the solution are misunderstood, the purchase is often hampered or prevented.

7. Explain why cloud SLAs do not provide application HA

Review your cloud platform SLAs and have a firm understanding of what they will and will not cover. Many platforms provide much-needed infrastructure stability, reliability, and flexibility compared with previously maligned data centers. However, for most applications, the responsibility for availability and uptime remains with the IT admin and not the cloud vendor. There is no “100% hands off” approach to HA, no matter where your system resides, on-premises or in the cloud.
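To see why a platform SLA alone does not settle application availability, it can help to turn percentages into hours and to combine layers. The sketch below is illustrative only: the function names and the sample percentages are hypothetical, and the combined figure assumes the layers fail independently and that the application needs all of them to be up.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def allowed_downtime_hours(availability_percent, period_hours=HOURS_PER_YEAR):
    """Hours of downtime that still satisfy a given availability percentage."""
    return (1 - availability_percent / 100) * period_hours


def combined_availability(layers):
    """Availability of a stack whose layers must all be up at once.

    Assumes independent failures, so the fractions multiply.
    """
    result = 1.0
    for availability in layers:
        result *= availability
    return result


# A 99.9% commitment still permits roughly 8.76 hours of downtime per year.
print(allowed_downtime_hours(99.9))

# Hypothetical stack: platform at 99.99%, unprotected application at 99.5%.
# The result is lower than either layer on its own (about 99.49%).
print(combined_availability([0.9999, 0.995]))
```

The platform figure covers only the first layer; the application layer, and whatever protects it, remains the IT team's responsibility.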

Of course, this is not a complete list of the misunderstandings that make purchasing a great HA solution hard. Other notable misunderstandings often occur around scheduling, prioritization of use cases, definition and clarification of requirements, success criteria, budget, budget authority, and the understanding (or lack thereof) of the risks of not going with an enterprise commercial HA solution. Contact SIOS to learn more about our HA solutions.

Bonus:

Eliminate misunderstandings between the layers of the organization
A major challenge in evaluating, purchasing, and deploying a great HA solution stems from misunderstandings between the different layers of the organization. Think back to the first set of cost misunderstandings and consider what each person responsible for a cost justification might have to explain to their boss for approval. Now consider the background of each person’s boss: are they technical or non-technical, on the same team or in a different part of the organization? Then consider the relationship between the various IT layers in the company and how their needs and communication can factor into the discussion and decision. Many companies that SIOS’ Customer Experience team works with have a separate technology team for each part of the IT department: database, application, platform, network, security, and so on. These teams must communicate well to define requirements, expectations, and success criteria. This level of communication doesn’t happen easily, and it can be even harder when all teams are remote and in different time zones.

-By Cassius Rhue, VP Customer Experience

