High Availability and Disaster Recovery Everywhere: From General Concepts to Generating Solutions


Related Blogs / Background Reading Recommendations

This blog assumes familiarity with the LifeKeeper Resource Hierarchy framework and LifeKeeper Clustering in general. For background on these topics, the blogs listed below provide fantastic context. This blog also builds upon a previous blog, linked below, about one way to close the gap between possible use cases and supported protection mechanisms: the “Quick Service Protection Application Recovery Kit” (QSP ARK) within LifeKeeper.

This blog, however, will explore the options available when the QSP ARK cannot meet the demands for High Availability and Disaster Recovery for a particular application or use case. 

A Quick Refresher 

The previous part of this blog series addressed how to think about an application in order to create a Generic Application Recovery Kit that protects it with LifeKeeper. In this part, the foundations for understanding an application presented in part one are applied to LifeKeeper’s Generic Application Recovery Kit framework.

As this installment of the blog series is best read alongside the previous installment, here is a short refresher of the concepts presented in part one.

Ask the smallest question that still provides useful/actionable information 

In determining how broad questions will be answered, break them down into smaller questions with simpler answers. Use those smaller questions to iteratively rebuild the answer to the original “broad” question. 

Use the information given

Understand the utilities available and how these utilities convey information. Once you know what information the “smallest questions” require, determine how the information exposed by an application’s API can be used to answer them.
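As a rough illustration of these two ideas working together, the sketch below answers the broad question “is the application healthy?” by combining three smaller questions. Everything in it is hypothetical: the status text format, the field names, and the function names are assumptions made for illustration and do not come from LifeKeeper or from any real application.

```python
# Hypothetical status text, standing in for whatever an application's
# own status utility prints. The format is an assumption for illustration.
SAMPLE_STATUS = """\
process: running
port: 8080 listening
last_error: none
"""


def parse_status(text):
    """Turn 'key: value' lines into a dictionary."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


# Each "smallest question" has a simple yes/no answer.
def is_process_running(fields):
    return fields.get("process") == "running"


def is_port_listening(fields):
    return fields.get("port", "").endswith("listening")


def has_no_errors(fields):
    return fields.get("last_error") == "none"


def is_application_healthy(text):
    """Rebuild the broad answer from the small answers."""
    fields = parse_status(text)
    checks = [is_process_running, is_port_listening, has_no_errors]
    return all(check(fields) for check in checks)


if __name__ == "__main__":
    print(is_application_healthy(SAMPLE_STATUS))
```

Keeping each small question in its own function makes it easy to add, remove, or replace a check as more is learned about how the application reports its state.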

With the refresher out of the way, LifeKeeper can finally be brought into the picture. Though High Availability and Disaster Recovery are often associated with complexity, significant effort has been made to ensure that an understanding of a particular application ports easily into the Generic Application Recovery Kit framework. With the previous foundation in mind, it is time to use it as the base for thinking like LifeKeeper.

Thinking Like LifeKeeper

Imagine you are in a “Freaky Friday” scenario, as in the movie where a girl and her mother switch bodies and struggle with each other’s unfamiliar daily responsibilities. The person with whom you have switched is attending a go-live activity in your stead and needs to know how to start a key application, make sure it is running correctly, and then stop it. How would you explain these things to them if you only had a 15-minute phone call to prepare them? What are the details you would have to specify so they can complete the job?

This scenario, while contrived, is a great way to break down the key resource protection actions. LifeKeeper manages applications to ensure they are only running on a single system, and the LifeKeeper Resource Hierarchy ensures that prerequisite applications and system resources are processed in the correct order whenever resources are restored or removed. In turn, LifeKeeper enables a developer to think about the actions on an application protected with a Generic Application resource in the context of a single system. Distilling the process of starting, stopping, or querying an application down to the bare essentials is the first step in defining what the Generic Application action scripts need to accomplish. When start, stop, and query are defined according to this strategy, they map one-to-one onto the resource actions, like so:

  • Restore Action: Application Start  
  • Remove Action: Application Stop
  • quickCheck Action: Query Application
    • Note: quickCheck actions are optional for Generic Application Resources, and monitoring will not be performed if the quickCheck action is not defined. However, regular application monitoring is highly recommended to ensure the best outcomes when implementing High Availability and Disaster Recovery!
  • Local Recovery Action: Application Stop and Application Start (in sequence)
    • Note: Local Recovery Actions are optional for Generic Application Resources. When Local Recovery is not defined, a Generic Application Resource will not attempt a restart to repair itself on the system on which the failure was detected. Instead, it will stop on the failing system, and the entire hierarchy for the application will migrate to a standby system.
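The mapping in the list above can be sketched in a few lines. This is a minimal illustration only: the `DemoApplication` class is an in-memory stand-in for a real application, the action names used as dictionary keys are illustrative labels, and the convention of returning 0 for success and 1 for failure is an assumption here rather than a statement of LifeKeeper's contract. Real action scripts would call the application's own start, stop, and status commands.

```python
class DemoApplication:
    """In-memory stand-in for a real application (hypothetical)."""

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True
        return self.running

    def stop(self):
        self.running = False
        return not self.running

    def query(self):
        return self.running


def restore(app):
    """Restore action: start the application."""
    return app.start()


def remove(app):
    """Remove action: stop the application."""
    return app.stop()


def quick_check(app):
    """quickCheck action: query the application."""
    return app.query()


def local_recovery(app):
    """Local Recovery action: stop, then start, in sequence."""
    return remove(app) and restore(app)


# Illustrative action labels mapped to the functions above.
ACTIONS = {
    "restore": restore,
    "remove": remove,
    "quickCheck": quick_check,
    "recover": local_recovery,
}


def run_action(name, app):
    """Return 0 on success and 1 on failure (assumed convention)."""
    return 0 if ACTIONS[name](app) else 1


if __name__ == "__main__":
    demo = DemoApplication()
    for action in ("restore", "quickCheck", "recover", "remove"):
        print(action, run_action(action, demo))
```

Note that Local Recovery is built entirely from the stop and start steps, so once those two are well defined, the recovery step needs no additional application knowledge.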

Sometimes, LifeKeeper will need to know certain details about a running application in order to perform the actions listed above. All LifeKeeper Resources, including the Generic Application Resource, have what is called a “Resource Information Field”, whose purpose is to provide this information to the action scripts during resource actions. The information field can be configured independently for each system at the time of resource extension, allowing resource actions to use information specific to the system on which the action is performed. LifeKeeper also provides command-line utilities to get or set the resource information.

Calling back to the “Freaky Friday” example, what are the details that you would have to tell the person with whom you switched? Think of things such as key paths for application files, specific settings or values for command arguments, and similar details. The information field is a great place to put information that must be known in order to determine other details about the application. It is also a great place for settings, command arguments, or other values that could not otherwise be derived. Keep in mind that, by LifeKeeper convention, the information field is rarely (if ever) changed over the lifetime of a resource. Information that varies over the lifetime of a resource is best kept out of the information field to avoid corrupting it. Instead, obtain that information programmatically in the resource’s action script(s) or via a “helper” script that the action scripts can invoke.
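The split between stable and changing details might look like the sketch below. The `key=value;key=value` layout of the information string, the key names, and the file paths are all assumptions chosen for illustration; LifeKeeper does not mandate this format, and how the string is retrieved from the resource is left out. Stable details (where the configuration and PID file live) come from the information string, while the changing detail (the current PID) is looked up when an action runs.

```python
def parse_info_field(info):
    """Split a 'key=value;key=value' string into a dictionary.

    The format is an assumption chosen for this sketch.
    """
    settings = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def read_pid(pid_file):
    """Look up a changing value (the PID) at action time.

    Returns None when the PID file is missing or unreadable.
    """
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    # Hypothetical information string: stable details only.
    info = "config=/etc/myapp/myapp.conf;pid_file=/var/run/myapp.pid"
    settings = parse_info_field(info)
    print(settings["config"])
    print(read_pid(settings["pid_file"]))
```

Because the PID is read from disk each time, the information string never has to be rewritten when the application restarts.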

As an additional aid, LifeKeeper is delivered with template scripts for developing a generic application. These are a fantastic starting point for a Generic Application’s action scripts, as they are already set up to receive the input arguments LifeKeeper uses when invoking actions for a particular resource. In turn, this makes that information available for use within the resource action scripts.
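To give a feel for what “receiving the input arguments” means, here is a minimal sketch of argument handling at the top of an action script. It assumes the script is invoked with a resource tag and a resource ID passed as `-t` and `-i`; those option names, and the sample values, are assumptions for this sketch. The template scripts delivered with LifeKeeper are the authoritative reference for the actual calling convention.

```python
import argparse


def parse_action_arguments(argv):
    """Parse the arguments an action script is assumed to receive.

    The -t (tag) and -i (ID) options are assumptions for this sketch;
    the template scripts show the real calling convention.
    """
    parser = argparse.ArgumentParser(description="Sketch of an action script")
    parser.add_argument("-t", dest="tag", required=True, help="resource tag")
    parser.add_argument("-i", dest="resource_id", required=True, help="resource ID")
    return parser.parse_args(argv)


if __name__ == "__main__":
    # Hypothetical values, passed explicitly so the sketch runs on its own.
    args = parse_action_arguments(["-t", "myapp-tag", "-i", "myapp-id"])
    print(args.tag, args.resource_id)
```

With the tag and ID in hand, the rest of the script can identify which resource it is acting on before starting, stopping, or querying the application.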

Conclusion

LifeKeeper provides a myriad of ways to protect applications. Still, some applications have requirements beyond what is offered in the LifeKeeper Application Recovery Kits. In such cases, High Availability and Disaster Recovery protection is still possible, and may be simpler to achieve than previously thought. Generic applications are not something an organization should shy away from; instead, they are one of the many powerful tools offered by LifeKeeper to uplift an environment’s High Availability and Disaster Recovery capabilities. The Generic Application framework was created to be accessible and versatile. Still, if your organization does not have the resources to spare for writing a Generic Application Recovery Kit in-house, SIOS Professional Services offers engagements in which SIOS Engineers coordinate requirements and develop a Generic Application Recovery Kit on behalf of your organization. If ongoing support is a requirement, SIOS Professional Services also provides offerings that expand normal product support to include Generic Applications developed by SIOS Professional Services. The barriers to entry for protecting your organization’s business-critical applications are ever-shrinking, and SIOS Protection Suite for Linux or Windows aims to be at the forefront of the charge to render unprotected applications a thing of the past.

Not every application fits a standard high availability model. SIOS can help you design and implement the right LifeKeeper solution for your business-critical workloads.

Author: Philip Merry, Support Engineer at SIOS Technology Corp.

