What is Disaster Recovery?
Disaster Recovery is Critical to Continued Business Operations
Disaster recovery (DR) is a strategy and set of policies, procedures, and tools that ensure critical IT systems, databases, and applications continue to operate and be available to users when a man-made or natural disaster happens. While the IT team owns the disaster recovery strategy, DR is an important component of every organization’s Business Continuity Plan, which is a strategy and set of policies, procedures, and tools that get the entire business back up and running after a disaster.
But when we speak of a disaster, it does not need to be a full-fledged hurricane, tornado, flood, or earthquake that impacts your business. Disasters come in many forms, including a cyber-attack, user error, fire, theft, vandalism, or even a terrorist attack. In short, a disaster is any crisis that causes prolonged system downtime, large-scale data loss, or both, and that impacts your IT infrastructure, data center, and business.
In a recent Spiceworks survey, 59 percent of organizations indicated they had experienced one to three outages (that is, any interruption to normal levels of IT-related service) over the course of one year, 11 percent had experienced four to six, and 7 percent had experienced seven or more. The survey also indicates that larger companies, which rely on a greater number of services, are more likely to experience outages than smaller organizations. For example, 71 percent of small businesses experienced one or more outages in the last 12 months, compared to 79 percent of mid-size businesses and 87 percent of large businesses. When you look at those statistics, you know you are living on borrowed time if you don’t have a disaster recovery plan in place.
But there is good news. Compared with statistics from previous years, it appears that organizations of all sizes and from all industries are doing better when it comes to having a disaster recovery plan in place. According to the same Spiceworks survey, 95 percent of organizations have a DR plan in place, but 23 percent never test or exercise their plan. Exercising your DR plan is as important as a student fire drill or muster drill. Having a plan in place is just the first step. If the people involved in executing the plan don’t know what to do, you won’t be able to recover from a disaster.
High Availability Vs. Disaster Recovery
But before we go any further, let’s be clear on the difference between best practices for handling a system failure versus a disaster. To recover from a system failure, redundant systems, software, and data should be on your local area network (LAN). For critical database applications, you can replicate data synchronously across the LAN. This makes your standby instance “hot” and in sync with your active instance, so it is ready to take over immediately in the event of a failure. This is referred to as high availability (HA).
However, to recover systems, software, and data in the event of a disaster means redundant components must be on a wide-area network (WAN). With a WAN, data replication is asynchronous to avoid negatively impacting throughput performance. This means that updates to standby instances will lag updates made to the active instance, resulting in a delay during the recovery process. Since disasters are rare, some delay may be tolerable and is dependent upon (a) how critical it is to your business to achieve the lowest possible Recovery Time Objective (RTO) and Recovery Point Objective (RPO) and (b) how much budget you can allocate to achieve the best RTO and RPO.
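The difference between synchronous and asynchronous replication can be illustrated with a minimal Python sketch. It is a toy model, not how any particular product works: the `Primary` and `Standby` classes, the `ship` method, and the write counts are all hypothetical names and values chosen for illustration. It shows why a synchronously replicated standby loses nothing on failover, while an asynchronously replicated one can lose whatever had not yet crossed the WAN.

```python
from collections import deque


class Standby:
    """A standby instance that applies the writes it has received."""

    def __init__(self):
        self.applied = []

    def apply(self, write):
        self.applied.append(write)


class Primary:
    """Active instance replicating to one standby (hypothetical toy model)."""

    def __init__(self, standby, synchronous):
        self.standby = standby
        self.synchronous = synchronous
        self.committed = []
        self.in_flight = deque()  # writes queued for the standby, not yet applied

    def write(self, value):
        self.committed.append(value)
        if self.synchronous:
            # LAN-style: the standby applies the write before we move on.
            self.standby.apply(value)
        else:
            # WAN-style: acknowledge locally now and ship the write later.
            self.in_flight.append(value)

    def ship(self, count):
        """Deliver up to `count` queued writes to the standby."""
        for _ in range(min(count, len(self.in_flight))):
            self.standby.apply(self.in_flight.popleft())


def writes_lost_on_failover(primary):
    """Writes committed on the primary that the standby never applied."""
    return len(primary.committed) - len(primary.standby.applied)


lan = Primary(Standby(), synchronous=True)
wan = Primary(Standby(), synchronous=False)
for i in range(10):
    lan.write(i)
    wan.write(i)
wan.ship(7)  # assume only 7 of the 10 writes crossed the WAN before the disaster

print(writes_lost_on_failover(lan))  # 0
print(writes_lost_on_failover(wan))  # 3
```

The three writes still in flight are the data that would be lost if the active site failed at that moment, which is the quantity an RPO puts a limit on.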
RTO is the maximum tolerable duration of any outage, and RPO is the maximum amount of data loss that can be tolerated when a disaster happens. For disaster recovery, RTOs of many minutes or even hours are common with some solutions because it is too expensive to try to recover across a WAN in just a few minutes. For mission-critical applications, your organization will want to achieve a low RPO, but the lower your RPO, the more you need processes in place to ensure all data has been replicated to the standby server before failover. This effort tends to increase recovery time.
But with the SIOS disaster recovery solution, you can achieve a minimal-to-no-data-loss RPO and an RTO of one to two minutes.
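To make the two objectives concrete, here is a small Python sketch that checks a single recovery event against an RPO and an RTO. The function name `evaluate_recovery`, the timestamps, and the target values are hypothetical and chosen only for illustration. The data-loss window is measured from the last update that reached the standby to the moment of failure, and downtime is measured from the failure to the restoration of service.

```python
from datetime import datetime, timedelta


def evaluate_recovery(last_replicated, failure, service_restored, rpo, rto):
    """Compare one recovery event against its RPO and RTO targets."""
    data_loss_window = failure - last_replicated  # updates made after this are lost
    downtime = service_restored - failure         # how long users were without service
    return {
        "data_loss_window": data_loss_window,
        "downtime": downtime,
        "rpo_met": data_loss_window <= rpo,
        "rto_met": downtime <= rto,
    }


# Hypothetical event: the standby was 4 seconds behind, and failover took 3 minutes.
failure = datetime(2024, 1, 1, 12, 0, 0)
result = evaluate_recovery(
    last_replicated=failure - timedelta(seconds=4),
    failure=failure,
    service_restored=failure + timedelta(minutes=3),
    rpo=timedelta(seconds=10),
    rto=timedelta(minutes=2),
)

print(result["rpo_met"], result["rto_met"])  # True False
```

In this made-up event the RPO is met because only four seconds of updates were at risk, but the RTO is missed because service took three minutes to return against a two-minute target.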
SIOS Delivers One Solution to Meet Your HA and DR Needs
Whether you need local HA within a single site or fast, efficient DR across multiple sites, SIOS solutions meet all your business continuity needs.
The SIOS disaster recovery solution is a multi-site, geographically dispersed cluster that provides RPOs of seconds and RTOs of minutes. What makes SIOS different from many other DR providers is that it offers one solution that meets both high availability and disaster recovery needs.
To support DR, you configure your clusters the same way as you do for high availability, with the two differences discussed earlier:
- The DR cluster node(s) are in a geographic site – on-premises, virtual, or in the cloud – that is further away from the HA instance.
- The DR site is on a wide-area network (WAN), which means that data replication will be asynchronous to avoid negatively impacting throughput performance.
Remember, asynchronous data replication means that updates to the DR instances will lag updates made to the active instance, typically by only a few seconds at most. With SIOS’ incredibly fast data replication across the WAN, you can keep real-time copies of data synchronized across multiple servers and data centers to achieve both HA and DR.
In addition to a single solution for HA/DR and real-time data replication, the SIOS HA/DR solution also provides:
- Block-level data compression to minimize network loads
- Bandwidth throttling to regulate and minimize network congestion
- WAN optimization to improve network performance
- Integration with push-button failover to support DR and automatic failover to support HA
- An agnostic platform approach, allowing you to choose on-premises, virtual, cloud, or a hybrid DR solution
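As a generic illustration of two items in the list above, block-level compression and a bandwidth cap, the following Python sketch finds the blocks that changed between two copies of a volume, compresses only those blocks with the standard `zlib` module, and estimates the transfer time under a rate limit. It does not describe how SIOS implements these features. The block size, the helper names, and the sample data are assumptions made for the example.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def changed_blocks(old, new, block_size=BLOCK_SIZE):
    """Yield (index, data) for each block of `new` that differs from `old`."""
    for offset in range(0, len(new), block_size):
        block = new[offset:offset + block_size]
        if block != old[offset:offset + block_size]:
            yield offset // block_size, block


def compress_blocks(blocks):
    """Compress each changed block independently before sending it."""
    return [(index, zlib.compress(data)) for index, data in blocks]


def seconds_to_send(payload_bytes, bytes_per_second):
    """Minimum transfer time when replication is capped at `bytes_per_second`."""
    return payload_bytes / bytes_per_second


# Sample data: an 8-block volume of zeros in which only block 2 is modified.
old_volume = bytes(BLOCK_SIZE * 8)
modified = bytearray(old_volume)
modified[BLOCK_SIZE * 2:BLOCK_SIZE * 3] = b"A" * BLOCK_SIZE
new_volume = bytes(modified)

dirty = list(changed_blocks(old_volume, new_volume))
packed = compress_blocks(dirty)
raw_size = sum(len(data) for _, data in dirty)
packed_size = sum(len(data) for _, data in packed)

print([index for index, _ in dirty])  # [2]
print(raw_size, packed_size < raw_size)  # 4096 True
```

Only one of the eight blocks needs to cross the network, and it shrinks further once compressed, so less data has to be sent over a capped WAN link.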
The following case study showcases the use of SIOS DataKeeper to deliver HA and DR in a single solution.
Enabling HA and DR Protection at a Premier Health Center
ALYN Hospital, located in Israel, is a premier pediatric rehabilitation health center, specializing in diagnosing and rehabilitating infants, children, and adolescents with physical disabilities. Parents bring their children from Israel and abroad to receive a wide range of medical services, paramedical therapies, and additional state-of-the-art rehabilitation services.
The Search for the Right Solution
ALYN Hospital operates a variety of applications, including electronic medical records (EMR), customer relationship management (CRM), SQL Server databases, Microsoft Exchange, and Microsoft Office, in support of its clinical and administrative operations. As a healthcare provider, the hospital is subject to strict government regulations and needed to implement strong DR provisions to ensure the protection and availability of its mission-critical applications. The hospital chose Hyper-V Replica to support its DR strategy, operating two physically separated server rooms on-premises so that all critical virtual machines (VMs) running on any Hyper-V host server could be replicated to another host in the other room. Unfortunately, this configuration was not satisfying RPO and RTO requirements, so the IT team started to investigate other options.
In looking for the right DR solution, the IT team considered Windows Server Failover Clustering (WSFC), which uses shared storage. Unfortunately, ALYN did not have a SAN in place and because of budget restrictions, it was cost-prohibitive to implement identical SANs in both server rooms. For this reason, ALYN investigated third-party solutions.
In its search for third-party failover clustering software, ALYN established three criteria:
- The solution had to work with existing hardware.
- The solution had to provide both high availability (HA) and disaster recovery (DR) protection across all hospital critical applications.
- The total cost of ownership (TCO) had to fit within the department’s limited budget.
SIOS DataKeeper – The Obvious Choice
After evaluating several different solutions, the IT staff chose SIOS DataKeeper, which the team described as a solution “that delivers carrier-class capabilities with a remarkably low total cost of ownership” and delivers HA and DR in a single cost-effective solution.
SIOS DataKeeper combines real-time, block-level data replication with continuous application-level monitoring and flexible failover/failback policies in a total solution that is easy to implement and manage. DataKeeper leverages WSFC and maintains compatibility with the operating environment, making it easy for the IT team to quickly learn how to use the solution and quickly complete HA configurations for all applications.
With DataKeeper, the IT team can create three-node SANless failover clusters with a single active instance and two standby instances. With this configuration, ALYN can continuously update systems and software without disrupting operations because the active instance can be moved to any server in a three-node cluster and remain fully protected during periods of planned hardware and software maintenance.
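The maintenance pattern described above can be modeled with a short Python sketch. This is a toy model, not the DataKeeper or WSFC API: the `Cluster` class, its methods, and the node names are hypothetical. It shows why three nodes matter: when one node is taken out for maintenance, the active instance still has a standby available.

```python
class Cluster:
    """Toy model of a failover cluster with one active node and standbys."""

    def __init__(self, nodes):
        if len(nodes) < 2:
            raise ValueError("a cluster needs at least two nodes")
        self.nodes = list(nodes)
        self.active = self.nodes[0]
        self.in_maintenance = set()

    def standbys(self):
        """Nodes currently able to take over from the active node."""
        return [n for n in self.nodes
                if n != self.active and n not in self.in_maintenance]

    def switch_over(self, target):
        """Move the active role to an available standby (planned switchover)."""
        if target not in self.standbys():
            raise ValueError(f"{target} is not an available standby")
        self.active = target

    def start_maintenance(self, node):
        """Take a node offline, moving the active role away first if needed."""
        if node == self.active:
            candidates = self.standbys()
            if not candidates:
                raise RuntimeError("no standby available to take over")
            self.switch_over(candidates[0])
        self.in_maintenance.add(node)

    def finish_maintenance(self, node):
        """Return a node to service as a standby."""
        self.in_maintenance.discard(node)


cluster = Cluster(["node-a", "node-b", "node-c"])  # hypothetical node names
cluster.start_maintenance("node-a")

print(cluster.active, cluster.standbys())  # node-b ['node-c']
```

While `node-a` is being patched, `node-b` runs the workload and `node-c` remains ready to take over, so the application is still protected during the maintenance window.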
In addition, SIOS can work with any type of storage and provides WAN-optimized data replication, which simplifies the implementation of ALYN’s remote DR site. To maintain high transactional throughput, data replication across the WAN occurs asynchronously, but SIOS DataKeeper employs special techniques to optimize data transmission, allowing ALYN to achieve demanding RPOs and RTOs.
The Bottom Line
Today, SIOS DataKeeper is providing high availability protection for all of ALYN Hospital’s essential applications. Comments Uri Inbar, ALYN Hospital IT Director, “With SIOS we found a solution that delivers carrier-class capabilities with a remarkably low total cost of ownership. For us, it was an obvious choice.”
ALYN Hospital tests the configuration regularly, and routinely changes the active and standby designations, while redirecting the data replication as needed during planned software updates. The applications continue to run uninterrupted.