Date: April 2, 2021
Tags: Cloud, disaster recovery, High Availability
Reading Time: 5 minutes
Author Carey Nieuwhof hooked me with a blog post about the biggest trap of 2021. While it does not speak directly to high availability (HA), the topic alone made me reflect on some of the trends of 2020. Cloud innovations are numerous and begin at the most fundamental levels of the infrastructure, not to mention advances in AI, machine learning, compute capacity and algorithms, memory management and sharing, and a battery of others. Together, these advances make the current generation of cloud data centers the most robust, reliable, and available yet. These centers, optimized with redundant power, cooling, a legion of IoT devices for monitoring and alerting, redundant networking, high-speed interconnects, and massive servers, storage, and disks, are impressive. They may also be the biggest trap looming in 2021.
The biggest trap of 2021 will be believing that cloud availability alone is the same as, or enough for, higher availability. This trap is complex to dissect. The advances listed above that form the backbone of many data centers are indeed vast and impressive, and they are only a fraction of the technological innovations driving the cloud. So what makes this massively redundant, high-capacity, AI-driven infrastructure a trap? Simply that hardware and infrastructure availability alone still leave your enterprise at risk.
The Top Risks of Cloud High Availability
Disks. Disks have gotten faster and more intelligent. Eye-popping advances in chipsets, access technology, manufacturing, storage capacity, and RAID technology mean that cloud vendors can post gaudy numbers for speed, access, and redundancy. This reduces the risk of single points of failure (SPOFs) in the disk infrastructure and provides confidence that a single disk failure, or even a momentary loss of power to the disks, will not cause an outage.
Storage Arrays. The storage arrays and enclosures housed within the data center that provide access to the disks have also greatly improved. No longer big eyesores of blinking lights and airboat-sized fans, these units are small but loaded with capacity and performance enhancements. You’ll be hard-pressed to find a modern chassis that lacks redundant power, redundant disk capabilities, or near-zero-lag replication between connected storage units, even units separated by long distances. In addition, these units use AI to predict failures, proactively resolve problems, and optimize workloads to reduce performance bottlenecks.
Servers. Remember when big-name manufacturers and tech prognosticators were predicting game-changing technology that would reshape the landscape of the future? It seems like decades ago that people were forecasting server advances such as reduced footprints, faster and more complex chipsets, NVMe, battery efficiencies, cooling advances, storage advancements, in-memory and persistent memory, GPUs, and bare-metal provisioning. That future has arrived and been surpassed. Servers are now accelerating the pace of cloud computing and strengthening the cloud’s redundancy, reliability, and robustness.
Networking. Advances in networking solutions, tools, software, and equipment also make the list of things that made cloud availability stronger in 2020. Over the last few years, vendors have released solutions that expand the speed, possible topologies, capacity, and reach of inter- and intra-cloud networks. As with so many other technologies, vendors are automating traffic flows and patterns using AI and machine learning, and taking advantage of manufacturing advances to build in device redundancy that can be leveraged for availability and reliability.
Applications. Applications are still a vulnerable part of the cloud architecture when left unprotected. Applications that are not protected by an application-aware higher availability module or framework, such as a SIOS Application Recovery Kit (ARK), run the risk of being down at the most critical moment in your business lifecycle. A SIOS ARK provides cloud applications with critical application-aware monitoring and recovery, as well as failover and disaster recovery orchestration in the event of a failure.
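To make “application-aware monitoring and recovery” a little more concrete, here is a minimal, hypothetical sketch of the escalation logic such a framework typically applies: confirm the application actually answers, try a bounded number of local restarts, and only then fail over. The function and callback names are illustrative assumptions, not the SIOS ARK interface. In a real deployment the health check would exercise the application itself (for example, run a query or send a request) rather than merely confirm that a process exists.

```python
from typing import Callable


def recover(
    check_health: Callable[[], bool],
    restart_local: Callable[[], None],
    failover: Callable[[], None],
    max_local_restarts: int = 2,
) -> str:
    """Hypothetical escalation policy: local restarts first, then failover."""
    # Application-aware check: does the service actually answer, not just run?
    if check_health():
        return "healthy"
    # Try to recover in place a limited number of times.
    for attempt in range(1, max_local_restarts + 1):
        restart_local()
        if check_health():
            return f"recovered after {attempt} local restart(s)"
    # Local recovery did not work; move the workload to the standby node or site.
    failover()
    return "failed over"


# Example with stand-in callbacks: the app comes back after the second restart.
responses = iter([False, False, True])
print(recover(lambda: next(responses), lambda: None, lambda: None))
```

Bounding local restarts before failing over avoids two failure modes: restarting a broken service forever on a sick node, and failing over for a transient hiccup that a quick restart would have fixed.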
Databases. While a number of databases have become more robust, and some now offer replication enhancements, these databases are still a risk on their own. Databases with replication technology still need orchestration, automation, and the intelligence to make sure they are highly available to the application components that depend on them. What good is it if your database continues to hum along in your primary Region and Availability Zone while your application has actually failed over to a different Region or DR site? Supplement databases that have replication, such as SAP HANA, with the automation and best practices of the SIOS Technology Corp HANA ARK and the SAP-certified SAP S/4HANA ARK. Protect databases that lack replication technology, or whose replication is limited, with the combination of the SIOS Protection Suite, SIOS DataKeeper for Linux, and the associated ARK.
Storage. In the realm of disks and storage, it can be tempting to believe that capacity and the redundancy of software and hardware RAID mean you are highly available. However, storage is only available if it is accessible to the applications and virtual machines that need it. What technology have you deployed to monitor and recover mounted cloud shares and volumes such as Amazon EFS and Azure NetApp Files (ANF)? Unplanned downtime and its associated chaos can be as near as an unintended unmount, or an offline operation by a well-intentioned user.
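An unintended unmount usually leaves an ordinary empty directory behind, so applications may keep writing to the local disk without any error. The following minimal sketch, assuming a Linux host and hypothetical mount points for an EFS or ANF share, flags expected mount points that are missing or unreadable. It covers only detection; recovery (remounting or failing over) is left as an alert. A production monitor should also run such probes under a timeout, because stat calls against a hung NFS mount can block.

```python
import os
from typing import Iterable, List


def find_unavailable_mounts(expected: Iterable[str]) -> List[str]:
    """Return expected mount points that are not mounted or not accessible."""
    problems = []
    for path in expected:
        # os.path.ismount() is False for a missing path or a plain directory,
        # which is what an unintended unmount leaves behind.
        if not os.path.ismount(path):
            problems.append(path)
        # A share that is mounted but cannot be listed is also unavailable.
        elif not os.access(path, os.R_OK | os.X_OK):
            problems.append(path)
    return problems


if __name__ == "__main__":
    # Hypothetical mount points; replace with the shares your applications use.
    expected = ["/mnt/efs-shared", "/mnt/anf-data"]
    for path in find_unavailable_mounts(expected):
        print(f"ALERT: {path} is not available; trigger remount or failover")
```

Checking `ismount` rather than mere existence is the key design choice: the directory still exists after an unmount, so an existence check would report the share as healthy.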
Virtual Machines. Hypervisor technology has made deploying a virtual machine push-button easy. Integrated cloud solutions promise to monitor whether the VM is available and provide options such as restart or migrate. These solutions are not enough to cover VM issues that may stall, delay, or degrade your availability. In addition to what your cloud vendor provides, you need a monitoring and availability solution that understands how to monitor VM health indicators such as:
- Disk capacity
- CPU deadlocks
- Memory contention and errors
- Resource exhaustion
A VM that keeps running but can no longer process application requests may escape the eye of cloud-only monitoring, but it shouldn’t escape the watchful monitoring of your higher availability solution.
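As a rough illustration of the health indicators listed above, here is a minimal, Linux-only sketch that samples disk usage, load average per CPU, and available memory, then compares them against thresholds. The thresholds and function names are hypothetical. Load average is only a coarse proxy for CPU stalls; it does not detect deadlocks directly, and a real HA monitor would also test whether the application on the VM responds.

```python
import os
import shutil
from dataclasses import dataclass
from typing import List

# Hypothetical thresholds; tune them for each workload.
DISK_MAX_PCT = 90.0
LOAD_MAX_PER_CPU = 2.0
MEM_MIN_AVAILABLE_PCT = 10.0


@dataclass
class HealthSnapshot:
    disk_used_pct: float
    load_per_cpu: float  # 1-minute load average divided by CPU count
    mem_available_pct: float


def read_meminfo_available_pct(path: str = "/proc/meminfo") -> float:
    """Percentage of memory the kernel reports as available (Linux only)."""
    values = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            values[key] = int(rest.split()[0])  # values are in kB
    return 100.0 * values["MemAvailable"] / values["MemTotal"]


def take_snapshot(mount: str = "/") -> HealthSnapshot:
    """Sample the current VM state."""
    usage = shutil.disk_usage(mount)
    load1, _, _ = os.getloadavg()
    return HealthSnapshot(
        disk_used_pct=100.0 * usage.used / usage.total,
        load_per_cpu=load1 / (os.cpu_count() or 1),
        mem_available_pct=read_meminfo_available_pct(),
    )


def evaluate(s: HealthSnapshot) -> List[str]:
    """Return human-readable problems; an empty list means within thresholds."""
    problems = []
    if s.disk_used_pct > DISK_MAX_PCT:
        problems.append(f"disk {s.disk_used_pct:.1f}% used")
    if s.load_per_cpu > LOAD_MAX_PER_CPU:
        problems.append(f"load {s.load_per_cpu:.2f} per CPU")
    if s.mem_available_pct < MEM_MIN_AVAILABLE_PCT:
        problems.append(f"only {s.mem_available_pct:.1f}% memory available")
    return problems
```

On a Linux VM, `evaluate(take_snapshot())` returns an empty list when every reading is within its threshold. Keeping sampling separate from evaluation lets the threshold logic be tested without a live VM.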
Data Center. Let’s get real for a moment. All the advances in data center availability, redundancy, and reliability do not negate the need to eliminate your data center as a single point of failure (SPOF). In my role as VP of Customer Experience, I have worked with a customer who deployed best-in-class redundancy within their private cloud data center, much like the major public cloud vendors. If not for the high availability and data replication solution provided by SIOS Technology Corp, this customer would have experienced major downtime when a tropical storm ripped through their area, taking out power, backup generators, cooling, and networking.
With SIOS Technology in place, the customer was able to preemptively fail over to a data center farther inland ahead of the storm. Cooling failures, construction mishaps, and human and natural disasters are continual reminders that a single data center isn’t the same as higher availability.
Don’t fall into the biggest trap of 2021. Make sure you have true high availability instead of assuming the cloud has you covered.
– Cassius Rhue, VP, Customer Experience