White Paper: Multi-Cloud Explained
Use Cases, Risks and Best Practices
In the last decade, cloud computing has emerged as a major platform for computing deployments. Both AWS and Microsoft claim that large swaths of the Fortune 500 use their services, and both Google and Oracle have compelling cloud offerings as well. As a result, many organizations, whether by design or by accident, run workloads in multiple clouds; this is known as a multi-cloud architecture. Accidental multi-cloud environments can arise from shadow IT, or when a team needs specific functionality that exists only in one cloud.
Gartner noted in 2021 that 76% of enterprises are using multiple IaaS providers. While this number is exceptionally high compared to most small and medium-sized businesses (SMBs), it is important to remember the scope of large enterprises, which have many different departments and decentralized IT organizations. In some cases, a division may choose a specific feature or service at one cloud provider, while the bulk of the organization’s infrastructure remains in another cloud. For example, this may happen if a data science group decides to use Google Cloud Platform’s machine learning services, while the IT organization has deployed most of its infrastructure in Amazon Web Services EC2.
Highly regulated industries were some of the last to move into the cloud – often because of regulatory concerns. Cloud providers have worked extremely hard to meet regulatory and audit standards to support those types of industries. This process has met its goal – financial and healthcare companies have moved many of their workloads into the cloud. This trend was significantly accelerated by the COVID-19 pandemic.
However, that doesn’t mean concerns about regulation have gone away entirely. Regulatory language can be vague at times. For example, the Financial Conduct Authority (FCA) regulations on outsourcing IT state that firms must be able to “know how they would transition to an alternative service provider and maintain business continuity”. This statement at least implies that regulated firms need a plan for a second cloud environment. Given the risk-averse nature of many heavily regulated firms, this concern has led many of them to adopt a multi-cloud strategy for their cloud deployments.
One other driver of multi-cloud is mergers and acquisitions, which is especially common at smaller firms. In the pre-cloud days, integrating IT systems and consolidating data centers after a merger or acquisition was a primary challenge. A number of factors can complicate the cloud equivalent, including existing contracts with cloud providers or co-location providers, which may require the merged organization to spend a certain amount of money with the non-favored provider. Just as in the physical data center world, consolidating cloud workloads is a large effort that does not deliver significant business value, so it is frequently delayed in favor of higher-priority projects.
The other common type of multi-cloud deployment uses multiple clouds for high availability (HA) and disaster recovery (DR). Evaluating major public cloud outages across AWS and Azure shows that most outages are limited to a single cloud region at a time (and are most commonly software related). In most of these cases, the outage is limited to a single service; very few outages have taken down an entire region or cloud. To mitigate these regional outages, mission-critical applications are typically deployed to multiple regions within a single cloud provider. For this reason, it’s important to use HA clustering software that enables failover across both cloud availability zones (AZs) and regions.
However, more and more organizations have taken the added step of spreading their workloads across multiple public cloud providers. This can be much easier for static workloads, such as websites and applications that run independently of one another. For distributed systems that rely on coordination, like databases and Active Directory, the networking aspects of multi-cloud DR become far more complex.
One of the reasons many cloud professionals shy away from multi-cloud architectures is the sheer complexity of deployment. While some cloud resources, such as VMs or storage, are mostly the same across providers, components such as networking and security options are very different, which can easily lead to misconfiguration. Another concern is that subtle differences in pricing models between providers can lead to outsized differences in your monthly bill. If you compare cloud pricing between the public cloud providers, you will typically find that prices for commodity services such as VMs and storage are extremely close. However, in other services like networking and load balancing, and especially in platform-as-a-service offerings, there can be major differences between providers. Let’s dig into some of those technical characteristics and challenges.
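To make the pricing point concrete, here is a minimal Python sketch comparing the monthly bill for an identical workload under two hypothetical rate cards. Every price, and the provider names `cloud_a` and `cloud_b`, are made up for illustration; substitute your providers’ actual rates. Note how near-identical VM rates still produce a noticeably different total once egress and load balancer pricing diverge.

```python
# Hypothetical unit prices for illustration only -- not actual provider rates.
PRICES = {
    "cloud_a": {"vm_hour": 0.0416, "gb_stored": 0.023, "gb_egress": 0.09, "lb_hour": 0.025},
    "cloud_b": {"vm_hour": 0.0420, "gb_stored": 0.021, "gb_egress": 0.12, "lb_hour": 0.036},
}

def monthly_cost(provider, vms, storage_gb, egress_gb, lbs, hours=730):
    """Estimate a monthly bill for a simple workload (730 hours ~ 1 month)."""
    p = PRICES[provider]
    return round(
        vms * p["vm_hour"] * hours          # compute
        + storage_gb * p["gb_stored"]       # object/block storage
        + egress_gb * p["gb_egress"]        # outbound data transfer
        + lbs * p["lb_hour"] * hours,       # load balancers
        2,
    )
```

Running the same workload shape through both rate cards (for example, 10 VMs, 500 GB stored, 2 TB egress, 2 load balancers) shows the egress and load balancer terms, not the VM rate, driving the gap.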
Networking for Multi-Cloud
While the implementation differs among the major cloud providers (and each uses its own nomenclature), the general concept of software-defined networking is quite similar across all clouds. In all cases, you have virtual networks, subnets, load balancing, and firewalls. For a successful multi-cloud deployment, you will need a virtual private network (VPN) in place to move your network traffic securely between clouds. You can use any virtual network appliance or the built-in VPN gateways available from each cloud provider.
Configuring the VPNs is straightforward; however, you should determine how much bandwidth you will require and ensure that it is allocated. Note that larger VMs or larger VPN gateways cost more. For this reason, consider edge networking devices in each cloud that can perform compression and/or deduplication to reduce your networking costs.
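The savings from compression or deduplication at the edge can be estimated with a back-of-the-envelope function. The $0.02/GB default below is a placeholder, not any provider’s actual egress rate:

```python
def vpn_transfer_cost(gb_per_month, price_per_gb=0.02, compression_ratio=1.0):
    """Estimate monthly inter-cloud transfer cost.

    compression_ratio: e.g. 2.5 means edge devices cut bytes on the wire
    to 1/2.5 of the original. Rates here are placeholders; substitute
    your providers' actual egress pricing.
    """
    return round(gb_per_month / compression_ratio * price_per_gb, 2)
```

At 10 TB/month, even a modest 2.5:1 compression ratio cuts the transfer line item from $200 to $80 under these assumed rates, which is often enough to pay for the edge appliances themselves.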
Storage for Multi-Cloud
While virtual machine storage is straightforward, understanding how you will store files and backups can be challenging. Much of the cloud cost savings are derived from replacing expensive file servers on SAN storage with low-cost, object-based storage like S3 or Azure Blob Storage. However, since object-based storage is typically accessed programmatically over APIs rather than through direct OS calls, switching cloud providers for storage services is not trivial.
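One common way to soften that API lock-in is to put a thin facade in front of the object-storage calls, so application code never talks to a provider SDK directly. This is a hedged sketch, not any vendor’s API; the `InMemoryStore` backend is a stand-in so the example runs without cloud credentials, where real subclasses would wrap the S3 or Azure Blob SDKs.

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Provider-neutral facade. Real backends would call boto3 or
    azure-storage-blob; only this interface is visible to callers."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(ObjectStore):
    """Stand-in backend so the sketch is runnable locally."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]

def backup(store: ObjectStore, key: str, payload: bytes) -> None:
    # Application code depends only on the facade, so switching providers
    # means adding a new ObjectStore subclass, not rewriting callers.
    store.put(key, payload)
```

The design choice here is the standard adapter pattern: the migration cost is concentrated in one new subclass per provider rather than scattered across every caller.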
Implementation and Management of Multi-Cloud Environments
While technical challenges can be overcome with programming and smart design, management challenges are much harder to address. Whether you are securing access to cloud resources, ensuring consistent labeling of resources, or controlling costs, governance and management are the most important aspects of a successful cloud deployment. The challenge is that you need staff who are sufficiently knowledgeable in not just one, but multiple cloud platforms to manage each environment competently. Finding workers who are highly skilled in a single cloud can be challenging, and it is even harder when looking at multiple clouds. To avoid some of these hurdles, some organizations turn to solutions like VMware, which abstracts away much of cloud management, or Kubernetes, which, in theory, is the same platform in any cloud. These approaches come with added financial cost, increased technical complexity, or both.
Building VMs and even networking between two public clouds is something a well-trained engineer can do in an afternoon. You can even do it with an infrastructure-as-code tool like Terraform that is designed to work across platforms. However, details like monitoring, alerting, billing, and performance characteristics are all slightly different on each platform. A successful multi-cloud implementation requires thinking through all those details and building them into your cloud architecture and deployments.
Monitoring and Alerting
Monitoring and alerting across public clouds is one of the easier challenges, but it still requires careful thought. The cloud providers have their own built-in monitoring and alerting systems, but in a multi-cloud deployment it can make sense to use a third-party monitoring tool that can span clouds and provide a centralized repository. Even after choosing a tool, you need to decide where to host the monitoring solution, or choose a software-as-a-service offering if that meets your monitoring needs.
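A centralized repository usually implies normalizing each provider’s metric payloads into one schema before a single alert rule is applied across clouds. A minimal sketch follows; the payload field names are hypothetical stand-ins for whatever shape each provider’s monitoring API actually returns:

```python
def normalize(provider, raw):
    """Map provider-specific metric payloads (hypothetical field names)
    onto one common schema so one alert rule can span clouds."""
    if provider == "aws":
        return {"resource": raw["InstanceId"], "cpu": raw["CPUUtilization"]}
    if provider == "azure":
        return {"resource": raw["resourceId"], "cpu": raw["percentageCPU"]}
    raise ValueError(f"unknown provider: {provider}")

def over_threshold(metrics, limit=80.0):
    """Single cross-cloud alert rule applied to normalized metrics."""
    return [m["resource"] for m in metrics if m["cpu"] > limit]
```

The translation layer is the whole point: once payloads are normalized, alert thresholds, dashboards, and on-call routing only have to be defined once.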
While you have learned much about the technical challenges of moving to a multi-cloud model, the far larger challenge is hiring staff with the technical skills to manage multiple clouds successfully. A recent search of LinkedIn for “Cloud Architect” roles shows a very large number of openings, most of them with zero applicants. Cloud skills are in high demand, and finding and retaining talent is the biggest challenge. Most engineers have skills in one cloud, which means that to take a multi-cloud approach, you may need to hire twice as many engineers or invest in an extensive training program.
Multi-cloud deployments are challenging. In addition to hiring the right people, which is probably the biggest challenge, there are many real risks in building out such a complex environment. The first is cost: any cloud bill has “hidden costs” that typically amount to 5-10% of the total. These are typically for monitoring services, data egress charges, or small costs associated with services like load balancers or security services.
However, in every public cloud there are a handful of services that can increase costs quickly. These services are charged on usage-based pricing and can produce steep cost increases after only a few days. One way to mitigate this risk is to take advantage of the cost monitoring services and alerts built into each of your cloud platforms.
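Beyond the providers’ built-in budget alerts, a simple trailing-average check over daily spend can flag a usage-driven spike early. This is an illustrative sketch of the idea, not a replacement for the native cost alerting tools:

```python
def spend_alert(daily_spend, baseline_days=7, factor=2.0):
    """Return indexes of days whose spend exceeds `factor` times the
    trailing `baseline_days` average -- a crude anomaly detector for
    usage-priced services that can run up a bill in days."""
    alerts = []
    for i in range(baseline_days, len(daily_spend)):
        avg = sum(daily_spend[i - baseline_days:i]) / baseline_days
        if daily_spend[i] > factor * avg:
            alerts.append(i)
    return alerts
```

A week of flat spend followed by one runaway day trips the alert on that day, days before a monthly budget threshold would.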
The other ever-present risk is security: the current level of threats from ransomware and nation-state actors means security is always front of mind for IT professionals in every field. While moving to a cloud platform generally improves an organization’s security footprint by providing easier access to services like encryption and easier network segregation, it can also make it easy to make mistakes. Network misconfigurations are common; thousands of data breaches have been caused by improperly configured AWS S3 storage buckets. You can reduce these risks by carefully following the advice provided by your cloud provider’s security tooling. While the tooling is a good start, regular auditing and penetration testing of your environments should also be performed to reduce risk.
The other major risk of going multi-cloud is the sheer complexity of trying to fail over an application between different infrastructure platforms. You always want to avoid a scenario where your high availability solution causes more downtime than a standalone solution would. Early versions of SQL Server clustering presented this conundrum: to add disk space, you had to incur downtime that would not have occurred on a standalone server. Multi-cloud failovers can bring similar complexity. Network services like DNS need a single source of truth, which typically has to live in one place. While failing over something like a static website can be trivial, moving a multi-tier application stack is extremely complicated in terms of networking and data synchronization.
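The DNS side of a cross-cloud failover usually reduces to a priority decision over health-checked endpoints. The sketch below simulates only that decision logic; in practice the health results would come from real probes and the actual switch would be performed by your DNS provider’s failover records. The endpoint names are hypothetical.

```python
def choose_endpoint(endpoints, health):
    """Pick the first healthy endpoint in priority order -- the decision
    a DNS failover policy automates. `endpoints` is ordered by preference;
    `health` maps endpoint -> bool (would come from real probes)."""
    for ep in endpoints:
        if health.get(ep, False):
            return ep
    raise RuntimeError("no healthy endpoint available")
```

Even this trivial policy shows why DNS needs one source of truth: if each cloud ran its own copy of the decision with different health views, clients could be steered to two different "primaries" at once.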
While there are a lot of caveats to multi-cloud deployments, they can provide additional availability, especially in the event of a major cloud outage. If you are going to do a multi-cloud deployment, there are some best practices you can follow that will help you better manage and execute your deployments:
- Tag resources consistently, so that they can be easily identified and cost reports for each application are easy to gather
- Follow the security best practices for each cloud, and work with security resources to ensure you are not missing any vulnerabilities
- Evaluate third-party management tools that provide a centralized management plane for all of your services
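A tagging standard like the first practice above is easy to enforce with a small compliance sweep that runs against resource inventories from every cloud. The required tag names here are an example policy, not a standard:

```python
# Example tagging policy -- tag names are illustrative, not prescribed.
REQUIRED_TAGS = {"application", "environment", "cost-center"}

def missing_tags(resource_tags):
    """Return the required tags absent from one resource's tag set,
    for use in a cross-cloud tagging compliance sweep."""
    return sorted(REQUIRED_TAGS - set(resource_tags))
```

Run against every resource across providers, an empty result means the resource can be attributed in cost reports; a non-empty result identifies exactly which tags to add.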
Finally, engage consultants who are experts in each cloud, at least for your initial design and implementation phases. You will also need an HA vendor like SIOS Technology that understands the complexity of HA in each of the clouds. Experts can help you avoid downtime risks as well as the financial and security pitfalls common in these projects, and can speed up your implementation.
While multi-cloud solutions are not for every organization, many will go down this path, whether for regulatory reasons or for improvements in high availability. Networking and security are your biggest technical hurdles, and governance and cost management are your functional challenges. Testing is key to ensuring that multi-cloud failover and failback work. It is important to use an HA clustering solution that enables simple switchover and switchback, to understand how each of your applications will behave during failover, and most importantly, to test that failover regularly to uncover any networking or data hurdles. Remember that downtime incidents happen suddenly and without warning. Ensure your clustering product allows you to test in real-world scenarios like these.