Fifty Ways to Improve Your High Availability


I love the start of another year. Well, most of it. I love the optimism, the mystery, the potential, and the hope that seem to usher their way into life as the calendar flips to another year. But there are some downsides to the turn of the calendar. Every New Year brings a flood of “____ ways to do ____” lists. My inbox is always filled with “Twenty ways to lose weight,” “Ten ways to build your portfolio,” “Three tips for managing stress,” and “Nineteen ways to use your new iPhone.” Lists for self-improvement, culture change, stress management, and weight loss abound for nearly every area of life and work, including “Thirteen ways to improve your home office.” But what about high availability? You only have so much time every week, so how do you make your HA solution more efficient and robust than ever? Where is your list? Here it is, fifty ways to make your high availability architecture and solution better:

  1. Get more information from the cluster faster
  2. Set up alerts for key monitoring metrics
  3. Add analytics.  Multiply your knowledge
  4. Establish a succinct architecture from an authoritative perspective
  5. Connect more resources. Link up with similar partners and other HA professionals
  6. Hire a consultant who specializes in high availability
  7. 100x existing coverage. Expand what you protect
  8. Centralize your log and management platforms
  9. Remove busywork
  10. Remove hacks and workarounds
  11. Create solid, repeatable solution architectures
  12. Utilize your platforms: Public, private, hybrid or multi-cloud
  13. Discover your gaps
  14. Search for Single Points of Failure (SPOFs)
  15. Refuse to implement incomplete solutions
  16. Crowdsource ideas and enhancements
  17. Go commercial and purpose-built
  18. Establish a clear strategy for each life cycle phase
  19. Clarify your decision-making process
  20. Document your processes
  21. Document your operational playbook
  22. Document your architecture
  23. Plan staffing rotation
  24. Plan maintenance
  25. Perform regular maintenance (patches, updates, security fixes)
  26. Define and refine on-boarding strategies
  27. Clarify responsibility
  28. Improve your lines of communication
  29. Overcommunicate with stakeholders
  30. Implement crisis resolution before a crisis
  31. Upgrade your infrastructure
  32. Upsize your VM: CPU, memory, and IOPS
  33. Add redundancy at the zone or region level
  34. Add data replication and disaster recovery
  35. Go OS- and cloud-agnostic
  36. Get training for the team (cloud, OS, HA solution, etc.)
  37. Keep training the team
  38. Explore chaos testing
  39. Imitate best-in-class architectures
  40. Be creative.  Innovation expands what you can protect and automate.
  41. Increase your automation
  42. Tune your systems
  43. Listen more
  44. Implement strict change management
  45. Deploy QA clusters.  Test everything before updating/upgrading production
  46. Conduct root cause analysis exercises on any failures
  47. Address RCA and Closed Loop Corrective Action reports
  48. Learn your lesson the first time.  Reuse key learnings.
  49. Declutter.  Don’t run unnecessary services or applications on production clusters
  50. Be persistent.  Keep working at it.
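
To make one of these concrete, here is a rough sketch of what "Search for Single Points of Failure (SPOFs)" and "Add redundancy at the zone or region level" might look like as a quick script. Everything in it is an assumption for illustration: the inventory format, the component and node names, and the `find_spofs` helper are hypothetical, not part of any particular HA product. It simply flags any component that runs on one node or lives in one zone.

```python
from collections import defaultdict

# Hypothetical inventory (an assumption for this sketch):
# each entry is (component, node, zone).
INVENTORY = [
    ("database", "db-node-1", "zone-a"),
    ("database", "db-node-2", "zone-b"),
    ("app-server", "app-node-1", "zone-a"),
    ("app-server", "app-node-2", "zone-a"),
    ("license-server", "lic-node-1", "zone-a"),
]


def find_spofs(inventory):
    """Return {component: [reasons]} for components lacking node or zone redundancy."""
    nodes = defaultdict(set)
    zones = defaultdict(set)
    for component, node, zone in inventory:
        nodes[component].add(node)
        zones[component].add(zone)

    findings = {}
    for component in sorted(nodes):
        reasons = []
        if len(nodes[component]) < 2:
            reasons.append("runs on a single node")
        if len(zones[component]) < 2:
            reasons.append("confined to a single zone")
        if reasons:
            findings[component] = reasons
    return findings


if __name__ == "__main__":
    for component, reasons in find_spofs(INVENTORY).items():
        print(f"{component}: {', '.join(reasons)}")
```

In this made-up inventory the database passes, the app server is flagged for sitting in one zone, and the license server is flagged on both counts. A real review would also need to cover storage, networking, and the people and processes around the cluster, which a script like this cannot see.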

So, what are the ideas and ways that you have learned to increase and improve your enterprise availability? Let us know!
-Cassius Rhue, VP, Customer Experience
