Seven Essentials in HA Team Transition (Navigating the Great Resignation)

Unless you’ve been living under a rock or frozen in time, you’ve likely heard that employers and employees are in the midst of a trend called “The Great Resignation.” As reported in US News and World Report, “According to the U.S. Bureau of Labor Statistics, 4 million Americans quit their jobs in July 2021 and the trend isn’t slowing down.” No matter your company size or current revenue stream, this trend will impact your IT team in the near future, if it hasn’t already. Yes, let that sink in. The same team that is responsible for ensuring the availability of your mission-critical applications is vulnerable in one way or another to the effects of “The Great Resignation.”

So, how do you recognize the warning signs, come to terms with the reality, and navigate with empathy and clarity through “The Great Resignation” so that it doesn’t cause a “Great Disaster” for your critical applications?

Here are technical and non-technical tips for sound High Availability (HA) best practices in the midst of change:

1. Don’t Quit

Don’t quit. Seriously! As colleagues and good people choose to change jobs, change careers, or otherwise leave the workforce, it can be tempting to quit, especially when you consider the prospect of carrying your already heavy workload with an even shorter bench. But don’t quit.

2. Identify Key Risks to High Availability

Identifying risks is a two-pronged process. After a resignation, your team is at risk of further personnel changes. But your HA is also at risk from the loss of capacity, technical knowledge, or expertise. To prevent your enterprise from experiencing unplanned downtime in the wake of new resignations on your team, you’ll need to identify the key areas of risk. Some technical risks include:

  1. Cloud expertise and knowledge
  2. Database Administration
  3. Storage Administration and configuration
  4. Tacit HA product knowledge
  5. Emergency Coverage (Staffing)
  6. Technical Leadership
  7. Documentation

3. Managers: Assess Your Company

When people begin to leave a company, it is very easy to say that it is “them, not us!” We want to focus on all the ways their own issues led them to leave, quit, or choose a different career or job. It is quite possible that their reason for leaving is entirely personal. Sometimes, however, the issue is in the mirror: it is not them, but us. Why does figuring out whether the problem lies with them or with you matter for HA? If the problem is with your company, such as its mission, its vision, its culture around HA and IT, or its hiring and staffing for IT and HA system management, then simply adding headcount will be a temporary fix. In addition, team morale, commitment, and knowledge transfer may erode further if the focus remains on shifting blame rather than resolving the issue.

4. Team Leads: Assess Your Team

Almost every company has had someone quit their team over the past two years.  No matter whether they were seeking higher pay, staying at home to care for family members, retiring or pursuing other options, they have left.  If you’ve lost a team member, it is essential to assess the remaining team.  This assessment will be both technical and non-technical in nature.  Technically, you will need to:

   a. Identify current skills, abilities and knowledge gaps

What skills remain on the team, and what is the level of technical expertise and ability? Where are the knowledge gaps, especially those between theory and practice?

   b. Understand both existing and missing roles

Many of your team members may be covering multiple roles and responsibilities.  The loss of a single team member may actually mean the loss of coverage for multiple roles and responsibilities. 

   c. Evaluate immediate training or augmentation needs

Where are you covered, but in need of additional training to stabilize and solidify the team? Which gaps in coverage can be mitigated by training existing personnel or by some form of contract professional services? As VP of Customer Experience, I see this firsthand. Our team recently worked with a company that needed professional services after losing key team members responsible for its HA environment.
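The point above about one departure removing coverage for several roles can be checked with a simple coverage map. The following is a minimal sketch, not a prescribed method: the names, the roles, and the helper functions `roles_lost_if_departs` and `single_owner_roles` are all hypothetical and only illustrate the idea.

```python
from collections import defaultdict

# Hypothetical roster: each person maps to the roles they currently cover.
coverage = {
    "alice": {"database administration", "storage administration"},
    "bob": {"cloud operations", "database administration"},
    "carol": {"emergency on-call", "cloud operations", "documentation"},
}


def roles_lost_if_departs(coverage, person):
    """Return the roles that would have no remaining owner if `person` left."""
    remaining = set()
    for name, roles in coverage.items():
        if name != person:
            remaining |= roles
    return sorted(coverage[person] - remaining)


def single_owner_roles(coverage):
    """Return {role: owner} for every role covered by exactly one person."""
    owners = defaultdict(list)
    for name, roles in coverage.items():
        for role in roles:
            owners[role].append(name)
    return {
        role: names[0]
        for role, names in sorted(owners.items())
        if len(names) == 1
    }


if __name__ == "__main__":
    for person in coverage:
        print(person, "->", roles_lost_if_departs(coverage, person))
    print(single_owner_roles(coverage))
```

Roles that appear in `single_owner_roles` are natural first candidates for cross-training or outside augmentation.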

Non-Technically, you will need to:

   a. Understand how remaining team members feel

Even prior to the COVID pandemic and the period of “The Great Resignation,” many teams were running on fumes. The 24/7 world of HA leaves a lot of work to be done even with normal team numbers, norms, and tasks. If your team has been impacted, checking in and listening to the stories of the remaining team members is as critical as a down production server. Find out who is depleted, burned out, confused, or nearing collapse, and who, conversely, is fully alive and ready for a new challenge. Be sure to listen to verbal and non-verbal cues, and to empathize not just with the loss of a colleague, but with their emotions, concerns, and fears.

   b. Understand the reasons that the remaining team members are still on board

Knowing how team members feel is both a technical and non-technical necessity, but nearly as important is discovering their reasons for staying. Some reasons may surprise you. Author and speaker Carey Nieuwhof states that some team members are only staying because they “feel trapped on the team because they didn’t leave first.” Other reasons may not surprise you. Whatever the reason (comfort, opportunity, salary, location, stock options, passion, teamwork, culture), every reason your team members stay is important.

   c. Evaluate the impact of being short-handed 

Being short-handed obviously has a technical component, discussed previously: assessing skills gaps and so on. But it also has a non-technical corollary. Be sure to assess the impact that being short-handed, even if only momentarily, will have on the mental, emotional, and personal health of the remaining team members. Early in my career as a manager, our team dealt with a downsizing event that left several employees emotionally vulnerable and mentally exhausted. This led to higher fatigue, more mental fog, and increased rates of defects and mistakes by those team members. If your team is severely impacted mentally and physically by being short-handed, the risk to your HA could increase. Your team may be scrambling to pick up the slack, and they may rally quickly to cover for the leader or team member who has resigned, but it is critical that you understand whether those who remain are also exhausted, feeling trapped, or at risk of leaving.

5. Identify the Critical Technical Tasks, Priorities and Assign Responsibilities

Years ago, a senior executive left the company. Despite nearly a year spent transitioning his roles and tasks, there were still roles and tasks that surprised the remaining staff. In today’s wave of resignations, you don’t have a full year for transition. Furthermore, if your team has experienced more than one resignation, you probably haven’t completed the analysis and transition for the first person. It is therefore essential to identify and prioritize the most critical tasks and to assign responsibility for each. Be sure to list out tasks such as security scans, updates, maintenance, backups, tests, new application deployments, cost analysis, cloning and redeployment of images, patch application, and vulnerability remediation. These tasks all remain necessary despite the losses, and they can have devastating effects if left to linger.
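One lightweight way to keep such a list honest is to record each task with a priority and an owner, then surface whatever a departure has left unowned. This is only an illustrative sketch: the `Task` fields, the priority scale (1 is most critical), the owner names, and the helper `unowned_by_priority` are assumptions made for the example, not part of any HA product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Task:
    name: str
    priority: int          # hypothetical scale: 1 = most critical
    owner: Optional[str]   # None means nobody currently owns the task


def unowned_by_priority(tasks, departed):
    """Return tasks with no owner, or whose owner has left, most critical first."""
    orphaned = [t for t in tasks if t.owner is None or t.owner in departed]
    return sorted(orphaned, key=lambda t: (t.priority, t.name))


# Hypothetical task list drawn from the kinds of tasks named above.
tasks = [
    Task("backups", 1, "dana"),
    Task("patch application", 2, "eli"),
    Task("security scans", 1, "eli"),
    Task("cost analysis", 3, None),
    Task("vulnerability remediation", 2, "dana"),
]

if __name__ == "__main__":
    for task in unowned_by_priority(tasks, departed={"eli"}):
        print(task.priority, task.name)
```

The resulting list is the starting point for reassignment: the highest-priority orphaned tasks get an owner first.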

6. Make a Short-term Plan for Maintenance and Operation

Tasks, roles, and responsibilities still need to be covered. Critical issues will still need to be addressed. Unplanned downtime will not wait until you have rebuilt your staff, trained existing personnel, and made your company more resilient to the transitions and changes of the Great Resignation. To navigate the short term, you will need to develop a smart, realistically achievable short-term plan. This plan should map out the procedures, tasks, and processes you have identified so that maintenance and operation can continue. Furthermore, it should define how existing critical infrastructure policies can be managed carefully through the tumultuous seasons to come.

7. Focus on the Future

The previous steps have led up to this. With an assessment of the current team, an identification of your key risks, and a transition plan in place, the next step is to focus on the future. You still have a mission. You still have critical applications that need to be highly available. You still have data that needs to be protected, mined, replicated, and available for your business. Start making plans for the future team.

  1. Build roles and responsibilities
  2. Update architectures and documentation
  3. Evaluate opportunities for growth and alignment
  4. Plan for new hires, including time for onboarding 
  5. Allocate time and resources to creating and updating onboarding materials
  6. Focus on team health
  7. Apply risk mitigation strategies for the near term and plan for the long term

Not all of the news about “The Great Resignation” is bad news for your team and HA. In the wake of team members leaving for new or different positions and opportunities, you have a real and rare opportunity to take everything you learned from your assessments and turn it into tools for growth, alignment, and a better HA future. Building this brighter future includes defining the duties, roles, and skills needed, updating architectures and designs, planning for new hires and services engagements, and focusing on building a healthier team.

I discussed this subject in more detail in this recent TFir interview. 

-Cassius Rhue, VP, Customer Experience

