As a Customer Support organization, we hear from customers all over the world every day. Customers call or email to open cases with us when they have questions or problems they need help with. Some cases turn out to be new problems, but many are not new at all: customers run into the same issues over and over again. After 20 years in customer support and thousands of cases, we still see problems that have never been reported before, which keeps our work very interesting! One thing we have noticed, though, is that customer-reported problems, new ones included, fall into common categories.
Here are the top 5 reasons (root causes) that our customers reach out to us for help:
1. Network Problems: How to Plan Ahead and Avoid Downtime
Customers often need to change IP addresses in the cluster. Sometimes the ramifications of changing the network configuration are not recognized or planned for ahead of time, and when the changes are made, unexpected issues can occur with the cluster. If the IP address that changed is used in the DataKeeper or LifeKeeper configuration, such as for a mirror endpoint or a communication path, then you need to update that configuration so that the products are aware of the change.
Plan Ahead
If you know network changes need to be made, we recommend planning them ahead of time. Planning ahead helps you avoid unforeseen problems and ensures that you have defined steps to implement the changes.
Update Mirror IP Address
If an IP address (mirror endpoint) changes, DataKeeper will no longer be able to use the original mirror IP address (since it will no longer be there) and will not be able to mirror data between the servers. DataKeeper will need to be updated to use the new mirror IP address. This scenario is documented here.
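As a planning aid, the impact of an IP change can be sketched as a lookup against a list of the addresses your cluster configuration uses. The example below is only an illustration: the `ConfigEntry` record, its field names, and the hand-maintained inventory are assumptions made for this sketch and are not part of DataKeeper or LifeKeeper.

```python
from dataclasses import dataclass
from ipaddress import ip_address
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class ConfigEntry:
    """One address used by the cluster configuration (hypothetical record)."""
    product: str   # e.g. "DataKeeper" or "LifeKeeper"
    role: str      # e.g. "mirror endpoint" or "communication path"
    name: str      # free-form label chosen by the administrator
    address: str   # IP address currently configured


def entries_affected_by(
    changes: Dict[str, str], inventory: List[ConfigEntry]
) -> List[Tuple[ConfigEntry, str]]:
    """Return (entry, new_address) for every entry whose IP is about to change.

    `changes` maps each old IP address to its planned replacement.
    """
    # Parse addresses so that equivalent spellings compare equal.
    planned = {ip_address(old): new for old, new in changes.items()}
    affected = []
    for entry in inventory:
        new_address = planned.get(ip_address(entry.address))
        if new_address is not None:
            affected.append((entry, new_address))
    return affected


if __name__ == "__main__":
    inventory = [
        ConfigEntry("DataKeeper", "mirror endpoint", "E: mirror source", "10.0.1.10"),
        ConfigEntry("LifeKeeper", "communication path", "node1 to node2", "10.0.2.10"),
    ]
    for entry, new in entries_affected_by({"10.0.1.10": "10.0.5.10"}, inventory):
        print(f"{entry.product} {entry.role} '{entry.name}': {entry.address} -> {new}")
```

Every entry the function returns is a place where the product configuration would need to be updated as part of the change plan.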
2. Configuration Issues: Common Mistakes and How to Fix Them
Often, the root cause of the reported problem ends up being a configuration issue. Customers report that their configuration is not working correctly, or that the product appears not to be working properly based on what they see in the product GUI. Typically, configuration issues result from something that changed in the cluster environment after the original cluster configuration, or from something that was not set up correctly when the product was first installed.
Examples of common configuration issues reported:
- Some DataKeeper mirrors are not in a mirroring state
Customers often need to expand/grow their volumes. One of the key product requirements is that the source volume must be equal to or smaller than the target volume; otherwise the product will not be able to resync the data from the source to the target volume. While this may seem logical, it is often overlooked. Sometimes the target volume ends up smaller than the source, and this prevents the volume from reaching a mirroring state. The following documentation and videos explain the procedure for expanding your DataKeeper volumes.
- DataKeeper cannot connect to servers in the cluster
When installing DataKeeper, the user is prompted to enter the login credentials to be used by the DataKeeper service. A domain account with administrator privileges is recommended, and most customers create an account specifically for DataKeeper to use. The domain account used must be added to the Local System Administrators Group and must have administrator privileges on each server that DataKeeper is installed on. Many times the account is not added to the Local System Administrators Group, and this prevents DataKeeper from connecting to itself and to other DataKeeper servers in the cluster. For more detailed information, refer to the documentation located here.
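The volume size requirement from the first example above (the source volume must be equal to or smaller than the target volume) is easy to verify before and after expanding volumes. The sketch below assumes you have already collected the volume sizes in bytes by whatever means you prefer; the function name and the tuple layout are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# (mirror label, source size in bytes, target size in bytes)
MirrorSizes = Tuple[str, int, int]


def undersized_targets(mirrors: List[MirrorSizes]) -> List[str]:
    """Return the labels of mirrors whose target is smaller than the source.

    A mirror listed here would not be able to resync, because the source
    volume must be equal to or smaller than the target volume.
    """
    return [label for label, source, target in mirrors if source > target]


if __name__ == "__main__":
    GIB = 1024 ** 3
    mirrors = [
        ("E: node1 -> node2", 500 * GIB, 500 * GIB),  # equal sizes: acceptable
        ("F: node1 -> node2", 750 * GIB, 500 * GIB),  # source grown, target not
    ]
    for label in undersized_targets(mirrors):
        print(f"Target too small for mirror {label}")
```

Running a check like this after every volume expansion can catch the case where only the source side was grown.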
The majority of the time, configuration issues require changes to the cluster to get the DataKeeper or LifeKeeper products back to a working state.
We recommend reaching out to support before changes are made to the cluster environment so that we can help ensure that you are headed in the right direction and can point you to the documentation and videos that we have on the subject.
3. Upgrade Planning: Avoiding Disruptions in Your Systems
Upgrades are a common part of a system administrator’s tasks. There is always a need to upgrade something on your systems as new versions are released: the operating system, the application software, the system firmware, the database software, security software, etc. This can be overwhelming if there are multiple upgrades that need to be done on your systems.
Many customers reach out to Support when planning to upgrade DataKeeper or LifeKeeper and ask questions to make sure they understand the upgrade process before actually implementing the upgrade. This is what we like to see. We do see cases where some customers don’t reach out prior to performing upgrades and unexpected problems occur. Many believe that upgrades are routine; however, there are some upgrades that create incompatibilities and can cause issues.
Upgrade Planning
Planning is key with upgrades along with understanding what the specific upgrade entails. Ask questions before you perform the upgrade. Ensure that you have your steps documented before the upgrade. Don’t forget to perform the upgrades on test or QA systems prior to upgrading your Production systems. This is a best practice that we recommend so that if you run into issues with the upgrades, it will be on the test servers or the QA servers and not on your Production servers.
4. External or OS Related Issues: Troubleshooting Beyond the Software
What are external or OS related issues? We classify a root cause as external or OS related when the reported problem turns out to be something outside of DataKeeper and LifeKeeper. DataKeeper and LifeKeeper use many of the server components, such as disks/volumes and the network. If the operating system cannot “see” the disk or volume, then DataKeeper and LifeKeeper cannot “see” the disk or volume either. At first glance, a reported problem may appear to be DataKeeper or LifeKeeper related; however, analysis shows that the cause is an operating system component that DataKeeper or LifeKeeper depends upon.
For example, for a DataKeeper mirror to function properly, DataKeeper requires that the volume is visible to the operating system, on-line, healthy, and has a valid file system. If these requirements are not met, the DataKeeper mirror will not be able to mirror the data from one system to the other, and DataKeeper will show that the mirror is in the Paused state. When debugging this problem, we find that the Windows Disk Management tool shows the volume as off-line, not in a healthy state, or as a raw device. Once this is corrected, DataKeeper can mirror the data again from one system to the other. For more details refer to the video, Preparing Storage for DataKeeper Usage, located here.
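The four volume requirements described above can be written down as a simple checklist. The following is a minimal sketch, assuming the volume state has already been read from the operating system (for example, by looking at Disk Management); the `VolumeState` record and its fields are hypothetical, and treating a missing or "RAW" file system as invalid is an assumption of this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class VolumeState:
    """Snapshot of what the operating system reports for a volume (hypothetical)."""
    letter: str
    visible: bool                # the OS can see the volume
    online: bool                 # the disk is on-line
    healthy: bool                # the volume is reported as healthy
    file_system: Optional[str]   # e.g. "NTFS"; None or "RAW" if unformatted


def mirror_blockers(volume: VolumeState) -> List[str]:
    """List the unmet requirements that would keep a mirror from mirroring."""
    problems = []
    if not volume.visible:
        problems.append("not visible to the operating system")
    if not volume.online:
        problems.append("off-line")
    if not volume.healthy:
        problems.append("not healthy")
    if volume.file_system is None or volume.file_system.upper() == "RAW":
        problems.append("no valid file system")
    return problems


if __name__ == "__main__":
    state = VolumeState("E", visible=True, online=False, healthy=True, file_system="NTFS")
    for problem in mirror_blockers(state):
        print(f"Volume {state.letter}: {problem}")
```

An empty list means the operating-system side of the requirements is met, so the investigation can move on to the product configuration.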
Another example of an external or OS related issue occurs when the DataKeeper volume fails to lock on the target system. DataKeeper purposely locks the volume on the target system to prevent writes from occurring on the target system. In order for DataKeeper to lock a target volume, there cannot be an OS page file on the volume. Many times, systems are configured at the OS level to “Automatically Manage Paging Files” and sometimes page files end up getting placed on the DataKeeper volumes by the OS. To overcome this, we recommend that this OS setting be changed. Refer to this link for further details.
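A quick way to spot this condition is to compare the locations of the page files with the drive letters of the mirrored volumes. The sketch below assumes you have already gathered both lists yourself; the function name and inputs are hypothetical, and the code only compares drive letters without querying Windows or DataKeeper.

```python
from pathlib import PureWindowsPath
from typing import Iterable, List


def pagefiles_on_mirrored_volumes(
    pagefile_paths: Iterable[str], mirrored_volumes: Iterable[str]
) -> List[str]:
    """Return the page file paths that live on a mirrored volume.

    `mirrored_volumes` may be given as "E", "E:" or "E:\\"; matching is
    case-insensitive.
    """
    # Reduce each volume to its bare upper-case drive letter.
    letters = {volume.upper().rstrip(":\\") for volume in mirrored_volumes}
    conflicts = []
    for path in pagefile_paths:
        # PureWindowsPath parses Windows paths on any platform; .drive is "E:".
        drive = PureWindowsPath(path).drive.upper().rstrip(":")
        if drive in letters:
            conflicts.append(path)
    return conflicts


if __name__ == "__main__":
    pagefiles = [r"C:\pagefile.sys", r"E:\pagefile.sys"]
    for path in pagefiles_on_mirrored_volumes(pagefiles, ["E", "F"]):
        print(f"Page file on a mirrored volume: {path}")
```

Any path reported here points to a page file that would need to be moved off the mirrored volume before the target volume can be locked.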
5. Performance: Improving System and Mirror Efficiency
Customers also contact us to improve mirror performance, or system performance with mirroring enabled, because the mirrors are not going into a mirroring state or the product is slowing down the system. The first issue (mirror not reaching a mirroring state) is typically a matter of tuning the DataKeeper registry keys to match your system configuration. WriteQueueHighWater, WriteQueueHighWaterSynchronous, and BlockWritesonLimitReached are several commonly changed tunables. Refer to the documentation for these tunables located here.
The second issue (performance of the system) is often a matter of moving the location of the DataKeeper bitmap. By default, the bitmap is located on the C drive and may need to be relocated to a faster drive. Refer to the documentation and video for information on relocating the bitmap here.
System and product tuning is often done to maximize performance, for example by changing the product tunables to match the customer’s environment more closely. Many things can affect DataKeeper and LifeKeeper, including the operating system, network, and storage devices, and the default settings that DataKeeper and LifeKeeper use may need to be tuned to the customer’s specific environment. We offer Validation and Health Check Services to help customers ensure that HA best practices are implemented. Visit this link for details on our offerings.
A key strategy that we recommend is to complete testing prior to going into production so that problems, including performance issues, are found and resolved earlier in the process. Testing is often done in a test or QA environment before moving to production. It is always best to simulate the production load on the test / QA environment to ensure that the production environment will perform sufficiently. We recommend reading several of our posts on performance, located on our blog and specifically here.
Ensure your systems run smoothly by staying ahead of these common issues. Need expert guidance? Contact our support team today to help you prevent future support calls!