How to diagnose cloud performance issues

Follow these five steps to detect root causes of poor cloud performance

Is your public cloud-based workload too slow? Not sure where to look first? Here are some quick guidelines for diagnosing the root cause of most performance issues.

I’ve found that many people in IT who can quickly diagnose issues with traditional systems have trouble diagnosing cloud-based systems. Why? Because they don’t have a deep understanding of what’s in a public cloud, such as Amazon Web Services or Microsoft Azure, and believe that it’s a black box.

That’s really not the case. Plus, the system management tools and APIs that most public clouds provide are first-rate. However, you do have to understand where to look first, and what tools to use.

Cloud performance is complex because, at the end of the day, a cloud is a complex distributed system. However, you can follow the five diagnostic steps below to find and fix root causes. If you find performance issues at one step, don’t stop there! You may have more than one issue affecting performance.

1. Check the infrastructure that supports the workloads, both application and data

Using system monitoring and log analysis tools, you can determine CPU and storage utilization, which are the most likely culprits.

Many IT pros using clouds fail to allocate more CPUs and storage as an application and its database grow over time. Although you would assume that a public cloud automatically expands to meet your needs, that’s not the case. You need to configure and provision more servers to handle the additional workload before they are needed.
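
As a rough illustration of that first check, here is a minimal Python sketch that flags any resource running at or above a utilization threshold. The function names, the sample readings, and the threshold values are all hypothetical; in practice the readings would come from your cloud provider’s monitoring service, and only the local disk helper reads a real value.

```python
import shutil


def utilization_alerts(metrics, thresholds):
    """Return the names of metrics whose utilization meets or exceeds its threshold.

    Both arguments map a metric name to a fraction between 0.0 and 1.0.
    Metrics with no configured threshold are ignored.
    """
    return sorted(
        name
        for name, value in metrics.items()
        if name in thresholds and value >= thresholds[name]
    )


def disk_utilization(path="/"):
    """Fraction of the volume containing `path` that is currently in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


# Hypothetical readings and limits, chosen only for illustration.
sample = {"cpu": 0.92, "storage": 0.61, "memory": 0.40}
limits = {"cpu": 0.80, "storage": 0.85}
print(utilization_alerts(sample, limits))  # ['cpu']
```

Running a check like this on a schedule, with limits set well below 100 percent, is one way to learn that you need more capacity before the workload does.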

2. Look at the applications themselves

There are many monitoring tools that can peer into applications, and I strongly recommend that you use one or more of them.

Applications are the culprit for poor performance almost as often as the infrastructure is, because they may not have been refactored or modified to use cloud-native features. Thus, they can become very inefficient at using the infrastructure, which falsely puts the performance blame on the infrastructure.
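
If you don’t have an application monitoring tool in place yet, a few lines of instrumentation can show where the time goes. The sketch below is not a substitute for such a tool; it is a minimal example, using only Python’s standard library, of timing named sections of code and ranking them by total time. The section names and the toy workload are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

timings = defaultdict(list)  # section name -> list of elapsed seconds


@contextmanager
def timed(section):
    """Record how long the enclosed block takes under the given section name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[section].append(time.perf_counter() - start)


def slowest_sections(records, top=3):
    """Return (section, total_seconds) pairs, largest total first."""
    totals = {name: sum(samples) for name, samples in records.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


# Hypothetical request handler, split into the phases we want to compare.
with timed("parse"):
    payload = [str(i) for i in range(1000)]
with timed("transform"):
    result = sorted(payload, reverse=True)

print(slowest_sections(timings))
```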

3. Look at other less likely root causes of performance issues

Now it’s time to check other components. Check the security system: Encryption services can saturate storage and compute. Check the governance services—even the monitoring services that will tell you about performance issues in the first place. I’ve found that all such tools can oversaturate the infrastructure.
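
One way to test whether an add-on service is the problem is to run the same workload with and without it and compare the elapsed times. The following is a minimal sketch of that comparison under stated assumptions: the workload is a placeholder, and SHA-256 hashing stands in for whatever security or monitoring step you suspect. Neither represents a specific cloud service.

```python
import hashlib
import time


def overhead_percent(baseline_seconds, loaded_seconds):
    """Extra time, as a percentage of the baseline, added by the service under test."""
    if baseline_seconds <= 0:
        raise ValueError("baseline_seconds must be positive")
    return (loaded_seconds - baseline_seconds) / baseline_seconds * 100.0


def run_workload(records, per_record_hook=None):
    """Process records and return elapsed seconds; the hook stands in for an add-on service."""
    start = time.perf_counter()
    for record in records:
        _ = record.upper()  # placeholder for the real work
        if per_record_hook is not None:
            per_record_hook(record)
    return time.perf_counter() - start


# Hypothetical data: 2,000 records of 4 KB each.
records = [b"x" * 4096 for _ in range(2000)]
baseline = run_workload(records)
with_hook = run_workload(records, lambda r: hashlib.sha256(r).digest())
print(f"overhead: {overhead_percent(baseline, with_hook):.0f}%")
```

The exact figure will vary from run to run, so repeat the measurement several times before drawing conclusions.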

4. Move to the network, including bandwidth checks inside and outside the cloud

Because you consume public cloud services over the open internet, you’re often competing with lots of other packets. To see if that’s a cause of your poor performance, run ping tests, as well as upload and download tests that approximate what the cloud-based workloads transmit and consume.
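
A true ping uses ICMP, which typically requires elevated privileges to send from a script, so the sketch below substitutes a different technique: it times TCP handshakes to the service’s port. It is a minimal example that assumes the endpoint accepts TCP connections on port 443; the function names and the sample values are hypothetical.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port=443, timeout=3.0):
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms):
    """Return the min, median, and max of a list of latency samples."""
    return {
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


def probe(host, attempts=5, port=443):
    """Collect several handshake timings; failed attempts are skipped."""
    samples = []
    for _ in range(attempts):
        try:
            samples.append(tcp_connect_ms(host, port))
        except OSError:  # covers timeouts, refused connections, and DNS failures
            continue
    return summarize(samples) if samples else None


# Offline example with made-up samples; call probe() with your own endpoint's hostname.
print(summarize([12.0, 15.0, 11.0, 90.0, 14.0]))
# {'min': 11.0, 'median': 14.0, 'max': 90.0}
```

A large gap between the median and the maximum, as in the made-up samples above, suggests intermittent congestion rather than a consistently slow link. Run the probe both from inside the cloud and from a user’s network to see which side the delay is on.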

5. Examine the users’ browsers and computers

Finally, there are often issues with the users’ browsers that interact with the cloud-based application.

I’ve found that malware, encryption issues, and basically everything else that can go wrong with Windows PCs and Macs can slow cloud performance on the client side. Have tech support run those down if the first four steps come up clean.

Copyright © 2017 IDG Communications, Inc.