Tuesday, January 24, 2017

Performance Problem Troubleshooting Guideline



Introduction

This guideline defines the general steps we need to follow when dealing with a performance problem.

Performance Problem

What is a performance problem? We deem that there is a performance problem when we find any of the following symptoms:
1.      CPU usage stays over 90% for more than 120 seconds.
2.      CPU usage is over 90% on average.
3.      Memory usage stays over 80% for more than 120 seconds.
4.      Memory usage is over 80% on average.
5.      Opening a normal webpage takes more than 15 seconds.
6.      Opening a complicated webpage takes more than 60 seconds.
7.      An “Internet Explorer cannot display the page” error appears when trying to open a page.
8.      A “Timeout expired” error appears when trying to open a page.
9.      The application lags noticeably in general.
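The CPU and memory thresholds above can be checked mechanically once usage has been sampled. The following is a minimal sketch, assuming the samples are already available as (timestamp in seconds, percent) pairs; the function names and the sample format are illustrative and not part of any specific monitoring tool.

```python
from statistics import mean


def sustained_over(samples, threshold, min_seconds):
    """Return True if usage stays above threshold for more than min_seconds.

    samples: list of (timestamp_seconds, percent) tuples in time order.
    """
    run_start = None
    for ts, pct in samples:
        if pct > threshold:
            if run_start is None:
                run_start = ts  # first sample of a new high-usage run
            if ts - run_start > min_seconds:
                return True
        else:
            run_start = None  # usage dropped, so the run is broken
    return False


def check_symptoms(cpu_samples, mem_samples):
    """List which of symptoms 1 to 4 are met by the given samples."""
    findings = []
    if sustained_over(cpu_samples, 90, 120):
        findings.append("CPU over 90% for more than 120 seconds")
    if cpu_samples and mean(p for _, p in cpu_samples) > 90:
        findings.append("CPU over 90% on average")
    if sustained_over(mem_samples, 80, 120):
        findings.append("Memory over 80% for more than 120 seconds")
    if mem_samples and mean(p for _, p in mem_samples) > 80:
        findings.append("Memory over 80% on average")
    return findings
```

An empty result means none of the four usage symptoms is met; the page-load and error symptoms still have to be checked by hand.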

Performance Problem Confirmation

Before we start troubleshooting, we need to confirm that it is a real performance problem rather than a false positive by trying any of the following:
1.      Try to reproduce the performance problem by browsing to the page that reportedly cannot be opened or takes a long time to open. Write down the steps.
2.      Check CPU usage. Write down the CPU usage percentage and how long it lasts.
3.      Check memory usage. Write down the memory usage percentage and how long it lasts.
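To put a number on a reported slow page, the load time can be measured and compared with the 15-second and 60-second limits from the symptom list. This is only a rough sketch: it times the HTTP response download with Python's standard library, not the full rendering in a browser, and the helper names `time_page_load` and `is_slow` are made up for illustration.

```python
import time
import urllib.request

NORMAL_LIMIT_S = 15    # limit for a normal webpage
COMPLEX_LIMIT_S = 60   # limit for a complicated webpage


def time_page_load(url, fetch=None, clock=time.monotonic):
    """Return the seconds taken to fetch url.

    fetch and clock can be replaced, for example in a test.
    """
    if fetch is None:
        def fetch(u):
            # Download the whole response body; 120 s is an assumed cap.
            with urllib.request.urlopen(u, timeout=120) as resp:
                resp.read()
    start = clock()
    fetch(url)
    return clock() - start


def is_slow(elapsed_s, complex_page=False):
    """Compare a measured load time against the guideline's limits."""
    limit = COMPLEX_LIMIT_S if complex_page else NORMAL_LIMIT_S
    return elapsed_s > limit
```

Running it a few times and writing down each result alongside the reproduction steps makes it easier to tell a constant problem from an occasional one.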

Troubleshooting Guideline

General Information Collection

In order to troubleshoot a performance problem, we need to know the following about each machine involved:

Client Computer
1.      Operating system type and version (Windows version)
2.      CPU type, frequency, and number of cores
3.      Memory size
4.      Disk space
5.      Browser type and version

Application Server
1.      Operating system type and version (Windows version)
2.      CPU type, frequency, and number of cores
3.      Memory size
4.      Disk space
5.      Number of web applications installed

Database Server
1.      Operating system type and version (Windows version)
2.      CPU type, frequency, and number of cores
3.      Memory size
4.      Disk space
5.      Number of databases
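Part of the machine information above can be gathered with a small script run on each computer. The sketch below uses only Python's standard library, so it covers the operating system, CPU description, logical CPU count, and disk space; memory size and CPU frequency are not exposed by the standard library and still have to be read from the system itself. The function name `collect_basic_info` is an assumption for illustration.

```python
import os
import platform
import shutil


def collect_basic_info(path=None):
    """Collect OS, CPU, and disk details for the machine running the script."""
    if path is None:
        # Root of the current drive, e.g. C:\ on Windows or / elsewhere.
        path = os.path.abspath(os.sep)
    usage = shutil.disk_usage(path)
    return {
        "os": platform.system(),
        "os_version": platform.version(),
        # processor() can be empty on some systems, so fall back to machine().
        "cpu": platform.processor() or platform.machine(),
        # Logical CPUs, which may differ from the number of physical cores.
        "logical_cpus": os.cpu_count(),
        "disk_total_gb": round(usage.total / 1024**3, 1),
        "disk_free_gb": round(usage.free / 1024**3, 1),
    }


if __name__ == "__main__":
    for key, value in collect_basic_info().items():
        print(f"{key}: {value}")
```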

We also need to ask:
1.      Can we reproduce this problem?
2.      When did this problem start?
3.      Does this problem happen constantly or only once in a while?
4.      Does this problem happen to certain people or to everybody?
5.      Does this problem happen after a specific operation or action?
6.      Has there been any major change recently?

Troubleshooting

If the problem is reproducible, then we need to reproduce it and
1.      Run SQL Server Profiler to capture database activity, and then analyze the trace file.
2.      Check which process has the highest CPU usage and/or memory usage.
3.      If the w3wp process has the highest CPU usage and/or memory usage, then run a memory dump tool to dump the w3wp process when CPU reaches 90% or memory is over 80%, and then analyze the dump file.

If the problem is not reproducible, then
1.      Run SQL Server Profiler for a whole day to capture a full day of activity, and then analyze the trace file.
2.      Set up a memory dump tool to automatically capture a full memory dump when CPU reaches 90% or memory is over 80%, and then analyze the dump file.


If we happen to catch the moment when CPU reaches 90% or memory is over 80%, then we should immediately
1.      Run SQL Server Profiler to capture the current activity, and then analyze the trace file.
2.      Check which process is using the most CPU and/or memory, and capture a memory dump of that process right away. Then analyze the dump file.
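The rule for choosing which process to dump can be written down as a small decision function. This is a sketch under stated assumptions: the overall usage figures and the per-process snapshot are supplied by whatever monitoring tool is in use, and the dictionary keys (`name`, `pid`, `cpu_percent`, `mem_percent`) are hypothetical names chosen for this example. It only selects the target; taking the dump is left to the memory dump tool.

```python
def pick_dump_target(total_cpu, total_mem, processes,
                     cpu_limit=90.0, mem_limit=80.0):
    """Return the process to dump, or None if no threshold is reached.

    total_cpu, total_mem: overall usage in percent.
    processes: list of dicts with name, pid, cpu_percent, mem_percent.
    """
    cpu_hit = total_cpu >= cpu_limit   # CPU reaches 90%
    mem_hit = total_mem > mem_limit    # memory is over 80%
    if not (cpu_hit or mem_hit) or not processes:
        return None
    # Prefer the top CPU consumer when CPU triggered, else the top memory one.
    key = "cpu_percent" if cpu_hit else "mem_percent"
    return max(processes, key=lambda p: p[key])


snapshot = [
    {"name": "w3wp", "pid": 4120, "cpu_percent": 85.0, "mem_percent": 30.0},
    {"name": "sqlservr", "pid": 1888, "cpu_percent": 8.0, "mem_percent": 45.0},
]
target = pick_dump_target(93.0, 75.0, snapshot)
```

With this sample snapshot, a CPU spike selects w3wp, while a memory-only spike would select sqlservr instead.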
