Thursday, March 14, 2013

Understanding Load Average in LINUX/UNIX/AIX


The term “load average” appears throughout Linux/UNIX/AIX operating systems as a key performance metric.
Everybody knows that “load average” refers to a set of numbers, usually three, that somehow represent the load on the system’s CPU. In this post I’ll try to make these three numbers clearer and easier to understand.

The easiest way to see the “load average” of your system is with the "uptime" command.

It also appears in the "top" command on Linux and the "topas" command on AIX.

In all three cases the load average refers to a group of three numbers. For example, consider the following output of "uptime":

10:41:47 up 5 days, 48 min, 1 user, load average: 0.82, 0.71, 0.66

The last three numbers are the “load average”. Each number represents the system’s load as a moving average over the last 1, 5 and 15 minutes respectively. The important thing now is to understand what is being averaged: the load metric.
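To make the three fields concrete, here is a minimal sketch that pulls them out of an "uptime"-style line. The function name parse_load_average and the regular expression are illustrative assumptions, not part of any standard tool, and the pattern only targets the output format shown above.

```python
import re

# Matches the three load figures that follow "load average:" in uptime output.
# Illustrative pattern; other platforms or locales may format the line differently.
_LOAD_RE = re.compile(
    r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)"
)


def parse_load_average(line):
    """Return the (1 min, 5 min, 15 min) load averages as floats."""
    match = _LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: %r" % line)
    return tuple(float(value) for value in match.groups())


sample = "10:41:47 up 5 days, 48 min, 1 user, load average: 0.82, 0.71, 0.66"
one_min, five_min, fifteen_min = parse_load_average(sample)
print(one_min, five_min, fifteen_min)
```

On a live Unix system, Python's os.getloadavg() returns the same three values directly, so parsing is only needed when working from saved command output.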

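The metric that represents the load at a given point in time is the number of processes queued to run at that moment, including the process that is currently running. (On Linux the count also includes processes in uninterruptible sleep, typically waiting on disk I/O, so the CPU reading described here is an approximation on that system.) Generally speaking, on a single-core machine, the load can be read as a CPU utilization percentage when multiplied by 100.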
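How the per-instant count turns into a "moving average" can be sketched as follows. This assumes the Linux approach of an exponentially damped average sampled about every 5 seconds; the kernel itself uses fixed-point arithmetic, and other UNIX systems may compute it differently. The names update_load and simulate are hypothetical helpers for illustration only.

```python
import math

SAMPLE_SECONDS = 5  # assumed sampling interval (roughly what Linux uses)


def update_load(load, runnable, window_seconds):
    """Fold one sample of the process count into a damped average."""
    decay = math.exp(-SAMPLE_SECONDS / window_seconds)
    return load * decay + runnable * (1.0 - decay)


def simulate(runnable_samples, window_seconds, start=0.0):
    """Apply update_load to a sequence of samples and return the final load."""
    load = start
    for runnable in runnable_samples:
        load = update_load(load, runnable, window_seconds)
    return load


# One process running nonstop for a minute (12 samples) on a previously idle system.
one_minute_load = simulate([1] * 12, window_seconds=60)
print(round(one_minute_load, 2))
```

Under these assumptions the 1-minute figure reaches only about 0.63 after a full minute of one busy process, which is why the numbers lag behind sudden changes in activity.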

For example, a load average of 0.50 over the last minute means that the CPU was idle for half of that minute because it had no process to run.

On the other hand, a load average of 2.50 means that over the last minute an average of 1.5 processes were waiting for their turn to run, so the CPU was overloaded by 150%.
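The arithmetic behind these two single-core examples can be sketched as below. The function interpret_single_core is a hypothetical helper written for this post, and it assumes the simplified model in which load counts only processes competing for the CPU.

```python
def interpret_single_core(load):
    """Split a load average into CPU-busy share and average waiting processes."""
    busy_percent = min(load, 1.0) * 100    # one core cannot be more than 100% busy
    waiting = max(load - 1.0, 0.0)         # processes beyond the one that is running
    overload_percent = waiting * 100
    return busy_percent, waiting, overload_percent


for load in (0.50, 2.50):
    busy, waiting, overload = interpret_single_core(load)
    print(f"load {load:.2f}: CPU busy {busy:.0f}%, "
          f"{waiting:.1f} waiting, overloaded by {overload:.0f}%")
```

For 0.50 this reports a CPU that is busy half the time with nothing waiting; for 2.50 it reports 1.5 waiting processes and a 150% overload, matching the figures above.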

On multi-core systems (such as Core 2 Duo, IBM Power servers, or HP Itanium servers) things are a bit different, but to avoid unnecessary complications one can usually divide the load average by the number of cores and treat the result as the load average of a single-core machine.

For example, let’s say the load average of a two-core machine was 3.00, 2.00, 0.50.

Over the last minute there was an average of three runnable processes (3.00). Since the machine's two cores can run two processes at a time, one process, on average, was queued. So the machine carried a load of 150% of its capacity.

Over the last 5 minutes, the load average of 2.00 means that roughly 2 processes were running at any time, so the machine was fully utilized but not overloaded.

Over the last 15 minutes, the load average of 0.50 means that the machine could have handled 4 times that load without overloading the CPU; it had only (0.50/2)*100 = 25% CPU utilization in those 15 minutes.
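The divide-by-cores rule applied to this two-core example can be sketched as follows. The helper per_core_percent is a hypothetical name, and the core count is hard-coded to match the example rather than read from a real machine.

```python
def per_core_percent(load, cores):
    """Load average as a percentage of what `cores` CPUs can run at once."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores * 100


cores = 2  # the two-core machine from the example
for label, load in (("1 min", 3.00), ("5 min", 2.00), ("15 min", 0.50)):
    print(f"{label}: {per_core_percent(load, cores):.0f}% of capacity")
```

This gives 150%, 100% and 25% for the three windows. On a live system, Python's os.cpu_count() can supply the number of logical CPUs in place of the hard-coded value.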

I hope the term "load average" is now clear to everybody.

Nishith N.Vyas

1 comment:

  1. I wish this was the case. Unfortunately, at least on Linux, the load average also includes processes that are waiting on IO completions. This makes it essentially a useless metric for determining CPU utilization. You might want to check out this series of posts
    http://prutser.wordpress.com/2012/04/23/understanding-linux-load-average-part-1/
