On Linux, the load average depends on several factors, including processes that are running or runnable and processes in uninterruptible sleep (usually waiting on I/O). On Solaris, by contrast, the load average depends only on the number of runnable and running processes.
I have seen a high load average of about 250 (1 min), 230 (5 min), 219 (15 min) on a Solaris bare-metal machine with the following resources:
vCPUs: 256
RAM: 512GB
DISK: SAN
During this high load, I found that the CPU was 87% idle and more than 100 GB of RAM was free, so neither of these two resources appears to be a bottleneck. A backup process was running on the system at the time and issuing a lot of read requests against my SAN filesystems, but the response time was about 0.25-0.35 ms, which is very good. According to iostat, those filesystems were about 40-50% busy. In vmstat, every few seconds about 40-120 processes show up as runnable (that is, in the run queue), but in the next second the value is back to 0. From the graphs and stats, it was visible that the read requests caused this issue.
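To put these numbers side by side, here is a minimal back-of-the-envelope sketch in Python. It uses only the figures quoted above; the variable names are mine, and treating "not idle" as "busy CPUs" is a simplifying assumption for illustration, not a description of how Solaris computes the load average.

```python
# Figures taken from the observations above.
cpus = 256
load_1min = 250
idle_pct = 87

# Load average per CPU: a value near 1.0 would normally suggest
# that the CPUs are close to fully used.
load_per_cpu = load_1min / cpus

# Rough number of CPUs' worth of work implied by the idle percentage
# (assumption: everything that is not idle counts as busy).
busy_cpu_equiv = cpus * (100 - idle_pct) / 100

# Portion of the load average that CPU usage alone does not account for.
unexplained = load_1min - busy_cpu_equiv

print(f"load per CPU:                  {load_per_cpu:.2f}")
print(f"busy CPU equivalents:          {busy_cpu_equiv:.1f}")
print(f"load not explained by CPU use: {unexplained:.1f}")
```

In other words, the load average is close to 1 per CPU, while the idle figure suggests only about 33 CPUs' worth of work, and that gap is what I am trying to understand.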
Questions:
- Could these runnable processes cause such a high load average? If so, how?
- If the read response time from the SAN is this good and the filesystems are only about 50% busy, not 100%, why would this cause load? How are the two related?
Note: If anything about this scenario is unclear, please let me know.