Score:0

How to troubleshoot extremely high load average (over 550)?

tf flag

I run Ubuntu 18.04 on a server in an OVHcloud datacenter. It has been running for 3 years with a PHP application and a MySQL server on it. For 2.5 years everything was fine, but in the last 6 months I have had a strange issue:

  • Every 4th Sunday, at exactly 10:30 pm, my load average goes extremely high, over 550, and my server crashes.

When this happens I am able to log in over ssh and stop all services, but nothing brings the load average down. As soon as I reboot, it works again for 4 weeks, then the issue reappears.

Can someone help me troubleshoot what is causing the extremely high load average, and why it always happens at exactly the same time?

Please take a look at this picture from htop:

Htop

Thanks

ru flag
What's running on the server? What is web-facing? High CPU load suggests that *something* is trying to run extremely intensive processes or queries, and since you're showing us `htop` after you've shut services down, we can't reliably identify what is or isn't overusing resources. I notice MySQL there. Any chance you're running a website that isn't updated/patched properly, and that some type of SQL injection based attack is being used against it?
MaxIT avatar
tf flag
How can I check whether it is a MySQL injection?
Terrance avatar
id flag
See if there is a cronjob sitting in the `/etc/cron.monthly/` directory.
Score:1
pl flag

If it's not the host at OVH doing something crazy, but something within your machine, then it's likely a cron job.

I'd be inclined to look in `/etc/cron.d` for any files which have jobs configured to start on that day at that time.

You can probably run `grep "^30 22" /etc/cron.d/*` to look for any lines starting with (`^`) 30 (minutes) past 22 (hours). Or you could just go through each file. I suspect it's some badly configured job in there.

Alternatively, it could be something in your own crontab (`crontab -l`) or root's crontab (`sudo crontab -l`).
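The checks above can be combined into one small helper. This is only a sketch: the function name `find_cron_at` is made up for illustration, the spool path is the Debian/Ubuntu default, and the pattern only matches literal minute and hour fields (not ranges, lists or `*/n` steps).

```shell
#!/bin/sh
# Hypothetical helper: list cron lines scheduled at a given minute and hour.
# Usage: find_cron_at MINUTE HOUR FILE...
find_cron_at() {
    minute=$1
    hour=$2
    shift 2
    # -H prints the file name, -E enables extended regex,
    # -s hides errors for missing or unreadable files.
    grep -HEs "^[[:space:]]*${minute}[[:space:]]+${hour}[[:space:]]" "$@"
}

# Look for 22:30 jobs in the usual system-wide locations.
# Run as root to be able to read the per-user crontabs in the spool.
find_cron_at 30 22 /etc/crontab /etc/cron.d/* /var/spool/cron/crontabs/* \
    || echo "no 22:30 entries found"
```

If nothing turns up, `systemctl list-timers --all` is worth a look too, since systemd timers can schedule jobs without any crontab entry.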

MaxIT avatar
tf flag
I already checked the cron jobs; they are empty. Since the time is always the same, Sunday at 22:30, is it possible that the datacenter itself is doing something like a backup which hammers my disk and creates the high load?
pl flag
Is it shared hosting? Could be the provider or maybe even someone else on the same host.
MaxIT avatar
tf flag
Well, it is a dedicated server, but I'm not sure how their infrastructure is deployed. Maybe storage is shared between servers.
Score:0
pl flag

One option to diagnose this would be to leave dstat running in a terminal rather than htop. It's good at spotting the top processes over a period of time.

ssh to the server, and in screen, tmux or byobu, launch dstat with all these options: `dstat --time --cpu --net --disk --sys --load --proc --top-cpu --top-mem --top-io` and leave it running.

Here's what it looks like:

screenshot of dstat

When the issue starts occurring you can scroll back to the time when it started and look for anomalous processes. Especially in the "most expensive" last three columns for cpu, memory and io. It also lets you see whether the load average (middle columns) is freaking out immediately, or ramping up over time. May give some clues.
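As a complement to dstat, here is a rough sketch that logs the load average together with any processes in uninterruptible sleep (state `D`). On Linux the load average counts those blocked tasks as well as runnable ones, so a load that stays high after services are stopped may point to tasks stuck waiting on I/O. The function names and the log path are assumptions made for illustration.

```shell
#!/bin/sh
# Keep the ps header plus any process in uninterruptible sleep (state D).
filter_blocked() {
    awk 'NR == 1 || $1 ~ /^D/'
}

# Print one timestamped snapshot: load average and blocked processes.
snapshot() {
    date '+%Y-%m-%d %H:%M:%S'
    cat /proc/loadavg
    # wchan shows the kernel function the process is sleeping in, if known.
    ps -eo state,pid,ppid,user,wchan:32,cmd | filter_blocked
}

# Example: append a snapshot every 60 seconds (log path is an assumption).
# while true; do snapshot >> /var/tmp/load-snapshots.log; sleep 60; done
```

Started in the same screen or tmux session shortly before the expected time, the loop leaves a log that can be read after a reboot.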
