How to diagnose out of space errors?

Twice I've noticed out-of-space errors in my PHP app, and both times I received the error "No space left on device while writing config" when attempting to log in via SSH to troubleshoot the problem.

I have plenty of free disk space and both times my app worked again after restarting my server manually through my hosting company's website. Obviously, this is an inconvenience but when I have customers, it'll be completely unacceptable.

I've checked /var/log/messages for anything that may help diagnose the problem. All I could find that seems relevant is:

rsyslogd[1032]: imjournal: fopen() failed for path: '/var/lib/rsyslog/imjournal.state.tmp': No space left on device [v8.2102.0-5.el8 try https://www.rsyslog.com/e/2013 ]

I do have this as a cron-job running twice a day: find /tmp -atime +1 -delete. I don't think this is causing the issue but I am not certain. Is this even a good way to clear /tmp?

I guess as a quick fix, I could tell php to call a bash script to restart the server every time it encounters an out of space error. This doesn't feel like a good idea though without understanding exactly what's malfunctioned and why.

I am using AlmaLinux 8.5 (very similar to Centos) with Nginx and php-fpm and I only have a VPS. I'll edit my question if you think there's any relevant information I should include.

Edit

There's no point showing the results of any commands, until I encounter the error again. I've created a webpage to execute the commands using shell_exec and to display the results on screen. At the time of the error, I should be able to run the commands because nothing is written to disk:

[Image: screenshot of the diagnostic page]

NB

A client SSL certificate and login details that only I have are required to run php as root and to access this page, so I'm not worried about the security implications of running php as root or calling shell_exec with user data.

@NikitaKipriyanov suggested trying to keep my SSH connection open. If I didn't already have things set up this way (php as root for admin stuff), then of course it'd make more sense to try preventing SSH from timing out instead.

I will provide an update when I encounter this error again and have some results from my tests. Feel free to put the commands you think I should be executing into an answer, as I may upvote, and I'll accept the answer if it leads to me fixing the problem.

Edit - potential progress

Since I am sure that my system does not actually run out of disk space, I was expecting the problem to be a process still using a deleted file, or an error relating to a process that has crashed. There are plenty of articles stating this can cause out-of-disk-space errors, but nothing about diagnosing this specific cause.
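
One way to check for the "deleted file still held open" case is to walk `/proc` and look for file descriptors whose target ends in ` (deleted)`. The following is a minimal sketch, assuming a Linux `/proc` filesystem; the function name `list_deleted_open_files` is made up for illustration, and it needs to run as root to see every process. It only prints to stdout, so it does not itself need free disk space.

```shell
#!/usr/bin/env bash
# Print "<pid><TAB><path> (deleted)" for every open file descriptor whose
# target has been unlinked. Space used by such files is only released when
# the owning process closes the descriptor or exits.
list_deleted_open_files() {
    local fd target pid
    for fd in /proc/[0-9]*/fd/*; do
        # readlink fails for descriptors we may not inspect or that closed meanwhile
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /memfd:*)
                # anonymous memory, not a file on disk
                ;;
            /*" (deleted)")
                pid=${fd#/proc/}
                pid=${pid%%/*}
                printf '%s\t%s\n' "$pid" "$target"
                ;;
        esac
    done
}
```

Calling `list_deleted_open_files` with no arguments lists the offending process IDs; entries under tmpfs mounts such as `/dev/shm` consume memory rather than disk, so they can usually be ignored here.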

However, I've noticed my inode usage has increased from 3% to 7% overnight. I do intentionally store data in lots of small files, but that should only account for a handful of additional inodes. I have crontab automatically create and store backups, but I monitor this, so I would notice any anomalies.

I think the problem is that my php $_SESSION array is creating way too many temporary files. The size of the $_SESSION array only increases linearly, so at present the amount of data stored is insignificant. I don't create any backups of the $_SESSION array, so currently I won't notice the number of files increasing. This is very easy for me to test and observe, so that will be the next step. I don't want to make any assumptions, so I'm going to wait and see if the inode usage approaches 100% and causes a crash before I attempt to fix the problem.
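
To see which directory is actually consuming the inodes, the entries under each subdirectory can be counted and ranked. This is a rough sketch rather than a definitive tool: the function name `count_entries_per_dir` is hypothetical, and it counts directory entries, which only approximates inodes because hard links share one.

```shell
#!/usr/bin/env bash
# Print "<entry count> <directory>" for each immediate subdirectory of $1
# (default: current directory), largest first. Every file, directory, and
# symlink below a subdirectory counts as one entry.
count_entries_per_dir() {
    local root=${1:-.} dir
    root=${root%/}                      # "/" becomes "", so the glob is "/*/"
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue       # glob matched nothing
        [ -L "${dir%/}" ] && continue   # skip symlinked directories
        # -xdev keeps find on one filesystem, matching what df -i reports
        printf '%s %s\n' "$(find "$dir" -xdev 2>/dev/null | wc -l)" "${dir%/}"
    done | sort -rn
}
```

Running it on `/` and then on whichever directory tops the list narrows things down quickly. For the session theory specifically, the directory to inspect is whatever `session.save_path` is set to in the PHP configuration.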

NB

As soon as I've confirmed the problem, I will move this extra information into an answer.

Nikita Kipriyanov
/var/lib/rsyslog/imjournal.state.tmp is not the same thing as /tmp I guess, so your (unsafe!) cron job has nothing to do with the problem. Where is df -h, df -i, analysis with du and so on?
Romeo Ninov
What is the result of `df -k /var/lib/rsyslog/`?
Dan Bray
@NikitaKipriyanov I haven't included them because I have lots of free space. At the time of the error, running `df` is not possible because I can't login via `ssh`. After a server restart, those commands simply show lots of free space.
Dan Bray
@RomeoNinov I've edited my question to include the result
Dan Bray
@NikitaKipriyanov what would you suggest to make my cronjob safe? Even if it's completely unrelated to the problem, I'd much rather it be safe.
Matthew Ife
What is the output of `ipcs`?
Dan Bray
@MatthewIfe `------ Message Queues -------- key msqid owner perms used-bytes messages ------ Shared Memory Segments -------- key shmid owner perms bytes nattch status ------ Semaphore Arrays -------- key semid owner perms nsem`
Nikita Kipriyanov
Please ask a dedicated question about that job. Don't ask multiple questions in one. Regarding "can't ssh": you might want to, for example, have an SSH connection already running to catch the moment when this happens. Or set up monitoring, but that way it will take longer to diagnose. Either way, you need to have diagnostic information gathered when this happens, or at least when it is about to happen. You need to know what is going on, not guess.
Matthew Ife
You may want to check the inode count too, but it's usually not that. What is the output of `df -i`?
Dan Bray
@NikitaKipriyanov the out-of-space errors in php and when attempting to log in via ssh are the same problem though. Setting up monitoring sounds like a good idea. Any ideas how I should proceed with that? I could edit my settings so that my `ssh` connection never times out, but I'm not certain it would stay open after receiving the error. I could certainly prepare some bash files and attempt to execute them from `php` the next time I receive the error.
Nikita Kipriyanov
Install and configure Zabbix, Nagios, or whatever you prefer; there are plenty of options. At the very least, prepare df -h (space, human readable), df -i (inodes), and various others. If you manage to capture that at the moment of an error, it will give you something to diagnose further. Then you'll prepare new scripts, again and again. This is why I say it "will take longer". On the other hand, if you happen to have a working shell, you'll be able to diagnose this in one shot. It's worth trying.
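
The "prepare df -h and df -i" step could be wrapped in a small function that prints everything to stdout and writes nothing to disk. This is only a sketch under that assumption; the name `disk_snapshot` is made up for illustration, and further commands can be appended as the diagnosis progresses.

```shell
#!/usr/bin/env bash
# Print a timestamped snapshot of block and inode usage to stdout.
# No temporary files are created, so it should still work on a full filesystem.
disk_snapshot() {
    echo "== $(date -u +%Y-%m-%dT%H:%M:%SZ) =="
    echo "-- df -h (space) --"
    LC_ALL=C df -h      # LC_ALL=C keeps the column headers in English
    echo "-- df -i (inodes) --"
    LC_ALL=C df -i
}
```

Running `disk_snapshot` from an SSH session that was opened before the failure, or from a page that echoes the output directly, gives both views in one go. Comparing the `Use%` and `IUse%` columns shows whether blocks or inodes ran out.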
Dan Bray
@NikitaKipriyanov I am going to create a page to run the ssh commands from `php`. I have php running as root when a client-side SSL certificate is provided so I should be able to run those commands at the time of error and echo to the screen. I'll have a look at `Zabbix` and `Nagios`. I would expect anything that requires writing to disk at the time of error to fail though. Would have to send live data to the screen.
Nikita Kipriyanov
I see this is going nowhere. Just connect beforehand and hope it will survive (it should, disk overflow doesn't break running ssh). Really. Don't assume, do things.
Dan Bray
@NikitaKipriyanov I believe you. I only said I wasn't certain. Anyway, I am going to create a diagnostic page in php that can run the commands, simply because it's something I should have anyway (my app has already evolved into a cPanel). `I see this is going nowhere` - You've given me some good ideas that I can try so this is going somewhere. It might take a while though because I have no idea how long it will be before I encounter the error again. You could even put what you suggested as an answer, although I can't accept it until I've successfully diagnosed the problem.