A runaway program generated a huge number of files (at least a million?) in /var/log. Even after deleting all (?) of the rogue files, any query on that folder/tree now takes about 5 minutes and can make the whole system sluggish.
The issue was that .gz files created by logrotate were being added to other .gz archives, which were in turn being archived, and ... oops. All of the invalid .gz files have since been deleted from /var/log, and the source of the problem has been fixed.
How can I find out exactly what is still causing delays?
- The /var/log tree contains 75 directories and 1606 files, consuming only 1 GB, yet `ls /var/log` takes over 5 minutes to complete (see the timing sketch below this list).
- Other, larger folder trees take much less time to query with `ls`, `find`, `grep`, etc.
- `tree` on /var/log also takes about 5 minutes, and the result is a completely normal folder/file tree.
- `df -i` shows a total of 10 million inodes, with less than a million used and over 9 million free. The system has been rebooted several times.
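To narrow down where the time is going, here is a minimal timing sketch I could run (plain shell; nothing in it is specific to my setup beyond the path):

```
# Enumerate directory entries only: -f disables sorting and per-file stat-heavy
# output, so this mostly measures reading the directory itself (getdents)
time ls -f /var/log > /dev/null

# Full long listing: adds a stat() per entry on top of the enumeration
time ls -l /var/log > /dev/null

# Per-syscall summary of the slow command, to see which calls dominate
strace -c -o /tmp/ls-trace.txt ls -l /var/log > /dev/null
cat /tmp/ls-trace.txt
```

If `ls -f` alone already takes minutes, the delay is in reading the directory entries themselves rather than in sorting or stat-ing the files.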
I would be OK with `rm -rf` on the entire log tree followed by a restart. With `mv` or `cp` to a different folder, "some reset", and moving everything back, I would be concerned that I would just be copying the problem from one place to another.
I'm wondering whether I can scan for (and clean up) corrupted inodes, or whether it would help to reduce the number of inodes to a minimum and then increase it again after a restart.
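One concrete check alongside the inode counts: if I understand ext4 correctly, a directory that once held a huge number of entries keeps its enlarged directory blocks even after the files are deleted, so the directory files themselves may still be oversized. A short sketch (the 1 MiB threshold is an arbitrary guess on my part):

```
# A near-empty directory normally occupies about 4 KiB; list any directory
# under /var/log whose directory *file* is still larger than 1 MiB
find /var/log -type d -size +1M -exec ls -lhd {} +

# Size of the top-level directory entry itself, for comparison
ls -lhd /var/log
```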
It's a simple installation with /var in the one and only / root partition for OS and data, so unmounting it or swapping in a replacement is not an option.
I can easily run diagnostics and provide relevant info.
This is a fully patched Ubuntu 20.04.3 cloud server. I can open a console if required.
`e4defrag` did not show fragmentation. I may run `fsck` (`e2fsck`, or `shutdown -rF`) if that is advised. These are examples of the kinds of utilities I'm looking for to help diagnose this kind of issue.
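Since /var sits on the root filesystem, I assume e2fsck can't be run against it while the system is up. As I understand it, a forced check can be scheduled for the next boot on a systemd-based install like this one; a sketch, assuming the root filesystem is ext4 and using /dev/sda1 as a placeholder for the actual root device:

```
# One-off: at the GRUB menu, edit the linux line and append
#   fsck.mode=force fsck.repair=preen
# so systemd-fsck runs e2fsck on the root fs before it is mounted read-write.

# Or, from the running system, make the next mount trigger a check:
sudo tune2fs -c 1 /dev/sda1    # placeholder device; confirm with: findmnt /

# e2fsck's -D option rebuilds and compacts directory indexes, but it has to be
# run with the filesystem unmounted or read-only (e.g. from a rescue console),
# not on the live root:
#   e2fsck -fD /dev/sda1
```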