A runaway program generated a huge number of files (at least a million?) in /var/log. Even after deleting all (?) of the rogue files, any query on that folder/tree now takes about 5 minutes and can make the whole system sluggish.
The issue was that .gz files created by logrotate were being added to other .gz archives, which were in turn being archived, and ... oops. All of the invalid .gz files have since been deleted from /var/log, and the source of the problem has been fixed.
How can I find out exactly what is still causing delays?
- The /var/log tree contains 75 directories and 1606 files, consuming only 1 GB, yet `ls /var/log` takes over 5 minutes to complete (see the timing sketch below this list).
- Other, larger folder trees take much less time to query with `ls`, `find`, `grep`, etc.
- `tree` on /var/log also takes about 5 minutes, and the result is a completely normal folder/file tree.
- `df -i` shows a total of 10 million inodes, with less than a million used and over 9 million free. The system has been rebooted several times.
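To narrow down where the time is going, here is a minimal timing sketch I could run (plain shell; nothing in it is specific to my setup beyond the path):

```
# Enumerate directory entries only: -f disables sorting and per-file stat-heavy
# output, so this mostly measures reading the directory itself (getdents)
time ls -f /var/log > /dev/null

# Full long listing: adds a stat() per entry on top of the enumeration
time ls -l /var/log > /dev/null

# Per-syscall summary of the slow command, to see which calls dominate
strace -c -o /tmp/ls-trace.txt ls -l /var/log > /dev/null
cat /tmp/ls-trace.txt
```

If `ls -f` alone already takes minutes, the delay is in reading the directory entries themselves rather than in sorting or stat-ing the files.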
I would be OK with `rm -rf` on the entire log tree followed by a restart. With `mv` or `cp` to a different folder, "some reset", and moving everything back, I would be concerned that I would just be copying the problem from one place to another.
I'm wondering whether I can scan for (and clean up) corrupted inodes, or whether it would help to reduce the number of inodes to a minimum and then increase it again after a restart.
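One concrete check alongside the inode counts: if I understand ext4 correctly, a directory that once held a huge number of entries keeps its enlarged directory blocks even after the files are deleted, so the directory files themselves may still be oversized. A short sketch (the 1 MiB threshold is an arbitrary guess on my part):

```
# A near-empty directory normally occupies about 4 KiB; list any directory
# under /var/log whose directory *file* is still larger than 1 MiB
find /var/log -type d -size +1M -exec ls -lhd {} +

# Size of the top-level directory entry itself, for comparison
ls -lhd /var/log
```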
It's a simple installation with /var in the one and only / root partition for OS and data, so unmounting it or swapping in a replacement is not an option.
I can easily run diagnostics and provide relevant info.
This is a fully patched Ubuntu 20.04.3 cloud server. I can open a console if required.
`e4defrag` did not show fragmentation. I may run `fsck` (`e2fsck`, or `shutdown -rF`) if that is advised. These are examples of the kinds of utilities I'm looking for to help diagnose this kind of issue.
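Since /var sits on the root filesystem, I assume e2fsck can't be run against it while the system is up. As I understand it, a forced check can be scheduled for the next boot on a systemd-based install like this one; a sketch, assuming the root filesystem is ext4 and using /dev/sda1 as a placeholder for the actual root device:

```
# One-off: at the GRUB menu, edit the linux line and append
#   fsck.mode=force fsck.repair=preen
# so systemd-fsck runs e2fsck on the root fs before it is mounted read-write.

# Or, from the running system, make the next mount trigger a check:
sudo tune2fs -c 1 /dev/sda1    # placeholder device; confirm with: findmnt /

# e2fsck's -D option rebuilds and compacts directory indexes, but it has to be
# run with the filesystem unmounted or read-only (e.g. from a rescue console),
# not on the live root:
#   e2fsck -fD /dev/sda1
```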