Score:1

Delete folders asynchronously

id flag

I have a large file system in which I have to delete certain directories from time to time. Currently I have a script which, amongst other things, deletes a folder and subsequently generates an email notification. However, as deleting a directory can take anything from a few seconds to a few days, I would like to do this asynchronously.

I could cook up a solution by, say, generating little snippets like rm -rf /some/directory in the appropriate cron directory, but that might get clogged if a large number of large directories need to be deleted.

Is anyone aware of a better solution?

Score:0
ws flag

Deleting a folder should be nearly instantaneous. It is walking the directory tree and deleting the many files and directories inside it that is likely the issue.

> that might get clogged

I don't know what you mean by this.

If you worry that one instance may still be running when the next one starts, why is that an issue? If there is a valid reason for ensuring exclusivity of instances, then use a lock file or limit the run time with timeout.
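As a rough illustration of the lock file and timeout suggestion, here is a minimal sketch assuming `flock` (util-linux) and `timeout` (GNU coreutils) are available. The function name `purge_exclusive`, the lock path and the six-hour limit are placeholders, not anything from the question.

```shell
#!/usr/bin/env bash
# Hypothetical names and defaults, chosen for illustration only.
LOCK_FILE="${LOCK_FILE:-/tmp/purge-dirs.lock}"
MAX_RUNTIME="${MAX_RUNTIME:-6h}"

purge_exclusive() {
    local target="$1"
    # Refuse an empty argument rather than hand it to rm.
    [ -n "$target" ] || return 2
    # flock -n fails immediately if another instance holds the lock;
    # timeout sends TERM to rm once MAX_RUNTIME has elapsed.
    flock -n "$LOCK_FILE" timeout "$MAX_RUNTIME" rm -rf -- "$target"
}
```

If the time limit is hit, the next run simply continues with whatever is left of the tree, since everything already removed is gone.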

id flag
Yes, I am deleting large directory trees. By clogging I mean that if deletions take longer than the ```cron``` interval, the number of deletion processes running could increase in an uncontrolled manner. I'd probably want a mechanism to limit that.
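One way to cap the number of simultaneous deletions is to feed the directories through GNU `xargs` with `-P`. This is only a sketch: `purge_from_list`, `MAX_JOBS` and the list format (one path per line) are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Hypothetical default: at most two rm processes at any time.
MAX_JOBS="${MAX_JOBS:-2}"

purge_from_list() {
    local list="$1"
    # GNU xargs: -d '\n' splits on newlines only, -n 1 gives each rm a
    # single directory, -P caps the number of parallel processes and
    # -r skips the run entirely when the list is empty.
    xargs -r -d '\n' -P "$MAX_JOBS" -n 1 rm -rf -- < "$list"
}
```

However many directories are listed, no more than `MAX_JOBS` deletions run at once; the rest wait their turn.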
Score:0
ca flag

What slows down your deletion is not the file removal itself (such operations are batched in the journal and committed to the main filesystem in large chunks, so they are already asynchronous in a sense), but rather the synchronous reads needed to discover what to delete. In other words, it is the metadata traversal needed to list all the inodes to be deleted that accounts for the biggest hit, by far. There is no real escaping that, unfortunately.

Some things you can do:

  • use a fast cache device to cache as much metadata as possible
  • use disposable volumes/filesystems, where "delete many files" becomes "simply discard the entire volume or filesystem"
  • schedule partial, progressive deletion via cron or similar tools
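The last point, partial and progressive deletion, could look roughly like the sketch below, where each scheduled run removes a bounded slice of the tree. It assumes GNU `find`, `head -z` and `xargs`; `purge_slice` and `BATCH` are made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical default: top-level entries removed per run.
BATCH="${BATCH:-1000}"

purge_slice() {
    local target="$1"
    # Nothing to do if the directory is already gone.
    [ -d "$target" ] || return 0
    # Remove at most BATCH entries directly under the target this run.
    find "$target" -mindepth 1 -maxdepth 1 -print0 \
        | head -z -n "$BATCH" \
        | xargs -0 -r rm -rf --
    # rmdir only succeeds once the directory is empty.
    rmdir -- "$target" 2>/dev/null || true
}
```

Note that a single top-level entry may itself be a huge subtree, so slices are not of equal cost; this bounds the number of entries per run, not the run time.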

For more info about delete performance and other things which slow down file removal, you can read this answer.

id flag
The directories I want to delete are actually the ```home``` and ```scratch``` (on GPFS and Lustre, respectively) directories of former users of an HPC system. I don't have much latitude to tweak the basic configuration, but I am happy to just deal with the problem at the level of directories. I don't really care that deletion will take a long time, I just don't want it to delay the script which performs the other housekeeping activities associated with removing a user. I guess I'll just generate some sort of list of directories which can then be removed by a ```cron``` job.
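A minimal sketch of that list-plus-`cron` idea: the housekeeping script only appends a path to a queue file and returns at once, while a separate worker started by `cron` does the slow part. `QUEUE`, `enqueue_purge` and `drain_queue` are hypothetical names, and the queue location is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical queue location, one directory path per line.
QUEUE="${QUEUE:-/var/spool/purge/queue}"

enqueue_purge() {
    # Called by the housekeeping script; returns immediately.
    printf '%s\n' "$1" >> "$QUEUE"
}

drain_queue() {
    # Called from cron. The queue is renamed first, so paths appended
    # while the worker runs land in a fresh file for the next run.
    local batch="$QUEUE.$$"
    mv -- "$QUEUE" "$batch" 2>/dev/null || return 0   # nothing queued
    local dir
    while IFS= read -r dir; do
        if [ -n "$dir" ]; then
            rm -rf -- "$dir"
        fi
    done < "$batch"
    rm -f -- "$batch"
}
```

To keep workers from piling up when a run outlasts the `cron` interval, the worker could be started under `flock -n` so that a new run exits while the previous one is still busy.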