CentOS - listing size of top-level directories in a 50TB NTFS volume is way too slow

What I want to achieve is to find out how big the top-level folders in a directory (an NTFS volume) are on a CentOS 7 server. This information is written to a Prometheus textfile, which is then used to feed a Grafana dashboard.

The script, executed daily via a cron job, looks like this:

#!/usr/bin/env bash

# Generate Prometheus collection metrics about Jenkins projects disk usage on the system.
# Currently, only top-level folder information is collected in bytes.

# Truncate the previous contents, so the file is ready for the newest data
echo -n > /var/lib/node_exporter/textfile_collector/jenkins_projects_disk_size.prom

cd /jenkins/jobs  # Go into the NTFS directory, containing all jobs data
for f in *; do  # Go through each top-level folder
    if [ -d "$f" ]; then
        # Will not run if no directories are available.
        # Run `du` on each top-level folder so that its size can be calculated,
        # and convert the output into the Prometheus format.
        prometheus_entry=$(du "$f" --block-size=1 --summarize | \
          sed -ne 's/\\/\\\\/;s/"/\\"/g;s/^\([0-9]\+\)\t\(.*\)$/jenkins_directory_size_bytes{directory="\2"} \1/p')
        echo "$prometheus_entry" >> /var/lib/node_exporter/textfile_collector/jenkins_projects_disk_size.prom
    fi
    fi
done
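
Side note: node_exporter can scrape the .prom file in the middle of a rewrite, so a safer variant of the loop above (a minimal sketch, assuming the temp file lands on the same file system as the target so the rename is atomic) collects everything in a temporary file first and swaps it into place at the end:

# Sketch: same loop body, but append to a temp file instead of the
# live .prom file, then rename it into place atomically at the end.
out=/var/lib/node_exporter/textfile_collector/jenkins_projects_disk_size.prom
tmp=$(mktemp "${out}.XXXXXX")
chmod 644 "$tmp"   # mktemp creates 0600 files; node_exporter needs read access
for f in *; do
    [ -d "$f" ] || continue
    du "$f" --block-size=1 --summarize | \
      sed -ne 's/\\/\\\\/;s/"/\\"/g;s/^\([0-9]\+\)\t\(.*\)$/jenkins_directory_size_bytes{directory="\2"} \1/p' >> "$tmp"
done
mv "$tmp" "$out"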

This currently works on a few other servers, which do not have a massive directory size compared to the problematic server (500 GB-1.5 TB), and it also runs relatively fast there.

The current problem is that on this particular server the folder is quite big (50 TB). As can be expected at that size, the du/df commands are very slow (I estimate the script would need more than 15-20 hours to finish).

Is there a way to further optimize this process, use some sort of cache, or take a different approach entirely (e.g. with another tool)? I already tried ncdu, but it is an interactive ncurses tool and I cannot extract the information in the form Prometheus needs.
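
For illustration, one way the per-folder du calls could be spread across cores (a sketch, assuming GNU xargs and folder names without embedded newlines; the disks may well remain the bottleneck):

# Sketch: run up to 4 du processes in parallel, one per top-level folder.
# Each `du -s` prints a single short line, so interleaved output stays intact;
# note the directory labels will carry a leading "./".
cd /jenkins/jobs
find . -mindepth 1 -maxdepth 1 -type d -print0 | \
  xargs -0 -n1 -P4 du --block-size=1 --summarize | \
  sed -ne 's/\\/\\\\/;s/"/\\"/g;s/^\([0-9]\+\)\t\(.*\)$/jenkins_directory_size_bytes{directory="\2"} \1/p'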

As previously mentioned, I only need top-level folder size information and nothing else. Any help or advice will be greatly appreciated! Thanks in advance.

Bob
`df` is quick, so placing that directory on its own file system/volume/disks is one solution. `du` and friends need to traverse the whole directory tree, and the only way to make that faster, AFAIK, is by making the underlying storage faster.
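
As a sketch of that suggestion: if /jenkins/jobs were mounted as its own file system, GNU df (coreutils >= 8.21, which CentOS 7 satisfies) can report the used bytes instantly from file system metadata, though only per mount, not per top-level folder:

# Sketch: whole-volume usage in bytes, answered from file system
# counters rather than a full tree walk.
df --block-size=1 --output=used /jenkins/jobs | tail -n 1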