Score:1

awk $1 and $2 variables coming up empty


GOAL : Compare the size of our directory structure day over day. The data folder holds over 990 TB, so I had to run a bunch of du commands in parallel for it to finish in a reasonable amount of time. Occasionally we see large, rapid data growth, and we currently don't have a good way of seeing where the data was added.

PROBLEM : The $1 and $2 in my awk aren't outputting anything, and the single quotes that should surround them aren't showing up either.

PRE-EMPTIVE STRIKE : I know there are WAY better tools to find this information, and we are working on implementing them. This is meant to be a quick and dirty band-aid to allow us to address any rapid growth that happens until the proper monitoring software is in place. Also, the delimiter between the two values in the log file is a tab. Pasting it into this WYSIWYG editor converted the tab to spaces.

Thanks in advance for any help you can provide me!

Snarf


I am attempting to do the following (pseudocode):

  • Find folders two levels deep in our data folder
  • Create a directory structure in a temp folder that mirrors the data folder structure
  • Create log files in the temp folder structure for each of the folders found in the "find folders"
  • Fill those log files with the output of a du -s of the folders from the "find folders"
  • awk the log files and build sql inserts
  • once the sql inserts look right, I will pipe the awk to mysql
  • once the data exists in mysql, it'll be easy to query day over day stats

Script -

DT=`date +"%Y%m%d"`
BASE=/mnt/data/test/

find /mnt/data -maxdepth 2 -mindepth 2 -type d -exec sh -c 'mkdir -p "$(dirname '"$BASE$DT"'{}.log)";touch '"$BASE$DT"'{}.log; du -S {} > '"$BASE$DT"'{}.log; awk -F'\''\t'\'' '\''{print "INSERT INTO DATE'"$DT"'(folder_size, folder_location) VALUES('\''$1'\'', '\''$2'\'');"}'\'' '"$BASE$DT"'{}.log' \;

Example of log file -

0       /mnt/data/apps/bog/minio.production-config/.minio/certs/CAs
12      /mnt/data/apps/bog/minio.production-config/.minio/certs
1       /mnt/data/apps/bog/minio.production-config/.minio
1       /mnt/data/apps/bog/minio.production-config

Example of output from the script for this log file -

INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );
Is it really necessary to count the file sizes? Is it not sufficient to just monitor the disk usage?
Score:1

You are deep in quoting hell there.

Your find command passes to sh -c an argument like

mkdir -p "$(dirname /mnt/data/test/20220508/mnt/data/abc/def.log)";touch /mnt/data/test/20220508/mnt/data/abc/def.log; du -S /mnt/data/abc/def > /mnt/data/test/20220508/mnt/data/abc/def.log; awk -F'\t' '{print "INSERT INTO DATE20220508(folder_size, folder_location) VALUES('$1', '$2');"}' /mnt/data/test/20220508/mnt/data/abc/def.log

Now sh parses this, removing one quoting level. It expands $(dirname /mnt/data/test/20220508/mnt/data/abc/def.log) to /mnt/data/test/20220508/mnt/data/abc. The single quotes you meant as SQL quotes instead close and reopen the single-quoted awk program, so they disappear and leave $1 and $2 unquoted; sh then expands those to empty strings (since it didn't receive any positional parameters), giving

mkdir -p /mnt/data/test/20220508/mnt/data/abc;touch /mnt/data/test/20220508/mnt/data/abc/def.log; du -S /mnt/data/abc/def > /mnt/data/test/20220508/mnt/data/abc/def.log; awk -F\t '{print "INSERT INTO DATE20220508(folder_size, folder_location) VALUES(, );"}' /mnt/data/test/20220508/mnt/data/abc/def.log

(I reinserted the single quotes around the awk program text argument for clarity.)
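
To see the effect in isolation, here is a minimal sketch that pushes one tab-separated line through the same kind of quoting. The variable names (inner, broken, substituted) and the placeholder values SIZE and PATH are made up for the demonstration, and it assumes a POSIX sh plus any common awk.

```shell
# The command string as the inner sh receives it. The heredoc delimiter is
# quoted, so the outer shell leaves the text untouched.
inner=$(cat <<'EOF'
awk -F'\t' '{print "VALUES('$1', '$2');"}'
EOF
)

# No positional parameters: sh expands the unquoted $1 and $2 to nothing
# before awk ever runs.
broken=$(printf '0\t/mnt/data/abc\n' | sh -c "$inner")
echo "$broken"

# Give sh two positional parameters and they land in the awk program text,
# which shows that sh, not awk, performs the substitution.
substituted=$(printf '0\t/mnt/data/abc\n' | sh -c "$inner" sh SIZE PATH)
echo "$substituted"
```

The first echo reproduces the empty VALUES(, ); from the question. The second prints the placeholders instead of the fields from the input line.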

The easiest way out is to create a file for the awk program which you then pass to awk via the -f option. I would then also recommend doing the assignment FS = "\t" in that file instead of using the -F option.

Finally, if you don't have any further use for the log files except for creating the SQL statements, you can simplify your script considerably by piping the output of du directly to awk, like:

DT=`date +"%Y%m%d"`

find /mnt/data -maxdepth 2 -mindepth 2 -type d -exec sh -c 'du -S {} | awk -v DT='$DT' -f /mnt/data/makeinserts.awk' \;

with the file /mnt/data/makeinserts.awk containing the pure awk program:

BEGIN{FS="\t"}
{print "INSERT INTO DATE"DT"(folder_size, folder_location) VALUES('"$1"', '"$2"');"}
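
If any directory name can contain spaces or quotes, embedding {} in the sh -c string becomes fragile. A possible variant, sketched below, passes the directory, the date and the awk file to sh as positional parameters instead. The scratch directory, the "bog dir" folder name and the variables work, sample and out are assumptions for the demo only; it also assumes GNU find and GNU du (for -S).

```shell
# Hypothetical scratch tree standing in for /mnt/data.
work=$(mktemp -d)
mkdir -p "$work/data/apps/bog dir"
printf 'x' > "$work/data/apps/bog dir/file"

# The same awk program as above, written to a file so that no shell ever
# parses it.
cat > "$work/makeinserts.awk" <<'EOF'
BEGIN { FS = "\t" }
{ print "INSERT INTO DATE" DT "(folder_size, folder_location) VALUES('" $1 "', '" $2 "');" }
EOF

DT=20220508

# Check the awk file against one line taken from the example log.
sample=$(printf '12\t/mnt/data/apps/bog/minio.production-config/.minio/certs\n' |
  awk -v DT="$DT" -f "$work/makeinserts.awk")
printf '%s\n' "$sample"

# Inside sh -c: $1 is the directory, $2 the date, $3 the awk file.
out=$(find "$work/data" -mindepth 2 -maxdepth 2 -type d \
  -exec sh -c 'du -S "$1" | awk -v DT="$2" -f "$3"' sh {} "$DT" "$work/makeinserts.awk" \;)
printf '%s\n' "$out"

rm -rf "$work"
```

Because the inner script text is now constant and fully single-quoted, there is only one quoting level left to reason about.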