Score:1

Performant sort files into subdirectories by content of metafile

-edit: more details, correct code-

I want to move files from a directory into sub directories by data out of a metadata file.

There are groups of files like <name>.<extension>. Each group consists of 3 files. One of the files in each group has the extension .idx. This is the metadata file, a text file.

In the metadata file, there is exactly one line like VALUE_01=XXXXX, plus 15 to 60 other lines of the form <key>=<value>, where each <key> is unique.

Now I want to move all files <name>.* from the current directory to a sub directory named XXXXX in this case (the value to the key VALUE_01).

I played around with for loops, etc., but even ls *.idx doesn't work because there are roughly 2 million files there! So the simple approaches fail, and I need something that performs well.

I tried

find . -maxdepth 1 -type f -name "*.idx" -exec grep -H "VALUE_01=" {} ";" | perl -pe 's/(.*?).idx:VALUE_01=(.*)$/\1.* .\/\2\//'

So I get a list like

./file1.* ./XXXXX/
./file2.* ./XXXXX/
./file3.* ./YYYYY/

I tried to pass this as arguments via xargs to mv.

... | xargs mv

to get

mv ./<name>.* ./XXXXX/

but I get error messages

mv: cannot stat './XXXXX/': No such file or directory
mv: cannot stat './file1.': No such file or directory
mv: cannot stat './XXXXX/': No such file or directory
mv: cannot stat './file2.
': No such file or directory
mv: warning: source directory './YYYYY/' specified more than once
mv: cannot stat './file3.*': No such file or directory

I think I'm using xargs incorrectly.

I'm not good at shell programming, so I can't see how to use it correctly or how to avoid it.

Andy A.
Each `.idx` file has exactly one line `VALUE_01=xxxx` (or `VALUE_01=yyyy`) and some other lines. The `idx` file must be moved, too. - I will edit my question...
terdon
Can you show us an example of the file? Is the `VALUE_01=XXXX` the only thing in the line? Is the `XXXX` always numbers? Always letters? Always non-whitespace?
Score:2

The trick is to use for file in *.idx. You never want to do for file in $(ls *.idx) anyway (see Bash Pitfall #1), and in any case ls *.idx fails here because the expanded list of file names exceeds the limit on the argument list of an external command, as you already saw. The for loop is a shell builtin, so that limit does not apply to it, and you can try something like this:

for file in *.idx; do
  name="${file%%.idx}"
  num=$(grep -m 1 -oP 'VALUE_01=\K\S+' "$file")
  mkdir -p "$num"
  printf 'mv %s* %s/\n' "$name" "$num"
done > script.sh

Explanation

  • for file in *.idx; do ... done: iterate over all files and directories whose name ends in .idx, saving each as $file.
  • name="${file%%.idx}": the syntax ${var%%pattern} returns the value of $var with the longest match for pattern removed from the right-hand side. See https://tldp.org/LDP/abs/html/string-manipulation.html. So this will return file if given file.idx. Since .idx is a literal string here, ${file%.idx} would give the same result.
  • num=$(grep -m 1 -oP 'VALUE_01=\K\S+' "$file"): get the name of the target directory from the idx file. The -m 1 tells grep to stop searching after the first match, since there is no reason to process the entire file. Next, -o means "print only the matching part of the line" and -P enables PCRE, which gives us \K for "ignore everything matched up to this point" and \S+ for "one or more non-whitespace characters". So this will look for the string VALUE_01= and then print the longest non-whitespace string it finds after it. Note that this assumes your XXXX has no whitespace.
  • mkdir -p "$num": create the target directory if it doesn't already exist. You want the -p because that makes mkdir simply do nothing if the directory does already exist.
  • printf 'mv %s* %s/\n' "$name" "$num": print out the commands that need to be run, one per line (e.g. mv foo* XXXX/). The names are written unquoted, so this assumes they contain no whitespace or shell special characters.
  • ... done > script.sh: capture all the commands printed by the previous step into a file called script.sh.
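
The two extraction steps explained above can be tried in isolation before touching the real directory. This is only a minimal sketch: the sample file file1.idx and its contents are made up, and awk is swapped in for grep -P purely so the snippet also runs where grep lacks PCRE support. Unlike \S+, the awk version keeps everything after the = sign, including any whitespace.

```shell
# Sketch only: the sample file name and its contents are invented for illustration.
tmp=$(mktemp -d)
printf 'KEY_A=1\nVALUE_01=XXXXX\nKEY_B=2\n' > "$tmp/file1.idx"

file="$tmp/file1.idx"

# Strip the literal .idx suffix from the right-hand side.
name="${file%.idx}"

# Stand-in for: grep -m 1 -oP 'VALUE_01=\K\S+' "$file"
# Prints the text after the first "=" on the first VALUE_01 line, then stops reading.
num=$(awk -F= '$1 == "VALUE_01" { print substr($0, index($0, "=") + 1); exit }' "$file")

printf 'name=%s\n' "${name##*/}"   # ${name##*/} drops the temporary directory part
printf 'num=%s\n' "$num"

rm -rf "$tmp"
```

If both values look right for a handful of real .idx files, the generated script.sh is more likely to do what you expect.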

This will enable you to check the commands and try a few manually to see that they work. If they do, you can either just run sh script.sh to execute them, or redo the loop but this time execute the commands instead of printing them:

for file in *.idx; do
  name="${file%%.idx}"
  num=$(grep -m 1 -oP 'VALUE_01=\K\S+' "$file")
  mkdir -p "$num"
  mv "$name"* "$num"
done
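
One caveat with the glob "$name"*: for a name such as file1 it also matches file10.idx, file11.dat and so on. Below is a minimal sketch of a stricter variant, assuming every file in a group really is named <name>.<extension> as the question describes. The function name move_group and the sample files are hypothetical, and awk again stands in for grep -P.

```shell
# move_group IDXFILE: move IDXFILE and its siblings into the directory named by VALUE_01.
move_group() {
  idx=$1
  name=${idx%.idx}
  # Text after the first "=" on the first VALUE_01 line.
  dir=$(awk -F= '$1 == "VALUE_01" { print substr($0, index($0, "=") + 1); exit }' "$idx")
  [ -n "$dir" ] || return 1              # no VALUE_01 line: leave the group alone
  # "$name".* requires the dot, so file1.* cannot pick up file10.idx.
  mkdir -p -- "$dir" && mv -- "$name".* "$dir"/
}

# Demo in a scratch directory with two groups whose names share a prefix.
demo=$(mktemp -d)
(
  cd "$demo" || exit 1
  printf 'VALUE_01=XXXXX\n' > file1.idx;  : > file1.dat;  : > file1.txt
  printf 'VALUE_01=YYYYY\n' > file10.idx; : > file10.dat; : > file10.txt
  for f in *.idx; do move_group "$f"; done
)
ls "$demo/XXXXX" "$demo/YYYYY"
# Remove the scratch directory with: rm -rf "$demo"
```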
Andy A.
Thanks! Is it correct that I have to use `mv "$name"*`? Otherwise I get a `mv: cannot stat ...` error.
terdon
@AndyA. yes indeed, sorry! I had it in the `printf` example but forgot to include it in the second one.
Score:2

Assuming you want to move all the files matching file.* into a subdirectory retrieved from the VALUE_01 of the corresponding file.idx in the same subdirectory, then a structure like this might be what you are looking for (echos left in for testing):

find . -name '*.idx' -execdir sh -c '
  for idx do
    tgt=$(grep -m1 -Po "VALUE_01=\K.*" "$idx")
    [ -n "$tgt" ] && echo mkdir -p "$tgt" || continue
    printf "%s\0" "${idx%.idx}".* | xargs -r0 echo mv -nt "$tgt" --
  done
' sh {} +

Testing briefly with

==> ./dir/file.idx <==
VALUE_01=XXXXX

==> ./dir/foo.idx <==
VALUE_01=YYYYY

gives

$ find . -name '*.idx' -execdir sh -c '
  for idx do
    tgt=$(grep -m1 -Po "VALUE_01=\K.*" "$idx")
    [ -n "$tgt" ] && echo mkdir -p "$tgt" || continue
    printf "%s\0" "${idx%.idx}".* | xargs -r0 echo mv -nt "$tgt" --
  done
' sh {} +
mkdir -p XXXXX
mv -nt XXXXX -- ./file.idx ./file.json
mkdir -p YYYYY
mv -nt YYYYY -- ./foo.awk ./foo.idx ./foo.py

The -- to end the option list is not strictly necessary, since -execdir prepends ./ to each name, but I like it as a visual cue. The -m1 in the grep just makes it quit as soon as it finds a match, which may make a difference if the .idx files have a lot of following content. If you know the target directories already exist, you can omit the mkdir. Alternatively, you could test for existence.

Andy A.
Sorry for editing your code. I added the `-maxdepth 1` option to stop the script from descending into the newly created subdirectories. Without it, the first run moves the files from `.` to `./XXXX/`, and the next run moves files from `.` to `./XXXX/` and also files from `./XXXX/` to `./XXXX/XXXX/`, and so on.
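
The effect of `-maxdepth 1` can be checked in a scratch directory. This is a rough sketch rather than the answer's exact command: the function name sort_idx and the sample files are hypothetical, -exec replaces -execdir (equivalent here, because `-maxdepth 1` keeps every match in the starting directory), and sed plus a plain mv stand in for grep -P and xargs so that fewer GNU-only options are needed.

```shell
# sort_idx: move each group in the current directory into its VALUE_01 directory.
sort_idx() {
  find . -maxdepth 1 -type f -name '*.idx' -exec sh -c '
    for idx do
      # Text after VALUE_01= on the first matching line, then quit.
      tgt=$(sed -n "/^VALUE_01=/{s///p;q;}" "$idx")
      [ -n "$tgt" ] || continue
      mkdir -p -- "$tgt" && mv -- "${idx%.idx}".* "$tgt"/
    done
  ' sh {} +
}

work=$(mktemp -d)
(
  cd "$work" || exit 1
  printf 'VALUE_01=XXXXX\n' > file.idx; : > file.json
  sort_idx   # first run: moves file.* into ./XXXXX/
  sort_idx   # second run: no .idx files left at depth 1, so nothing is nested
)
( cd "$work" && find . | sort )
# Remove the scratch directory with: rm -rf "$work"
```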