Score:2

Processing files in a directory with variables

cm flag

This script will be run on Ubuntu 22.04.1 LTS

I am new to Ubuntu and scripting, but I have written code and OS scripts on other operating systems (mostly VMS) and in C (many years ago). I know Linux scripts often put multiple commands on a single line, but I want to keep the code easy to read later, so a single command per line is preferred.

I am trying to loop through all files in a directory whose names fall between two file names, and I want the code to be flexible so I can modify it over time. As an example, I want to be able to process all files whose names start with a letter between D and J. The file names do include spaces and other special characters.

I want to pass in a source root directory and a destination root directory as variables and access those variables inside the loop. I also want to count the number of files that were processed successfully and the number that failed, so I can display both once at the end of the execution. I can see the counts increase inside the loop, but the values don't exist outside the loop.

I have a pretty good start. I can:

  * loop through and find the files (but not limited by start/end letters)
  * calculate how long the command executes, which I plan to add to log files later
  * count the number of successes and failures, but I can't display them after the loop

I have three problems:

  1. The variables SourceDirectoryRoot and DestinationDirectoryRoot are not accessible inside the loop run by the find command. I want to use them inside the loop so I can create sub-directories as needed under the destination directory. I don't want to set them twice, once inside the loop and once outside it. My long-term goal is to have them passed as parameters to the script, rather than hard-coded the way they are now.

  2. Similar to problem 1, the values of cntSuccess and cntFail are not available after the loop run by the find command. I can see they are incremented properly inside the loop, but they don't exist after the loop. I want a single output at the end that shows the number of successes and failures. I have the output in place now, but the values are zero.

  3. I can't figure out how to limit the files to those between the StartFile and EndFile names. The directory tree (including sub-directories) has hundreds of files, and the conversion command (not included here) can take 30+ minutes per file. So I want to run multiple copies of this script at the same time (or later convert it to take parameters instead of hard-coded values), each processing a different subset of files.

SourceDirectoryRoot=/mnt/media_bulk/movies
DestinationDirectoryRoot=/mnt/media_bulkd/movies-H265
StartFile=D*
EndFile=J*
cntSuccess=0
cntFail=0

find $SourceDirectoryRoot -type f -exec sh -c '
    for FileSpec do

    echo ""
    echo "File spec: $FileSpec"
    FileName=${FileSpec##*/}
    #  echo "File name: $FileName"
    echo "Source $SourceDirectoryRoot"
    StartTime=$(date +%F" "%T)
    echo "Start time:  $StartTime"
    StartSeconds=$(date -d "${StartTime}" +%s)

    #command to time duration goes here
    #

    # save the status of the command so it can be used later
    status=$?
    if [ $status -eq 0 ]
    then
      # command was successful
      echo "The command was successful"
    else
      # the command had an error
      echo "The command failed"
    fi

    EndTime=$(date +%F" "%T)
    echo "End time:  $EndTime"
    EndSeconds=$(date -d "${EndTime}" +%s)

    DurationSeconds="$(($EndSeconds-$StartSeconds))"
    Duration=$(date -d @${DurationSeconds} +"%H:%M:%S" -u)
    echo "Duration: $Duration"

    if [ $status -eq 0 ]
    then
      # command was successful
      echo "The command was successful and executed for $Duration"
      cntSuccess=$(($cntSuccess+1))
      echo "cntSuccess = $cntSuccess"
    else
    
      # the command had an error
      echo "The command failed after $Duration"
      ((++cntFail))
    fi
done' sh {} + #end for loop

echo "$cntSuccess files successfully processed"
echo "$cntFail file failed to process"

Subset of output (the empty "Source" lines and the zero counts at the end are where I am having issues caused by variables not being accessible):

File spec: /mnt/media_bulk/movies/Marvel/Captain America 2 (9).m4v
Source 
Start time:  2022-12-27 14:33:22
The command was successful
End time:  2022-12-27 14:33:22
Duration: 00:00:00
The command was successful and executed for 00:00:00
cntSuccess = 275

File spec: /mnt/media_bulk/movies/Marvel/The Avengers 2 (11).m4v
Source 
Start time:  2022-12-27 14:33:22
The command was successful
End time:  2022-12-27 14:33:22
Duration: 00:00:00
The command was successful and executed for 00:00:00
cntSuccess = 276
0 files successfully processed
0 file failed to process
SkiBum avatar
cm flag
I am not sure. I used an answer from another post about processing file lists to build that for loop. The source of my problem may be the way I implemented the loop. (Also, thank you for fixing the formatting of my initial post.)
Artur Meinild avatar
vn flag
Yeah - I think you should probably rethink the way you have structured the script. At the moment, you run the `for` loop in a separate child shell (`sh -c`), which is why the variables aren't carried over. This shouldn't be necessary. In addition, this hurts readability, since syntax highlighting is gone (because the entire for loop is in a string). This isn't a good practice.
hr flag
You can pass variables *into* the child shell by *exporting* them - unfortunately afaik there's no simple solution to passing values back to the parent
SkiBum avatar
cm flag
What is a preferred way to execute multiple commands from the results of a find command (where the file names include spaces and parentheses)?
Score:4
hr flag

The paradigm `find -exec sh -c '...' sh {} +` is in general a good way to process files - although for more than a few lines, I'd consider moving the processing loop to a separate shell script and executing it as `find -exec /path/to/script {} +`.

You can limit the range of files using a `-name` glob pattern or a `-regex` regular expression - for example, `-name '[D-J]*'` to match only files whose names start with a character in the range D to J as collated in your locale.
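As a minimal sketch of that filter, assuming GNU find as shipped with Ubuntu: the temporary directory and the sample file names below are made up for the demonstration. Setting `LC_ALL=C` is an assumption added here, not part of the answer; it makes `[D-J]` a plain byte range, so the result does not depend on the locale (and lower-case names are then excluded).

```shell
#!/bin/bash
# Hypothetical demo tree; the real script would point at the movie directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/Marvel"
touch "$demo_root/Alien (1).m4v"
touch "$demo_root/Dune (2).m4v"
touch "$demo_root/Marvel/Iron Man (3).m4v"
touch "$demo_root/Zulu.m4v"

# -name tests only the last path component, so the sub-directory name
# "Marvel" does not affect the match. The single quotes keep the shell
# from expanding the pattern before find sees it.
matches=$(LC_ALL=C find "$demo_root" -type f -name '[D-J]*' -exec basename {} \; | sort)

printf '%s\n' "$matches"
```

Only the two names starting with D and I are listed; the names starting with A and Z are skipped.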

You can pass variable values into the child shell process by exporting them from the parent environment (`export SourceDirectoryRoot`). Unfortunately there is not (afaik) an equivalent mechanism to pass values up to the parent. You could consider writing them to a status or log file and reading them back after (which also provides some permanence if the job gets killed or interrupted).
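A rough sketch of both ideas together, under some assumptions: the temporary source directory, the sample files, and the `StatusFile` variable are hypothetical names introduced for the demonstration, and the readability test stands in for the real conversion command. The child shell appends one line per file to the status file, and the parent counts the lines afterwards.

```shell
#!/bin/bash
# Stand-in source tree (hypothetical); the real script would use the movie path.
SourceDirectoryRoot=$(mktemp -d)
touch "$SourceDirectoryRoot/Dune (2).m4v"
touch "$SourceDirectoryRoot/Heat.m4v"

# StatusFile is a hypothetical name: one result line is appended per file.
StatusFile=$(mktemp)

# Exported variables are visible in the child sh started by find.
export SourceDirectoryRoot StatusFile

find "$SourceDirectoryRoot" -type f -exec sh -c '
  for FileSpec do
    echo "Source $SourceDirectoryRoot, file ${FileSpec##*/}"
    # Stand-in for the real conversion command.
    if [ -r "$FileSpec" ]
    then
      echo success >> "$StatusFile"
    else
      echo fail >> "$StatusFile"
    fi
  done' sh {} +

# Back in the parent: read the results that the child wrote.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
cntSuccess=$(grep -c '^success$' "$StatusFile" || true)
cntFail=$(grep -c '^fail$' "$StatusFile" || true)

echo "$cntSuccess files successfully processed"
echo "$cntFail files failed to process"
```

Because every result is appended as soon as a file finishes, the status file still holds the partial results if the job is interrupted.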

Alternatively you could refactor your code to do all the processing in the parent bash shell, passing the file names in as a null-delimited list using process substitution:

#!/bin/bash

SourceDirectoryRoot=/mnt/media_bulk/movies

cntSuccess=0
while IFS= read -r -d '' file; do 
    printf 'processing file: %s\n' "$file"
    ((cntSuccess++))
done < <(find "$SourceDirectoryRoot" -type f -name '[D-J]*' -print0)

printf '%d files successfully processed\n' "$cntSuccess"
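The same loop can be extended towards what the question asks for. This is only a sketch under stated assumptions: `convert_one` is a hypothetical stand-in for the real conversion command (it copies the file and deliberately fails for names ending in `.bad`), the directories are temporary demo directories, and `LC_ALL=C` is added so the `[D-J]` range does not depend on the locale. It keeps both counters in the parent shell, recreates the sub-directory under the destination root, and times each file with bash's `SECONDS` variable.

```shell
#!/bin/bash
# Demo trees (hypothetical); the real script would use the movie paths.
SourceDirectoryRoot=$(mktemp -d)
DestinationDirectoryRoot=$(mktemp -d)
mkdir -p "$SourceDirectoryRoot/Marvel"
touch "$SourceDirectoryRoot/Dune (2).m4v"
touch "$SourceDirectoryRoot/Marvel/Iron Man (3).m4v"
touch "$SourceDirectoryRoot/Gone.bad"

# Hypothetical stand-in for the conversion command:
# copies the file, and fails for names ending in .bad.
convert_one() {
    case "$1" in
        *.bad) return 1 ;;
    esac
    cp -- "$1" "$2"
}

cntSuccess=0
cntFail=0
while IFS= read -r -d '' file
do
    # Path relative to the source root, reused under the destination root.
    relative=${file#"$SourceDirectoryRoot"/}
    target="$DestinationDirectoryRoot/$relative"
    mkdir -p -- "$(dirname -- "$target")"

    start=$SECONDS
    if convert_one "$file" "$target"
    then
        cntSuccess=$((cntSuccess + 1))
    else
        cntFail=$((cntFail + 1))
    fi
    printf '%s took %d seconds\n' "$relative" "$((SECONDS - start))"
done < <(LC_ALL=C find "$SourceDirectoryRoot" -type f -name '[D-J]*' -print0)

printf '%d files successfully processed\n' "$cntSuccess"
printf '%d files failed to process\n' "$cntFail"
```

Since the loop body runs in the parent shell, both root variables are visible inside it and both counters keep their values after `done`.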

You might also want to look at using GNU parallel to possibly process the files more efficiently.

I hope this will give you some ideas.

SkiBum avatar
cm flag
Thank you - this gives me some good leads to work with! I tried replacing -name '[D-J]*' with my variables (StartFile=A EndFile=B), resulting in -name '[$StartFile-$EndFile]*', and I get an error. If I use the literal letters it does work. Should I use a different syntax?
hr flag
@SkiBum you would need to use double quotes rather than single quotes `"[$StartFile-$EndFile]*"` to allow the shell to expand variables (while still preventing premature filename generation)
SkiBum avatar
cm flag
That worked (once I fixed my comment on the next line)!! Thank you again
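For reference, a small sketch of the double-quoted form from the comment above, with the caveat that the demo directory and file names are invented and `LC_ALL=C` is an added assumption to keep the range independent of the locale. `StartFile` and `EndFile` are plain assignments here; in a parameterised script they could be taken from `"$1"` and `"$2"` instead.

```shell
#!/bin/bash
# Single letters bounding the range; these could come from "$1" and "$2".
StartFile=D
EndFile=J

# Hypothetical demo directory with one file below, two inside, one above the range.
demo_root=$(mktemp -d)
touch "$demo_root/Alien.m4v" "$demo_root/Dune.m4v" "$demo_root/Jaws.m4v" "$demo_root/Kids.m4v"

# Double quotes let the shell expand the variables, while still
# preventing the shell itself from expanding the glob.
pattern="[$StartFile-$EndFile]*"

selected=$(LC_ALL=C find "$demo_root" -type f -name "$pattern" -exec basename {} \; | sort)

printf 'pattern: %s\n' "$pattern"
printf '%s\n' "$selected"
```

Running several copies of the script with different letter pairs (for example A and C, D and J, K and Z) would give each copy its own subset of files.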