I'm trying to write a shell script that copies a bunch of files, but I'm struggling to set up a loop that goes through all the file paths:

aws s3 cp s3://noaa-bdp-pds/gdas.YYYYMMDD/00/atmos/ s3://s3internal/raw/HDAS/

Here YYYY, MM, DD are numbers that I need to loop through.

I need to loop through all years, months, and days so that every file gets copied. Can this be done?


Read `man seq`.
Add all valid YYYYMMDD combinations to an array and use a for loop to run the copy command once per date?
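A minimal sketch of the array-plus-loop idea from the comment above. It steps through calendar days with GNU `date` instead of nesting `seq` loops over YYYY, MM and DD, because nested `seq` loops would also produce impossible dates such as 20210231. Assumptions: GNU coreutils `date` (BSD/macOS `date` uses different flags), a hypothetical date range, and a per-date destination prefix `gdas.YYYYMMDD/` chosen here in case file names repeat from day to day. `--recursive` is needed because the source is a prefix, not a single object.

```shell
#!/usr/bin/env bash
# Sketch: collect every calendar date between start and end (inclusive)
# as YYYYMMDD in an array, then print one copy command per date.
# Assumes GNU date; the range below is hypothetical.
start=2020-02-27
end=2020-03-01

dates=()
d=$start
while [[ ! $d > $end ]]; do          # ISO dates compare correctly as strings
    dates+=("${d//-/}")              # 2020-02-27 -> 20200227
    d=$(date -I -d "$d + 1 day")     # calendar-aware step, handles Feb 29
done

for ymd in "${dates[@]}"; do
    # echo first; remove it once the commands look right
    echo aws s3 cp --recursive "s3://noaa-bdp-pds/gdas.$ymd/00/atmos/" \
        "s3://s3internal/raw/HDAS/gdas.$ymd/"
done
```

This only enumerates dates; it does not check which ones actually exist in the bucket, so a `cp` for a missing date simply copies nothing.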
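You can achieve this using `aws s3 sync` with `--exclude`/`--include` wildcard filters and `--dryrun`, which lists what would be transferred without copying anything. It produces output like this: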

$ aws s3 sync s3://noaa-bdp-pds . --dryrun \
  --exclude "*" --include "gdas.*/00/atmos/*"

(dryrun) download s3://noaa-bdp-pds/gdas.20210001/00/atmos/ to noaa-bdp-pds/gdas.20210001/00/atmos/

Run this in an empty directory: `sync` compares against the local destination, so files already present there may be skipped and missing from the output.
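To see what the `awk` step in the loop below extracts, here is a sketch run on a single dry-run line. The line is hypothetical (the object name `example.grb2` is made up) and assumes the `(dryrun) <action>: <source> to <destination>` layout with whitespace between fields; keys containing spaces would break this field splitting.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run line; the object name is invented for illustration.
sample='(dryrun) download: s3://noaa-bdp-pds/gdas.20210101/00/atmos/example.grb2 to gdas.20210101/00/atmos/example.grb2'

# Field 1 is "(dryrun)", field 2 the action, field 3 the S3 source URL.
src=$(printf '%s\n' "$sample" | awk '/s3:\/\//{print $3}')
echo "$src"
```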

Now, you can use this to construct a loop:

aws s3 sync s3://noaa-bdp-pds . --dryrun \
    --exclude "*" --include "gdas.*/00/atmos/*" |
awk '/s3:\/\//{print $3}' |
while read -r line; do
    # Capture the date (YYYYMMDD) and the rest of the key after /00/atmos/
    [[ $line =~ /gdas\.([0-9]{8})/00/atmos/(.*)$ ]] &&
        echo aws s3 cp "$line" "s3://s3internal/raw/HDAS/hdas.${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
done

When you're satisfied with the printed commands, remove `echo` to actually copy the files.
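If you would rather issue one command per date than one per file, a sketch of that variant: reduce the listed source URLs to their distinct dates, then copy each date's prefix with `--recursive`. The function name `unique_dates` and the sample URLs are hypothetical stand-ins for the `awk` output of the dry run.

```shell
#!/usr/bin/env bash
# Reads S3 source URLs on stdin and prints the distinct YYYYMMDD dates
# found in ".../gdas.YYYYMMDD/00/atmos/..." keys, sorted.
unique_dates() {
    local url
    while read -r url; do
        if [[ $url =~ /gdas\.([0-9]{8})/00/atmos/ ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        fi
    done | sort -u
}

# Hypothetical sample input standing in for the dry-run URLs.
urls='s3://noaa-bdp-pds/gdas.20210102/00/atmos/a
s3://noaa-bdp-pds/gdas.20210101/00/atmos/b
s3://noaa-bdp-pds/gdas.20210101/00/atmos/c'

for ymd in $(unique_dates <<< "$urls"); do
    # echo first; remove it to run the copies
    echo aws s3 cp --recursive "s3://noaa-bdp-pds/gdas.$ymd/00/atmos/" \
        "s3://s3internal/raw/HDAS/hdas.$ymd/"
done
```

The trade-off: fewer CLI invocations, but each `cp --recursive` copies everything under that date's `00/atmos/` prefix rather than only the objects the filters matched.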

related: [AWS S3 ls wildcard support](

