Score:0

Linux | Copy only the 100 newest files in each directory and nested directory

I have storage laid out something like this on an Azure VM running Ubuntu:

-/A
   -/B --> 10000 log files
   -/C --> 100000 log files 
      -/D --> 200000 images 
   summary.xml
   -/data --> 1000 csv files

Because the data is too large to run computations or other operations on directly, I want to take a sample of it to develop my data analysis code.

I want to copy a subset to a different location, containing the 100 newest files in each directory and nested directory plus all the files in the root, something like this:

-/New_Location
   -/B --> 100 log files
   -/C --> 100 log files 
       -/D --> 100 images 
   summary.xml
   -/data --> 100 csv files

I tried multiple commands based on cp, but none of them works for me, and they take too much time to execute.

Can someone please help me here?

David
What commands? Let's see exactly what you have tried. It is hard for anyone to help without that information.
Bamboocoder
One of the things I tried was going into each folder and running cp -R -- *([1,100]) ../New, but it does not copy the data in nested folders. It is also too much manual work, as I have thousands of folders inside one.
David
Please add additional information to the question, not as a comment. You said "one of the things"; what were the rest?
Score:0

This can easily be done by selective archiving. You can tarball only the intended files and then extract the tarball somewhere else. I am assuming that your log files share the same name apart from the numbering (e.g. log1, log2, etc.), so the first hundred files can be given to tar as log{1..100}. For example:

tar -cvf copied.tar <path1>/log{1..100} <path2>/log{1..100} etc

When you extract, the original directory structure will be recreated in the new location, so you may need the "--strip-components=" option to strip the redundant leading directories and avoid clutter.
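As a rough illustration of the archive-then-extract idea, here is a small self-contained sketch. The directory names (A/B, A/C, new_location), the file names (log1 ... log300) and the temporary work directory are all made up for the demonstration; only tar's -c, -x, -f, -C and --strip-components options are taken as given. Note that it picks files by number, as assumed above, not by modification time.

```shell
#!/bin/bash
# Sketch only: every path and file name here is hypothetical sample data.
set -eu
work=$(mktemp -d)
cd "$work"

# Sample tree: 300 numbered logs in each of two directories.
mkdir -p A/B A/C new_location
touch A/B/log{1..300} A/C/log{1..300}

# Archive only log1..log100 from each directory; brace expansion builds the list.
tar -cf copied.tar A/B/log{1..100} A/C/log{1..100}

# Extract elsewhere, dropping the leading "A/" component from every stored path.
tar -xf copied.tar -C new_location --strip-components=1

# Count what arrived in the new location.
b_count=$(( $(find new_location/B -type f | wc -l) ))
c_count=$(( $(find new_location/C -type f | wc -l) ))
echo "B: $b_count files, C: $c_count files"
```

With --strip-components=1 the files land in new_location/B and new_location/C instead of new_location/A/B and new_location/A/C.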

Score:0

You can usually divide this into three tasks: first recreate the directory structure, then, as in your case, copy the matching files, limited to 100 per directory. The last part inverts the match to pick up the rest of the files.

#!/bin/bash

# Example START: build a sample tree under A/ if it does not exist yet.
[[ ! -d A/ ]] && { \
mkdir -p \
A/{tmp/folder,\
{A..Z}}/{images,data} && \
printf %s\\0 \
A/{summary.xml,\
tmp/De5Loh4X.tmp,\
{A..Z}/{{1..1000}_file.log,\
images/{1..1000}_pic.{jpg,png},\
data/example.csv}} | xargs -0 touch; }
### Example END

# Disable globbing so the unquoted patterns in $match reach find untouched.
set -o noglob

source=A
target=target
number=100
# prune="-false"
prune="-type d -path $source/tmp -prune"
match='-name *.log -o -name *.jpg -o -name *.png'

# Step 1: recreate the leaf directories (link count 2) below the target.
echo Create directory structure.
find "$source" \
\( $prune -o -type d -links 2 \) -printf %P\\0 | cpio -0 -pvdm -D "$source" "$target"


# Step 2: for each directory that holds files, copy the $number newest matches.
echo Copy 100 files.
while IFS= read -rd ''; do
find "$REPLY" \
-maxdepth 1 -type f \( $match \) -printf '%T@\t%P\0' | sort -zk1rn | cut -zf2- | head -zn $number | cpio -0 -pvdm -D "$REPLY" "$target/${REPLY/#$source\//}"
done < <( \
find "$source" \
\( $prune -false -o -type f \) -printf %h\\0 | sort -zu \
)

# Step 3: copy every file that does not match the patterns, without a limit.
echo Copy everything else.
find "$source" \
\( $prune -false -o -type f ! \( $match \) \) -printf %P\\0 | cpio -0 -pvdm -D "$source" "$target"
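For comparison, the "newest N per directory" step can also be sketched with cp in place of cpio. This is a simplified variant, not the script above: it assumes GNU findutils and coreutils (find -printf, sort -z, head -z, cut -z, xargs -r, cp -t, touch -d), it has no pruning and no pattern matching, and the names src, dst and n as well as the sample files are hypothetical. Files directly in the root are copied without a limit, as the question asks.

```shell
#!/bin/bash
# Sketch: copy the n newest regular files of every directory under src into dst.
set -eu
work=$(mktemp -d)
src="$work/A"; dst="$work/New_Location"; n=3   # n would be 100 for the real data

# Hypothetical sample data with distinct modification times.
mkdir -p "$src/B" "$src/C/D"
touch "$src/summary.xml" "$src"/root{1..4}.txt
for i in 1 2 3 4 5; do
  touch -d "2024-01-0$i" "$src/B/$i.log" "$src/C/D/$i.jpg"
done

# Visit every directory and mirror it below dst.
find "$src" -type d -print0 |
while IFS= read -r -d '' dir; do
  rel=${dir#"$src"}            # "" for the root, "/B", "/C/D", ...
  mkdir -p "$dst$rel"
  if [[ $dir == "$src" ]]; then
    # Root directory: copy all of its files.
    find "$dir" -maxdepth 1 -type f -print0 | xargs -0 -r cp -p -t "$dst"
    continue
  fi
  # Rank this directory's own files by mtime (newest first) and keep n of them.
  find "$dir" -maxdepth 1 -type f -printf '%T@\t%p\0' |
    sort -z -k1,1nr | head -z -n "$n" | cut -z -f2- |
    xargs -0 -r cp -p -t "$dst$rel"
done
```

Everything is passed NUL-delimited, so file names containing spaces or newlines should survive; cp -p keeps the original timestamps on the copies.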
bac0n
Setting match to `-name *` will limit files of every type to 100.