Score:2

rsync all pdfs except in certain directories?


I'm trying hard to understand the rsync filter system, and it's completely baffling me.

I have the following "test" directory structure to try to make sense of it. With no filter options here are all my files:

rsync -amv --dry-run /source /target

building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/excludedir/
source/excludedir/2.jpg
source/excludedir/4.pdf
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf

I just want to sync all *.pdf files except in certain directories, namely any directory that has *exclude* in it.

I'm using a file with the filter rules in it with the following command:

rsync -amv --dry-run --filter='merge /filter_rules' /source /target

The filter_rules look like variations on the following but I can't get them to produce the results I'm after:

-/ *exclude*/
+/ *.pdf
-/ *

The closest I've come is with the simple exclude:

-/ *exclude*/

Which yields:

building file list ... done
source/
source/1.pdf
source/2.pdf
source/exclude_rules.txt
source/filter_rules.txt
source/subdir/
source/subdir/1.jpg
source/subdir/1.txt
source/subdir/3.pdf
source/subdir/subdir2/
source/subdir/subdir2/6.jpg
source/subdir/subdir2/6.pdf

How do I filter the rest down to just the *.pdf files?

Score:1

For posterity, I did finally get this to work, and here are the instructions I wish I had had:

  • rsync starts the filter process with the full list of files.
  • The filter rules are applied IN ORDER (it took me a while to get this).
  • You may have all the right rules in the wrong order. If your rules live in separate include and exclude files, you may need to move them into a single filter file, which lets you mix include and exclude rules in whatever order you need, or list them in order on the command line.
  • For each file, the FIRST FILTER RULE THAT MATCHES puts the file into one of two buckets: include or exclude.
  • Rules after the first matching rule are not applied to that file!
  • Each rule therefore only ever sees the files that no earlier rule matched.
  • Files that don't match any rule are INCLUDED.
  • The last rule in the list (the catch-all exclude "-/ *" in the rules below) is the most important and least intuitive: it excludes everything that wasn't specifically included UP TO THAT POINT.
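The first-match-wins behaviour described above can be modelled in a few lines of shell. This is only a toy sketch, not how rsync is implemented: the function name decide and the convention that directories are written with a trailing slash are assumptions made for the illustration, it matches against the last path component only, and it ignores the fact that rsync never looks inside a directory it has excluded.

```shell
# Toy model of rsync's "first matching rule wins" filtering.
# Rules are checked top to bottom; a pattern ending in / applies to directories only.
rules='- *exclude*/
+ */
+ *.pdf
- *'

decide() {
  path=$1                                  # directories are passed with a trailing /
  base=${path%/}; base=${base##*/}         # last path component, without the slash
  while IFS=' ' read -r action pattern; do
    case "$pattern" in
      */) case "$path" in */) ;; *) continue ;; esac   # directory-only rule, skip for files
          pattern=${pattern%/} ;;
    esac
    case "$base" in
      $pattern) [ "$action" = + ] && echo include || echo exclude
                return ;;                  # first match decides; later rules are ignored
    esac
  done <<EOF
$rules
EOF
  echo include                             # nothing matched: included by default
}

for p in source/excludedir/ source/subdir/ source/subdir/3.pdf source/subdir/1.jpg; do
  printf '%s -> %s\n' "$p" "$(decide "$p")"
done
```

Swapping the order of the lines in rules and re-running the loop is a quick way to see why the order matters.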

So here's what ended up working:

-/ *exclude*/
+/ */
+/ *.pdf
-/ *

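The "+/ */" rule is what lets rsync descend into the subdirectories: without it, the final "-/ *" would exclude every directory before the PDFs inside it were ever considered. The -m option then prunes any directories that end up empty.

Originally I had those rules in separate include-from and exclude-from files, and that didn't allow for the proper order.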

TomOnTime
Props for returning and posting what you learned and how you learned it!
Score:0

I still use --exclude-from in my rsync, but this link was remarkably helpful when I attempted to get filtering to work.

https://stackoverflow.com/questions/35364075/using-rsync-filter-to-include-exclude-files

Edit: the OP nailed this in his own answer, but as requested, here is the helpful bit from that link:

Explanations:

(only rewording the manual in the end but as you said the manual is a bit cryptic)

Rules are read from top to bottom each time a file must be transferred by rsync. But in your case /mnt/data/i-want-to-rsyncthisdirectory/ is not backed up because you exclude /mnt and this short-circuits your include rules. So the solution is to include each folder and subfolder down to the folder you want to back up, and then to exclude what you do not want to back up, subfolder by subfolder.

Note the * at the end of each subfolder exclusion. It prevents rsync from backing up the files and folders located in these subfolders, which is what you want, I think.

Simpler solution (edit 2):

You can even simplify this with the *** pattern that was added in version 2.6.7:

+ /mnt/
+ /mnt/data/
+ /mnt/data/i-want-to-rsyncthisdirectory/***
- /mnt/**

This operator allows you to use the ** wildcard for exclusion and consequently to have only one exclude line.

I also discovered that you can understand which filter rules exclude/include each file or folder thanks to the following rsync arguments:

--verbose --verbose

Combined with the --dry-run argument you should be able to debug your problem :)

Thanks for answering! Can you possibly pull out the bits that were most helpful, in case the question or answers get deleted in the future?