Context: I am successfully using rsync to create daily incremental remote backups with the --link-dest "referenceBackup" option, so that when creating the new backup, an unmodified file is not copied again but only gets a new hard link pointing to its existing copy in the reference backup.
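For reference, the invocation is roughly like the sketch below (host and paths are placeholders, not my real setup):

    #!/bin/sh
    # Daily incremental backup: files unchanged since yesterday's backup
    # are hard-linked on the destination via --link-dest instead of copied.
    TODAY=$(date +%F)
    YESTERDAY=$(date -d yesterday +%F)
    rsync -a --delete \
          --link-dest="/backups/$YESTERDAY" \
          /data/project/ "backupserver:/backups/$TODAY/"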
Problem: Some directories contain a lot of small files (hundreds of thousands), and even with the --link-dest optimization the result is not ... "optimal". Most of those small files are source files, small "*.o" compiled outputs, and so on, and creating a new inode reference (hard link) for each of those hundreds of thousands of files consumes file-system resources. For example, for a reference directory of 20 GB with 250,000+ files, the incremental backup uses about 100 MB (roughly 0.5%) of extra space, even when only 40 kilobytes of data have actually changed.
Question: I know for sure that these directories and directory subtrees will always contain mostly the same "hundreds of thousands" of small files (maybe with just 10 or 20 modifications), and I am wondering if there is a better backup strategy for this scenario than --link-dest. That is, I would like the new incremental backup to store just a "diff" of the existing directory, without even creating a new inode reference (hard link) for the 99% of files that have not changed. Is there some sort of file-system overlay (Docker-like) application or pattern for this?
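By "overlay" I mean something along the lines of the kernel overlayfs mechanism that Docker uses, where the previous backup would stay untouched as a read-only lower layer and only the handful of changed files would land in a small upper layer. The paths below are purely illustrative of that idea, not an existing setup:

    # Hypothetical sketch of the overlayfs pattern (paths are made up):
    # the previous backup is the read-only "lower" layer, and only the
    # 10-20 modified files would be written into the "upper" layer.
    mkdir -p /backups/upper /backups/work /backups/merged
    mount -t overlay overlay \
          -o lowerdir=/backups/previous,upperdir=/backups/upper,workdir=/backups/work \
          /backups/merged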