Score:0

rsync optimization over --link-dest "referenceBackup"?

Context: I am successfully using rsync to create daily incremental remote backups with the --link-dest "referenceBackup" flag, so that each new backup only creates a new reference (hard link) to every existing, unmodified file.

Problem: Some directories contain a lot of small files (hundreds of thousands), and even the --link-dest optimization is not really optimal. Most of those small files are source files, small compiled "*.o" outputs, and so on, and creating a new inode reference for each of those hundreds of thousands of files consumes file-system resources. For example, for a reference directory of 20 gigabytes with 250,000+ files, the incremental backup uses about 100 MB (0.5%) of extra space, even when only 40 kilobytes of data have changed.
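To put those figures in per-file terms, here is a rough back-of-the-envelope sketch. It assumes decimal units (1 MB = 10^6 bytes) and exactly 250,000 files; the function name is only illustrative.

```python
def per_file_overhead(extra_bytes: int, file_count: int) -> float:
    """Average extra space consumed per file by one incremental backup."""
    return extra_bytes / file_count


if __name__ == "__main__":
    extra_bytes = 100 * 10**6   # about 100 MB of extra space per backup run
    file_count = 250_000        # files in the reference directory (lower bound)
    print(f"{per_file_overhead(extra_bytes, file_count):.0f} bytes per file")
```

So each unchanged file still costs a few hundred bytes per backup run, on average.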

Question: I know for sure that whole directories and directory subtrees will always contain mostly the same hundreds of thousands of small files (maybe with just 10 or 20 modifications), and I am wondering whether there is a better backup strategy for such a scenario than --link-dest. That is, I would like to store just a "diff" of the existing directory in the new incremental backup, avoiding even the creation of a new reference to the inode for the 99% of files that already exist. Is there some sort of file-system overlay (Docker-like) application or pattern?

Score:1

The added overhead of hardlink-based incremental backups is generally due to all the new directories that each backup run needs to create.

Directories cannot be hard-linked on Linux (or other Unixes) and need to be created as usual. They consume space like any other file, so you end up with a small overhead.

On the other hand, a hard link is only another name for the same inode, with almost no overhead at all.
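The difference can be checked with a small script. This is only a sketch, not what rsync does internally: the directory names `backup.1` and `backup.2` and the file `main.o` are made up for illustration, and it assumes a file system that supports hard links.

```python
import os
import tempfile


def compare_inodes(base):
    """Hard-link one file into a second directory and compare inode numbers."""
    ref_dir = os.path.join(base, "backup.1")  # hypothetical reference backup
    new_dir = os.path.join(base, "backup.2")  # hypothetical new backup
    os.mkdir(ref_dir)
    os.mkdir(new_dir)  # directories are created anew, never hard-linked

    ref_file = os.path.join(ref_dir, "main.o")
    with open(ref_file, "wb") as fh:
        fh.write(b"unchanged contents")

    new_file = os.path.join(new_dir, "main.o")
    os.link(ref_file, new_file)  # second name for the same inode, no data copied

    return {
        "same_file_inode": os.stat(ref_file).st_ino == os.stat(new_file).st_ino,
        "link_count": os.stat(ref_file).st_nlink,
        "same_dir_inode": os.stat(ref_dir).st_ino == os.stat(new_dir).st_ino,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(compare_inodes(tmp))
```

The two file names resolve to one inode with a link count of 2, while the two directories each have their own inode. Each new backup directory still has to hold an entry (name plus inode number) for every file it contains.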
