After rsync, the folders have different sizes

I have used rsync to copy a folder from one machine to another. The folder is very large, with thousands of subdirectories and millions of files in total. After the transfer (which was interrupted a couple of times), I checked the folder on each machine. The sizes do not match:

Source folder: 430 GB

Target folder: 415 GB

I tried running rsync again, but all the subdirectories are skipped. I have also tried adding the --checksum option to the command, and it still skips everything. Is this 15 GB difference explainable? If not, is there some way to find which files do not match?

Update: The rsync command I use is:

rsync user@source:folder target
codlord: I suggest you update your original question with the actual rsync command you are using.
Do the two have the same filesystem? The same block size? How did you calculate the size? Did it include tmpfs?
MrHat: I calculated the size with `du -h`. I checked, and the filesystem is ZFS for the source location and NFS for the target location. Unfortunately, I can't find the block size of the underlying block devices because they are on the network and I do not have access to them.
Artur Meinild: By default, `du -h` reports disk usage, i.e. the space allocated in filesystem blocks, rather than the file size. To show the actual file size, you need `du -h --apparent-size`. I'm pretty sure the apparent size will be the same, in which case different block sizes are the reason for the different space usage.
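To see whether block allocation explains the gap, the two totals can be compared directly on each machine. The following is a minimal Python sketch, not something from the thread; the function name `tree_sizes` is hypothetical. It assumes a Unix-like system where `st_blocks` is reported in 512-byte units (as on Linux), and it counts only files, not the directories themselves, so its totals will come out slightly below `du`'s.

```python
import os
import sys


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) for the files under root.

    apparent_bytes sums st_size, the logical size (cf. `du --apparent-size`).
    allocated_bytes sums st_blocks * 512, the space actually allocated
    (cf. plain `du`). Each inode is counted once, so extra hard links to a
    file already seen are skipped, as du does by default.
    """
    seen = set()
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another name for an inode already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units on Linux
    return apparent, allocated


if __name__ == "__main__" and len(sys.argv) == 2:
    apparent, allocated = tree_sizes(sys.argv[1])
    gib = 1024 ** 3  # du -h uses powers of 1024
    print(f"apparent: {apparent / gib:.1f} GiB, allocated: {allocated / gib:.1f} GiB")
```

Run on both machines with the folder as the only argument. If the apparent totals agree while the allocated totals differ, the gap comes from how the two filesystems allocate space rather than from missing data.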

There are several reasons why the reported sizes of a folder and its copy may differ.

  1. Reported sizes depend on the command you use and on what exactly it measures. The space a file occupies on disk also depends on the file system and the options it was created with, such as its block size. Because space is allocated in whole blocks, many small files can occupy noticeably more space than their combined logical size, and how much more depends on the block size. Linux file systems also support sparse files: regions of a file that contain only zeros (e.g. in some binaries) are not physically stored on disk. Thus, the physical size of such a file on disk (as reported by du) is smaller than its logical size (as reported by du --apparent-size or ls -l).

  2. Another factor that can make an rsync copy differ in size is the presence of hard links, i.e. different file names pointing to the same data on disk. By default, rsync copies hard-linked files as separate files, so on the target the data takes up space once per link. The -H (--hard-links) option preserves hard links instead.

  3. Yet another possible cause of a size difference is running the transfer repeatedly. By default, rsync keeps files on the target even if they have since been deleted from the source. You must add the --delete option to remove these files from the target as well.

The latter two reasons would make the target folder larger than the source folder, so they cannot explain your case.
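To check whether sparse files or hard links are present in a tree at all, it can be scanned for files that are under-allocated or have more than one name. This is a minimal Python sketch with a hypothetical function name, again assuming Linux's 512-byte `st_blocks` unit. On a filesystem with compression enabled (for example ZFS with compression turned on), compressed files also appear under-allocated, so that label means "holes or compression", not necessarily a sparse file.

```python
import os
import stat


def find_size_suspects(root):
    """Return (path, reason) pairs for regular files whose on-disk
    footprint may change when the tree is copied.

    "under-allocated": fewer bytes allocated than the logical size, i.e.
        holes (a sparse file) or, on a compressing filesystem, compression.
    "hard-linked": the inode has more than one name (st_nlink > 1), so a
        copy made without rsync -H stores the data once per name.
    """
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device files, etc.
            if st.st_blocks * 512 < st.st_size:
                suspects.append((path, "under-allocated"))
            if st.st_nlink > 1:
                suspects.append((path, "hard-linked"))
    return suspects
```

A file can be reported twice if both conditions apply; that is deliberate, since each reason affects the copy differently.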

rsync is a very reliable tool, and its file comparison algorithms are robust. By default, it compares files by modification time and size in bytes (the logical size). Thus, if rsync reports no further files need to be transferred, then you can trust that.
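The default size-and-time comparison is easy to reproduce, which also gives one way to list files that do not match. (rsync itself can produce a similar report with a dry run, `-n` together with `-i`/`--itemize-changes`.) Below is a minimal Python sketch with hypothetical function names. It assumes both trees are reachable as local paths from one machine, and that modification times were preserved during the transfer (rsync's `-t`, which `-a` includes); otherwise nearly every file will show up as differing. Comparing mtimes at whole-second resolution is a simplification of what rsync does.

```python
import os
import sys


def snapshot(root):
    """Map each file's path relative to root to (size in bytes, whole-second mtime)."""
    entries = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            entries[os.path.relpath(full, root)] = (st.st_size, int(st.st_mtime))
    return entries


def quick_check(src, dst):
    """Compare two trees by size and modification time.

    Returns (missing, extra, differ): paths only in src, paths only in dst
    (what --delete would remove), and paths whose size or mtime differ.
    """
    a, b = snapshot(src), snapshot(dst)
    missing = sorted(a.keys() - b.keys())
    extra = sorted(b.keys() - a.keys())
    differ = sorted(p for p in a.keys() & b.keys() if a[p] != b[p])
    return missing, extra, differ


if __name__ == "__main__" and len(sys.argv) == 3:
    for label, paths in zip(("missing", "extra", "differ"), quick_check(*sys.argv[1:])):
        for p in paths:
            print(f"{label}\t{p}")
```

Only metadata is read, so this is fast even for millions of files, but like rsync's default check it cannot detect a file whose contents changed while its size and mtime stayed the same.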

MrHat: Thanks for the detailed answer. I will try the --delete option just to see if it gives the same result (it should, because the source is larger). In any case, my files will be used, so they will all be accessed and checked, but I wanted to know if there was a faster way to check that the transfer worked. I will post an update in a few days once I verify that none of the files are missing or corrupted.
vanadium: The fastest way is just the report of the `rsync` operation itself. For maximum certainty, you can have `rsync` compare files via a checksum (`--checksum`), but that will greatly slow down the synchronisation.
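A checksum comparison is slow because it must read every byte of every file on both sides instead of only their metadata. The following minimal Python sketch of such a content check illustrates this; it uses SHA-256 from Python's `hashlib`, which is not the checksum rsync itself uses, and the function names are hypothetical. Files that exist on only one side are ignored, since this check only compares contents.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_mismatches(src, dst):
    """Relative paths of files present in both trees whose contents differ."""
    mismatches = []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            s = os.path.join(dirpath, name)
            rel = os.path.relpath(s, src)
            d = os.path.join(dst, rel)
            # os.path.isfile follows symlinks; files missing on either side are skipped
            if os.path.isfile(s) and os.path.isfile(d) and sha256_of(s) != sha256_of(d):
                mismatches.append(rel)
    return sorted(mismatches)
```

For a 430 GB tree, one side of which is on NFS, expect this to take roughly as long as reading all the data twice.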
Zaman Oof: The important part is `if rsync reports no further files need to be transferred, then you can trust that.` Thanks.