What accounts for mystery hard drive space gone missing? (Difference between df and du)

sd flag

To preface: there are lots of other useful questions (e.g. this and this) on possible causes for different sizes reported by df and du. None of the explanations apply to my exceedingly simple case, however, hence this new question.

I have a very simple scenario: I have two identical 5 TB Seagate hard drives purchased at the same time (a few months ago), with their original NTFS formatting. Hard drive A is full of a few thousand mostly large files (gigabyte-sized), and FreeFileSync is used to mirror drive A to drive B on a nightly basis.

From the very first mirror, I found that the files on drive B took up nearly 3% more space than on drive A, and this has persisted ever since (a few months now). With identical files on both, df reports (in 512 B blocks):

Filesystem   512-blocks       Used Available Capacity  iused      ifree %iused  Mounted on
/dev/disk4s2 9767276536 8946736496 820540040    92%     6149 4294961146    0%   /Volumes/A
/dev/disk5s2 9767276536 9199664896 567611640    95%     5719 4294961576    0%   /Volumes/B

While du -d 0 in the root of each drive reports (again in 512 B blocks) only a 0.002% difference:

A: 8939999664
B: 8940229723

So I'm trying to figure out what could possibly be resulting in 3% less available space on drive B -- a difference of 121 GB across these two 5 TB drives.
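To make the arithmetic explicit, here is a small Python sketch that uses only the figures from the df and du output above. The variable names are illustrative, and treating "GB" as GiB (2**30 bytes) is an assumption about how the 121 GB figure was derived.

```python
BLOCK = 512  # both df and du were reporting 512-byte blocks

# Figures copied from the df and du output above.
df_used = {"A": 8946736496, "B": 9199664896}
du_used = {"A": 8939999664, "B": 8940229723}


def to_gib(blocks: int) -> float:
    """Convert a count of 512-byte blocks to GiB (2**30 bytes)."""
    return blocks * BLOCK / 2**30


df_gap = df_used["B"] - df_used["A"]  # extra blocks df counts as used on B
du_gap = du_used["B"] - du_used["A"]  # extra blocks du finds under B's root

# Blocks that df counts as used but du does not attribute to any file.
unaccounted = {vol: df_used[vol] - du_used[vol] for vol in df_used}

print(f"df gap (B - A): {df_gap} blocks = {to_gib(df_gap):.1f} GiB")
print(f"du gap (B - A): {du_gap} blocks = {to_gib(du_gap):.3f} GiB")
for vol, blocks in unaccounted.items():
    print(f"{vol}: df - du = {blocks} blocks = {to_gib(blocks):.1f} GiB")
```

On these figures, df counts roughly 3.2 GiB on A and roughly 123.7 GiB on B as used without du attributing it to any file, so almost all of the gap is space on B that du cannot see.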

I've ruled out every suggestion I've found elsewhere -- it's not an issue of file fragmentation since du shows similar block usage, there are no symlinks or hard links of any kind, no volumes are mounted on either drive, there are no hidden logs, I'm not running out of inodes, I've run du as root, no deleted files are still being held open, and the root .Trash folder is empty on both. I've read that du doesn't count blocks used by directories themselves and other filesystem data, but I can't see how that would add up to 121 GB of missing space -- plus the directories are obviously identical between drives, and there are only around a thousand directories in total. When I verify the filesystems, both disks show no errors. I wonder if the issue could be bad blocks, but I can't seem to find any references on how to detect whether a filesystem is already compensating for that. These disks are also fairly new, and the discrepancy has existed from day 1.

The issue is of immediate importance because when disk A gets nearly full as files are added, mirroring fails because disk B runs out of space first. I've "solved" it for the time being by using disk B for writing and disk A for mirroring to avoid that problem, but I'd still like to understand what could possibly be using the mystery 121 GB of space.

mr.zog avatar
at flag
Is this a Linux or a Windows system?
jm flag
The number of inodes used is different on the two volumes so the file/directory count is different on the volumes. Have you checked `$Recycle.Bin`, `System Volume Information`, and `thumbs.db` for differences?
sd flag
Yes -- the inode-count discrepancy is caused by a ton of .DS_Store files on drive A, created because it's accessed by Macs. FreeFileSync excludes those from the mirror, which is why drive B has a slightly lower file/inode count.
sd flag
Adding to previous comment: And that can't be the explanation because it's drive B that is using the greater amount of space -- the drive that has *fewer* inodes, that *doesn't* have the extra .DS_Store files. And yes, I've checked all hidden files in the root and a sampling of other directories -- things like `.Trashes`, `.ReadyDLNA`, etc. are essentially empty and are counted by `du` anyways.
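One way to check that the `.DS_Store` files account for the inode-count difference (6149 vs 5719 in the df output, i.e. 430) is to count them directly. This is a minimal sketch, not something from the thread: the function name `tally_by_name` is hypothetical, and it reports logical file sizes rather than allocated blocks.

```python
import os


def tally_by_name(root: str, name: str = ".DS_Store") -> tuple[int, int]:
    """Return (number of files called `name` under root, their total size in bytes)."""
    count = 0
    total = 0
    for dirpath, _dirs, files in os.walk(root):
        if name in files:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # unreadable entry: skip it rather than abort the walk
            count += 1
            total += st.st_size
    return count, total


# Example (the mount point is the one shown in the df output):
# print(tally_by_name("/Volumes/A"))
```

If the count on A comes out close to 430 and the count on B is zero, the inode difference is explained and can be set aside as a cause of the space gap.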
sd flag
I'm on a Mac. The drives are normally connected to a router and accessed via SMB, but I've also tried connecting them directly to my Mac and the results are identical. Since Macs only have built-in support for reading (not writing) NTFS drives, I'm using the Paragon Software "NTFS for Mac" driver that comes with the Seagate drives, which works as a full NTFS driver.
in flag
Compare the cluster size of both filesystems.
Brandon Xavier avatar
us flag
You may consider checking for sparse files on disk A. I'm honestly not sure how they are implemented on NTFS, but on native Linux file systems these can easily expand when copied (by a sparse-unaware utility). This link might be useful: https://www.thegeekdiary.com/how-to-find-all-the-sparse-file-in-linux/ (I haven't tested the suggested find command)
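As an alternative to the untested `find` command, the sparse-file check can be sketched in Python: a file looks sparse when the space allocated to it is smaller than its logical size. The names `is_sparse` and `find_sparse` are hypothetical, and the sketch assumes `st_blocks` is reported in 512-byte units and that the NTFS driver in use reports it accurately; compressed files would also show up as "sparse" by this test.

```python
import os
import stat


def is_sparse(size_bytes: int, allocated_blocks: int, block: int = 512) -> bool:
    """True when less space is allocated on disk than the file's logical size."""
    return allocated_blocks * block < size_bytes


def find_sparse(root: str):
    """Yield (path, logical_bytes, allocated_bytes) for apparently sparse regular files."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # skip entries that cannot be read
            if stat.S_ISREG(st.st_mode) and is_sparse(st.st_size, st.st_blocks):
                yield path, st.st_size, st.st_blocks * 512


# Example (the mount point is the one shown in the df output):
# for hit in find_sparse("/Volumes/A"):
#     print(hit)
```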
sd flag
Responses to above: there are no sparse files, another good idea though. And in terms of cluster size, `diskutil` reports identical values for both volumes -- `Device Block Size: 512 Bytes` and `Allocation Block Size: 4096 Bytes` -- from my understanding, Allocation Block Size is cluster size. And indeed, creating a single-byte file on both volumes results in usage of 8 (512 B-sized) blocks according to `du`, which is 4K. So definitely matching 4K cluster sizes on both disks. (And 4K is the MSFS default anyways it seems.)
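The single-byte-file test described above can be repeated with a short Python sketch. The function name is hypothetical, and it assumes `st_blocks` is in 512-byte units, matching what `du` reported here; the result depends on the filesystem and driver, so no particular value is guaranteed.

```python
import os
import tempfile


def allocated_bytes_for_one_byte_file(directory: str) -> int:
    """Write a 1-byte temporary file in `directory` and return the bytes allocated to it."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x")
            f.flush()
            os.fsync(f.fileno())  # make sure the allocation has actually happened
        return os.stat(path).st_blocks * 512
    finally:
        os.remove(path)  # leave the volume as we found it


# Example (the mount points are the ones shown in the df output):
# for vol in ("/Volumes/A", "/Volumes/B"):
#     print(vol, allocated_bytes_for_one_byte_file(vol))
```

A result of 4096 on both volumes would match the 8-block figure from `du` and the 4K allocation block size from `diskutil`.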
sd flag
Oh, and to exclude two more things: since these are accessed with Macs, I checked Spotlight, and it isn't the culprit -- it's disabled and there's no hidden `.Spotlight-V100` directory. Nor is there any Time Machine data except a 0-size root file that disables Time Machine.
in flag
What is your [symlink handling setting in FFS](https://freefilesync.org/manual.php?topic=comparison-settings)? `Follow` or `Direct`?
in flag
What are your synchronization settings? Is it really set to `Mirror`, or maybe `Update` instead? What is the `Delete files` option set to?
sd flag
Thanks, there are no symlinks on the drives (and "including symbolic links" is unchecked anyways as is default), it's definitely set to "Mirror", and "Delete files" is "Permanent". I don't believe any of those could be the issue since `du` returns virtually identical block counts and includes hidden files.