
How to minimize filesystem overhead


I have an application that uses a lot of space as essentially cache data. The more cache available the better the application performs. We're talking hundreds to thousands of TB. The application can regenerate the data on-the-fly if blocks go bad, so my primary goal is to maximize the size available on my filesystem for cache data, and intensely minimize the filesystem overhead.

I'm willing to sacrifice all reliability and flexibility, as well as any "general-purposeness" requirements. On top of that, I know exactly how many cache files I will have on any given volume, because the application writes cache files with a fixed size (~100GB in my case). I'd like to be able to overwrite a file with a new one if a block occasionally goes bad, so it might be good to have a few spare inodes lying around, but it is also feasible to reformat the entire volume if needed. The files are all stored one directory deep in the filesystem. Directory names can be capped at a single letter, for example, and I don't need the directory at all (the files could just as well be stored at the root of the volume). File names all have a fixed length (hash plus timestamp). Once the cache data is written, the files will only ever be read, and the volume can be mounted read-only. The cache is valid for a long time (years). The integrity of the cache is also validated by the application, so I don't need any filesystem integrity features like checksums or journaling.

So, given that I know the exact, fixed, file size and have no reliability concerns, what filesystem should I use and how should I go about tuning it to eliminate as much overhead as possible?

Programs that use that much disk space typically use their own filesystem; some databases, for example, operate on raw partitions or on one large image file. Nobody knows better than the program's developer which index is needed and what data is accessed in what order. Therefore the filesystem with the smallest overhead is a custom one that contains only the metadata you need.
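As a rough illustration of how little metadata such a custom layout might need, here is a minimal sketch, not a real filesystem. Everything in it is an assumption made for the example: the 64-byte name length, the 8-byte slot number per index entry, the 4 KiB alignment, the use of exactly 100 GiB per file, and the names `layout` and `slot_offset`.

```python
# Hypothetical flat layout on a raw device: one small index region at the
# start, followed by fixed-size slots, one slot per cache file.
NAME_BYTES = 64                 # assumed fixed file-name length (hash plus timestamp)
ENTRY_BYTES = NAME_BYTES + 8    # name plus an assumed 64-bit slot number
SLOT_BYTES = 100 * 2**30        # assumed: exactly 100 GiB per cache file
ALIGN = 4096                    # assumed alignment for the index region


def layout(device_bytes):
    """Return (slot_count, index_bytes) for a device of the given size."""
    slots = device_bytes // SLOT_BYTES
    while slots > 0:
        # Round the index region up to a multiple of ALIGN.
        index_bytes = -(-slots * ENTRY_BYTES // ALIGN) * ALIGN
        if index_bytes + slots * SLOT_BYTES <= device_bytes:
            return slots, index_bytes
        slots -= 1  # index did not fit next to the slots; give up one slot
    return 0, 0


def slot_offset(index_bytes, slot):
    """Byte offset on the device where the given slot starts."""
    return index_bytes + slot * SLOT_BYTES


if __name__ == "__main__":
    slots, index_bytes = layout(16 * 2**40)  # a 16 TiB device, as an example
    print(slots, "slots,", index_bytes, "bytes of index")
```

With these assumed numbers the index occupies a few kilobytes, and the space that cannot be used is dominated by the partial slot left over at the end of the device rather than by metadata.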
David Cowden
Sadly, the application is rather young and doesn't use its own filesystem (not yet, hopefully). So I'm looking for immediate pointers on how to get the most bang for my buck in terms of space efficiency with traditional filesystems. I'm not opposed to writing my own at some point either; it just seems a little out of scope at the moment.
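To put the overhead of a traditional filesystem in numbers, here is a back-of-the-envelope sketch for ext4, used purely as an example. It assumes the common mke2fs defaults of 256-byte inodes and one inode per 16 KiB of volume (check your distribution's `mke2fs.conf`), and a 16 TiB volume. The function name `inode_table_bytes` and the 64 MiB ratio are illustrative choices, and the result only estimates the space taken by inode tables, not every metadata structure.

```python
INODE_BYTES = 256       # assumed ext4 default inode size
DEFAULT_RATIO = 16384   # assumed mke2fs default: one inode per 16 KiB


def inode_table_bytes(volume_bytes, bytes_per_inode, inode_bytes=INODE_BYTES):
    """Estimate the space reserved for inode tables on a volume."""
    inode_count = volume_bytes // bytes_per_inode
    return inode_count * inode_bytes


if __name__ == "__main__":
    volume = 16 * 2**40  # 16 TiB example volume
    default = inode_table_bytes(volume, DEFAULT_RATIO)
    sparse = inode_table_bytes(volume, 64 * 2**20)  # one inode per 64 MiB
    print("default ratio:", default / 2**30, "GiB of inode tables")
    print("sparse ratio: ", sparse / 2**20, "MiB of inode tables")
```

Under these assumptions, the default ratio reserves room for about a billion inodes on a volume that will hold fewer than two hundred files. The ratio and the inode count can be set at format time with the `-i` and `-N` options of `mke2fs`; `-m 0` drops the reserved-block percentage and `-O ^has_journal` omits the journal.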