Score:-2

best filesystem for millions of files


Which Linux filesystem/setup would you choose for the best speed in the following scenario:

- a few million files
- ~3 MB file size on average
- random access to files
- need to get a list of all the files frequently
- constant writing of new files
- constant reading of old files

More details, please.
Does this answer your question? [Can you help me with my capacity planning?](https://serverfault.com/questions/384686/can-you-help-me-with-my-capacity-planning)
The use case of millions of files with random access makes me think this would be a better fit for a database. A simple SQL database with a primary key and a blob column, or a key-value store, would be basically the same thing as your filesystem, but databases are optimized for exactly that workload.
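To illustrate the comment above, here is a minimal sketch of the key/blob approach using Python's built-in sqlite3 module; the table name, schema, and helper functions are illustrative, not a prescribed design:

```python
import sqlite3

# One table replaces the directory tree: name is the primary key,
# data holds the file contents as a blob.
con = sqlite3.connect(":memory:")  # use a file path in practice
con.execute("CREATE TABLE files (name TEXT PRIMARY KEY, data BLOB)")

def put_file(name, data):
    con.execute("INSERT OR REPLACE INTO files VALUES (?, ?)", (name, data))

def get_file(name):
    row = con.execute("SELECT data FROM files WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None

def list_files():
    # "get a list of all the files" becomes a single indexed scan
    return [r[0] for r in con.execute("SELECT name FROM files ORDER BY name")]

put_file("a.bin", b"\x00" * 1024)
put_file("b.bin", b"\x01" * 1024)
print(list_files())  # ['a.bin', 'b.bin']
```

The frequent "list all files" operation turns into an index scan here, rather than millions of directory-entry and inode lookups.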
Score:3

What really counts is how you organize your files.

If you plan to have a single big directory with ~10M files, any filesystem will suffer, although XFS and ZFS will manage even this worst case quite well.

The recommended approach is to organize your files into multiple smaller directories with reasonable file counts (~32K each) to avoid different but related issues (e.g., ls was once very slow on big directories).
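A common way to get that fan-out is to derive the subdirectory from a hash of the file name. A short sketch (the root path, fan-out depth, and function name are hypothetical):

```python
import hashlib
from pathlib import Path

def shard_path(root: str, name: str, levels: int = 2) -> Path:
    # Take the first `levels` byte-pairs of a hash of the name as
    # nested directory names: two levels gives 256**2 = 65536 leaf
    # directories, so ~10M files average only ~150 per directory.
    digest = hashlib.sha256(name.encode()).hexdigest()
    parts = [digest[i * 2:i * 2 + 2] for i in range(levels)]
    return Path(root, *parts, name)

p = shard_path("/data", "invoice-42.pdf")
print(p)  # e.g. /data/ab/cd/invoice-42.pdf (prefix depends on the hash)
```

Because the mapping is deterministic, readers can recompute the path from the name alone, with no lookup table needed.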

If this is not possible, I would go with XFS or ZFS, but only after having simulated the intended load on a test setup (note: even EXT4 will be fine performance-wise, but you can run hard into the inode limit).

Score:2

From what you describe, XFS is a proper match: it was created to handle billions of files. You'll have to think about the right back-end storage for what you plan, though.

Score:2

Your workload is almost the worst possible for a general-purpose file system: millions of files, frequent enumeration, lots of reads and writes, and enormous metadata I/O. With a large number of files, it is rarely the bandwidth of transferring the files themselves that is the problem, but rather the number of IOPS needed to query directory entries and inodes repeatedly.

Test this workload synthetically, while monitoring the application, to be sure it performs acceptably at realistic production-scale storage and IOPS levels. Be sure to match the folder structure: 300 files per directory is very different from 3,000,000 files per directory. Try a couple of different file systems; on Linux, XFS and ext4.
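A rough sketch of such a synthetic probe (not a replacement for fio or a real load test; the file sizes and counts below are shrunk for the demo, and the helper names are made up):

```python
import os
import tempfile
import time

def write_files(root, n, size=3 * 1024 * 1024, per_dir=1000):
    # Write n files of `size` bytes, fanned out `per_dir` to a directory,
    # mimicking the sharded layout you intend to use in production.
    payload = os.urandom(size)
    for i in range(n):
        d = os.path.join(root, f"d{i // per_dir:04d}")
        os.makedirs(d, exist_ok=True)
        with open(os.path.join(d, f"f{i:07d}.bin"), "wb") as fh:
            fh.write(payload)

def list_all(root):
    # The hot path in question: enumerate every file in the tree.
    names = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        names.extend(filenames)
    return names

with tempfile.TemporaryDirectory() as root:
    write_files(root, 200, size=4096, per_dir=50)  # tiny demo scale
    t0 = time.perf_counter()
    names = list_all(root)
    elapsed = time.perf_counter() - t0
    print(f"listed {len(names)} files in {elapsed:.4f}s")
```

Scaled up to realistic counts and sizes on the candidate filesystem and hardware, timing the enumeration and read/write phases separately shows which one your workload is actually bound by.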

Possibly you will need very fast SSD storage and lots of RAM to make this perform adequately.

Maybe you have a support contract with your OS vendor where you can have a performance specialist look at it.

If getting acceptable performance demands it, consider application changes. Consider storing and querying the file lists in a database rather than the file system. Many databases can return a few million results faster than a file system constrained by POSIX semantics in general and the Linux VFS in particular.
