My ext4 filesystem loses performance when growing.
I have a system that stores a large number of image files. This Debian-based image server keeps the files in year folders on 1-2 TB disk sets with hardware RAID-1. Within each year folder, the files are stored under two levels of 256 folders each.
For example:
images/2021/2b/0f/193528211006081503835.tif
The files are written continuously during the year and are evenly distributed by a hash, so each leaf (image) folder contains around 400 files at the end of the year.
This gives a total of around 256 x 256 x 400 = 26 214 400 files per year folder.
Iterating over this folder structure works well up to approximately 20 million files; a full pass takes maybe a few hours.
Beyond that, even listing a single leaf folder with 300-400 files can take 1-4 seconds when it is not in cache. I suspect it has something to do with fragmentation of the directory entries.
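A minimal sketch of how the listing time of one leaf folder could be measured, assuming GNU date and ls. The helper names time_ms and compare_listing are made up for illustration. It times a names-only listing (ls -f) against a listing that also has to read every inode (ls -l), which can hint at whether the time goes into the directory blocks or into the inodes.

```shell
# time_ms CMD...: run a command, discard its output, print elapsed milliseconds.
time_ms() {
    local start end
    start=$(date +%s%N)          # nanoseconds since the epoch (GNU date)
    "$@" > /dev/null 2>&1
    end=$(date +%s%N)
    echo $(( (end - start) / 1000000 ))
}

# compare_listing DIR: time a names-only listing against one that stats every entry.
compare_listing() {
    local dir="$1"
    echo "readdir only (ls -f): $(time_ms ls -f "$dir") ms"
    echo "readdir + stat (ls -l): $(time_ms ls -l "$dir") ms"
}
```

Usage would be something like compare_listing images/2021/2b/0f. For an uncached measurement the page cache has to be dropped first as root (sync; echo 3 > /proc/sys/vm/drop_caches), which also affects everything else running on the server.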
Accessing an individual file when you know the path is always fast.
This is not a hardware/disk issue; the raw I/O performance is good. By the way, files are never deleted from this structure.
Defragmenting with e4defrag makes no difference; I suppose it only defragments files and not directories. fsck.ext4 -D might be a solution, but as this is a production system, I'm not keen on unmounting the filesystem to try it.
What does help is copying the files to a temporary folder and then moving them back, overwriting the originals. For example:
cp -a images/2021/2b/0f/* images/2021/2b/tmp
mv -f images/2021/2b/tmp/* images/2021/2b/0f
After this operation performance is restored (even if not in cache).
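The two commands above can be wrapped into a function for one leaf folder. This is only a sketch: rewrite_leaf is a made-up name, it assumes GNU mktemp, and it assumes that nothing is writing to the folder while it runs.

```shell
# rewrite_leaf DIR: copy every file in DIR to a scratch directory next to it,
# then move the copies back over the originals (the copy-and-replace workaround).
rewrite_leaf() {
    local dir="${1%/}"
    local tmp
    # Scratch directory beside DIR, so it is on the same filesystem and mv is a rename.
    tmp=$(mktemp -d "$dir.rewrite.XXXXXX") || return 1
    if ! cp -a "$dir"/* "$tmp"/; then
        echo "copy failed, originals left untouched" >&2
        rm -rf -- "$tmp"
        return 1
    fi
    mv -f "$tmp"/* "$dir"/ || return 1
    rmdir "$tmp"
}
```

It would be called as rewrite_leaf images/2021/2b/0f. Each file ends up with a newly allocated inode and newly written data blocks, while the file names and contents stay the same.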
If the files themselves were fragmented, I would understand why this helps, but according to e4defrag they aren't. Merely moving the files to a temporary folder and back (without copying them) does not help.
Can someone help me understand what is happening here?
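One thing that could be checked before and after the copy is how widely the inode numbers of the files in a leaf folder are spread. This is only a guess at a useful diagnostic, not a confirmed cause. A minimal sketch, assuming GNU find; inode_spread is a made-up helper name:

```shell
# inode_spread DIR: print the file count and the lowest and highest inode
# number among the regular files directly inside DIR.
inode_spread() {
    local sorted count min max
    sorted=$(find "$1" -maxdepth 1 -type f -printf '%i\n' | sort -n)
    if [ -z "$sorted" ]; then
        echo "files=0"
        return 0
    fi
    count=$(printf '%s\n' "$sorted" | wc -l)
    min=$(printf '%s\n' "$sorted" | head -n 1)
    max=$(printf '%s\n' "$sorted" | tail -n 1)
    echo "files=$count min_inode=$min max_inode=$max span=$((max - min))"
}
```

Running inode_spread images/2021/2b/0f on a slow folder and on one that has been through the copy-and-replace step would show whether the span shrinks. On ext4 the inode number determines which block group holds the inode, so a large span means the inodes live in many different places on disk.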