I have > 100 million image files (book covers) as a flat list of files under a single "directory":
/images/000000093e7d1825b346e9fc01387c7e449e1ed7
/images/000000574c67d7b8c5726f7cfd7bb1c5b3ae2ddf
/images/0000005ae12097d69208f6548bf600bd7d270a6f
...
A long time ago, these were stored on Amazon S3, and are now on Backblaze B2 (which is S3-compatible).
So far, this has worked fine:
- storing a new file is very quick;
- retrieving an existing file is very quick.
I'm in the process of migrating once again, to iDrive E2 (S3-compatible as well).
I'm experimenting with moving them using rclone, but after waiting 30 minutes for `rclone copy` to start, I realized that rclone does not begin transferring files until it has retrieved the whole file list.
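The command is essentially a straight remote-to-remote copy, along these lines (remote and bucket names simplified):

```
rclone copy "backblaze:images" "idrive:images"
```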
The problem is:
- a quick benchmark of `rclone ls` on the `/images/` directory tells me that retrieving the whole file list would take almost 10 hours;
- any problem during the transfer (which will take many days) would restart from zero, forcing rclone to download the whole file list again (a workaround I'm considering is sketched below);
- listing files costs money with B2.
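The workaround I have in mind is to pay the listing cost exactly once: since each S3-style list call returns at most 1,000 keys, enumerating ~100 million files means on the order of 100,000 calls, so I really don't want to repeat it. The idea would be to dump the key list to a local file and feed it back to rclone; if I read the docs correctly, `--files-from` makes rclone transfer exactly the named files instead of scanning the source, and `--no-traverse` stops it from listing the destination. Roughly (remote and bucket names simplified):

```
# One-off listing (still ~10 hours, but paid only once).
rclone lsf "backblaze:images" --files-only > keys.txt

# Copy exactly the listed keys; after a failure this can be re-run,
# or keys.txt split into chunks, without listing the bucket again.
rclone copy "backblaze:images" "idrive:images" --files-from keys.txt --no-traverse
```

That said, copying by prefix directly would avoid the separate listing step altogether, which is what I tried next.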
I tried configuring rclone to copy only a batch of files:
rclone copy "backblaze:/images/0000*"
, with or without *
, does not find any file
rclone copy "backblaze:/images/" --include "/0000*"
seems to download the whole file list as well, and filter on the client
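In case it helps to confirm where the filtering happens, I suppose the HTTP requests rclone sends could be dumped to see whether its list calls carry any prefix, though I have not dug through that output in detail. Something like:

```
rclone lsf "backblaze:images" --include "/0000*" -vv --dump headers 2>&1 | head -n 50
```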
Strangely, rclone seems to have no problem asking the server for the list of files under a given "directory" such as `/images/`, but cannot do the same with an arbitrary prefix such as `/images/0000`.
I thought that S3, and by extension all S3-compatible storage, stored file paths as a flat structure, that `/` was just a character like any other, and that you could easily list files under any prefix, whether or not it ends with a `/`.
Am I mistaken?
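My understanding of the raw S3 API is that `ListObjectsV2` accepts an arbitrary string prefix, with no requirement that it end in `/`. For example, with the AWS CLI pointed at an S3-compatible endpoint (endpoint URL and bucket name here are placeholders, assuming the bucket itself is called `images`), I would expect something like this to return only keys starting with `0000`:

```
aws s3api list-objects-v2 \
    --endpoint-url "https://s3.us-west-004.backblazeb2.com" \
    --bucket images \
    --prefix 0000 \
    --max-keys 100
```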
In my next storage (E2), should I store files under sub-directories, such as `images/0/0/0/0/`, `images/0/0/0/1`, etc., just like we did in the good old days of storing files on a traditional filesystem?