Score:0

Storage for millions of audio files with search accessibility (if possible)


I am looking for a solution to a storage problem: I have 7 TB of files, all of them audio recordings from an Asterisk-based FreePBX server.

At first I installed Proxmox on a server with an SSD for faster access, then Nextcloud for file access via the web (this server is only reachable on the local network, so security is not an issue here). As I kept uploading files, I realized this was a bad idea: it takes a lot of time to search for one specific file, and I am only at 2 TB of usage. I have been playing around with Nextcloud for a while and I know I can search via SSH or WebDAV, but that also takes a lot of time, which is a problem because these files need to be accessed regularly by multiple users.

So I am looking for a solution, as I still have 5 TB of data left to upload. I would like either web access or some other way to find and retrieve data from storage easily, or an entirely new OS/web server that can help with storing and accessing the files.

What I have is an SSD for boot and 4x 4 TB drives for storage in RAID 5, with 2x gigabit LAN cards on that server for access, and Proxmox installed and running some virtual machines. The data is structured as year > month > date > thousands of files, with the phone number in each file name for identification.

Thank you. Best regards,

vn flag
You'd probably want something like https://opensearch.org/ or https://www.elastic.co/ for this. Or even a hosted solution like https://www.algolia.com/.
Noob with 0 knowledge
Hello, I looked into OpenSearch and Elastic, but those are for data analysis and mostly used for logs, and I couldn't find any reference to data storage. I may be wrong, because those two are uncharted territory for me and there could be a plugin that helps with this, but I am still lost. I would highly appreciate it if you could guide me in the right direction.
vn flag
I mention these because most of your questions seem to be about search: "this is bad idea as it takes alot of time to search for one specific file", etc. Storing the files and searching them are largely separate issues.
anx
A couple of terabytes of audio files should amount to a total file-name size that fits in typical RAM these days. Are you possibly waiting for the application (*Nextcloud*) to build a file index on demand, and would it be reasonably fast once all files were cached in its database?
Noob with 0 knowledge
I am currently using Nextcloud and it is good until I have to access recordings. For instance, when I need the recordings of a specific number, I can't just download them from the search results; I have to open each specific folder to download them. It would be really great if I could download them all at once. Nextcloud does build a database for fast access, but downloading is a pain.
Score:2

I'm working with tons of audio files too.

The best way I have found to handle this is:

  • Use SSD disks and RAID1 / ZFS mirror to speed up access.
  • Don't deal with the files themselves; work on the file names and metadata: create a simple, lightweight searchable index in a database. Elasticsearch works well but eats RAM; PostgreSQL with indexed columns can do the job too.
  • Just use a link to the file path when access is triggered.
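
To make the index idea concrete, here is a minimal sketch. It is not my production setup: it uses SQLite from the Python standard library instead of PostgreSQL or Elasticsearch so that it runs without a server, and the names `build_index`, `find_recordings` and the `recordings` table are made up for illustration. It assumes the phone number shows up in the file name as a run of at least 5 digits, so other digit runs in the name (dates, timestamps) get indexed too, which is harmless for an exact-number lookup.

```python
import os
import re
import sqlite3

# Assumption: any run of 5 or more digits in a file name may be a phone number.
DIGITS_RE = re.compile(r"\d{5,}")


def build_index(root, db_path):
    """Walk the tree under root and store (number, relative path) rows."""
    conn = sqlite3.connect(db_path)
    with conn:  # one transaction for the whole walk, committed on success
        # The UNIQUE constraint doubles as the lookup index on `number`
        # and makes re-running the indexer safe (duplicates are ignored).
        conn.execute(
            "CREATE TABLE IF NOT EXISTS recordings ("
            "number TEXT NOT NULL, path TEXT NOT NULL, "
            "UNIQUE (number, path))"
        )
        for dirpath, _dirnames, filenames in os.walk(root):
            rel_dir = os.path.relpath(dirpath, root)
            for name in filenames:
                rel_path = os.path.normpath(os.path.join(rel_dir, name))
                for number in set(DIGITS_RE.findall(name)):
                    conn.execute(
                        "INSERT OR IGNORE INTO recordings (number, path) "
                        "VALUES (?, ?)",
                        (number, rel_path),
                    )
    conn.close()


def find_recordings(db_path, number):
    """Return the sorted relative paths of all files whose name contains number."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT path FROM recordings WHERE number = ? ORDER BY path",
        (number,),
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

You would run `build_index` once over the recordings root (and again after each upload batch), then every search is a single indexed query instead of a directory walk. The paths it returns are only links; the audio files stay where they are.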

My workflow is:

  1. Browse the text-based tree and file names from a simple (homemade) web page.
  2. Click to access the file.
  3. The web page retrieves the file based on its path and serves it to the user (on the LAN, or through the Internet).
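
Step 3 of this workflow can be sketched with nothing but the Python standard library. This is an illustration, not my actual page: `resolve_path`, `make_handler` and `make_server` are hypothetical names, the host and port are placeholders, and it only covers serving a file from its path (the browsing page is left out). The one part worth copying is the check that a requested path cannot escape the storage root.

```python
import mimetypes
import os
import shutil
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlparse


def resolve_path(root, url_path):
    """Map a URL path to a file under root; None if outside root or not a file."""
    root = os.path.realpath(root)
    rel = unquote(urlparse(url_path).path).lstrip("/")
    candidate = os.path.realpath(os.path.join(root, rel))
    try:
        inside = os.path.commonpath([root, candidate]) == root
    except ValueError:  # e.g. paths on different drives on Windows
        inside = False
    if inside and os.path.isfile(candidate):
        return candidate
    return None


def make_handler(root):
    """Build a request handler class that serves files stored under root."""

    class RecordingHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            path = resolve_path(root, self.path)
            if path is None:
                self.send_error(404, "No such recording")
                return
            content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
            self.send_response(200)
            self.send_header("Content-Type", content_type)
            self.send_header("Content-Length", str(os.path.getsize(path)))
            self.end_headers()
            # Stream the file in chunks instead of loading it into memory.
            with open(path, "rb") as f:
                shutil.copyfileobj(f, self.wfile)

    return RecordingHandler


def make_server(root, host="127.0.0.1", port=8000):
    """Create (but do not start) a threaded HTTP server for root."""
    return ThreadingHTTPServer((host, port), make_handler(root))
```

Calling `make_server("/path/to/recordings").serve_forever()` would then answer a request such as `/2023/01/05/<file name>` with the matching file. For real use behind many users, a proper web server or framework in front of the same path check is probably the better choice.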

Btw, with this kind of volume, it would be interesting to take a look at the tools used by data hoarders, such as
