Score:3

"just in time" filesystem using inotifywait and mkfifo


I am writing some gross middleware - basically, I have some old code that needs to open 100,000 files for reading only, expecting them all to be in one folder. It never writes. It is multiprocess, so it can try to open ~30 files at the same time. The old way, I would have to actually copy the files into that folder (or use links, NFS, etc.). Worth noting I have no ability to change this old code - it's just a binary.

I have some new, fancy code that can retrieve a file almost instantly. I want to tie these things together, so when the old code tries to open the file, it is actually, in real time, running the new code.

So I thought of mkfifo and inotifywait. Instead of a folder of 100,000 files, I can make a folder of 100,000 named pipes. So far so good. The legacy code goes to open the files, not knowing that they are indeed named pipes. The problem is, I don't know what order the legacy code is going to open the files (nice, right?). So I would like to TRIGGER the named pipe WRITE (from my fancy new code) when the legacy code goes in for the read. I can't spawn 100,000 writes and have them all block. So I thought hey - inotifywait makes sense. Every time the legacy goes to open the pipe, it triggers a read event, which can then be used to spawn the pipe writer in the background. The problem is.. inotifywait doesn't trigger the read event until AFTER the writer has been spawned!

Any ideas of how to solve this? Basically - I want to intercept a file open, block for a couple hundred ms while I retrieve the contents of the file, then return those contents. Ideally I don't have to create a custom FUSE filesystem to do this.. it's just a read-only file open. The problem is this needs to run fast and in parallel.. and I don't know which files are going to be opened in what order. Gotta be a quick and dirty way!

Thanks in advance for everyone's time.

EDIT - for some more details. Basically I have some legacy code that wants to load a folder full of PNG files. I want those PNG files to actually come from a web server that returns DICOM files. This requires some ugly conversion, etc. The legacy PNG loading code is very inflexible.. it expects these things to be files. So basically, I want to intercept the fopen of the PNG loading code and run the following four lines of bash pseudocode first. The $URL_FOR_DICOM below can be derived from the $LAZY_LOADED.png filename.

wget -q -O "$LAZY_LOADED.dcm" "$URL_FOR_DICOM"
dcmj2pnm --write-png "$LAZY_LOADED.dcm" "$LAZY_LOADED.png"
rm "$LAZY_LOADED.dcm"
convert "$LAZY_LOADED.png" -resize '1024x1024^' -gravity center -extent 1024x1024 "$LAZY_LOADED.png"

So when the PNG loader tries to load $LAZY_LOADED.png (which is actually a FIFO), it would get populated using the above, ideally triggered by inotify. I can't do this in advance because the dataset is massive - like close to 0.5PB.. so I can't have a second copy around, I need it to be loaded on the fly from the web server.
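One wrinkle with running those four lines directly against a FIFO: the intermediate steps read and rewrite $LAZY_LOADED.png, which only works on a regular file. A possible sketch, assuming bash, is to do the conversion in a scratch directory and send only the finished PNG into the pipe. The function name `populate_png` and its two arguments are hypothetical; the commands and flags are the ones from the pseudocode above.

```shell
#!/usr/bin/env bash
# populate_png FIFO URL  (hypothetical helper, not part of any tool)
# Fetches the DICOM file, converts it in a scratch directory, and writes
# only the final PNG into the named pipe.
populate_png() {
    local fifo=$1 url=$2 tmp status=1
    tmp=$(mktemp -d) || return 1

    if wget -q -O "$tmp/in.dcm" "$url" &&
       dcmj2pnm --write-png "$tmp/in.dcm" "$tmp/raw.png" &&
       convert "$tmp/raw.png" -resize '1024x1024^' -gravity center \
               -extent 1024x1024 "$tmp/out.png"
    then
        # Opening the FIFO for writing blocks until a reader has it open.
        cat "$tmp/out.png" > "$fifo"
        status=$?
    fi

    rm -rf "$tmp"
    return "$status"
}
```

If any step fails, nothing is written and the FIFO is never opened for writing, so a real setup would need some way to unblock the waiting reader.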

EDIT 2 - when trying inotifywait on a named pipe, it blocks ANY events (including open, access, read, etc.) until the named pipe is open for writing AND reading (i.e. no way to detect that the reader is ready). Ideas? Another user had a similar problem here with no solution : (

Score:3

If you need it to be "fast and parallel", please try very hard to not reinvent the wheel: the in-kernel filesystem and pagecache are specifically tuned to be very fast, much faster than your custom "filesystem in userspace".

You have multiple better options:

  • copy the files to a temporary location;
  • even better, rather than copying them, use soft or hard links;
  • export them via a read-only NFS mount.

Finally, please note that inotify is a best-effort framework for delivering file events. Under heavy load, notifications can be lost, especially when using a high-level language binding (e.g. Python).

EDIT: so you want to intercept reads for on-the-fly file conversion. When reading from an empty pipe your legacy code will block, so inotifywait will show the READ event only after you have written something to that same pipe. To avoid the issue, try listening for OPEN events (rather than READ). Another option is to use LD_PRELOAD to override the C library's open() function with a custom version that converts the file as required.
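To illustrate the blocking behaviour: on Linux, opening a FIFO for reading blocks until some process opens it for writing, so the reader is stuck inside the open call before any read takes place. A minimal sketch, assuming bash and coreutils; the file names are made up for the demonstration.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
mkfifo "$dir/demo.fifo"

# Background reader: its open of the FIFO blocks until a writer shows up.
cat "$dir/demo.fifo" > "$dir/out.txt" &
reader=$!

# Give the reader time to reach the blocking open, then check on it.
sleep 0.5
if kill -0 "$reader" 2>/dev/null; then
    reader_state=blocked
else
    reader_state=finished
fi
echo "reader after 0.5s: $reader_state"

# Opening the FIFO for writing releases the reader; closing it delivers EOF.
echo hello > "$dir/demo.fifo"
wait "$reader"

result=$(cat "$dir/out.txt")
echo "reader received: $result"
rm -rf "$dir"
```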

Thank you for the super fast reply - the issue I am having is.. the sources are not files. The source is a web service that returns images, which need to be converted, resized, and cropped. There is no way I have enough space to copy everything (300 terabytes in reality).. so it needs to be done "just in time". Ideally I could just symlink to the web service's source directory.. but.. those are on a different machine and also in a vendor-proprietary binary format only accessible by the web service. Any ideas??
Added some more info above in the main question - thank you again
shodanshok
@mohotmoz I've edited my answer adding additional details.
thank you so much! so when I try listening for ALL events.. nothing comes up until after I open the pipe on the other end. when I try just OPEN events.. still doesn't work : ( nothing until after I open the pipe on the other end and start writing. Someone else had this problem: https://stackoverflow.com/questions/67639932/how-can-i-use-inotify-to-tell-when-a-named-pipe-is-opened
shodanshok
@mohotmoz well, so it seems that inotify does not work on an unconnected pipe - which is a reasonable outcome. If so, and assuming you cannot modify the legacy code, you need to open (and keep open) your end of the pipe, or override the open() function via LD_PRELOAD.
Nonny Moose
You presented the LD_PRELOAD idea like an afterthought, but it might be the way to go here. Definitely much easier than trying to do just-in-time copying.
Score:0
OK - so not ideal but I hope it helps:

This seems to work (perhaps with threading issues but fine for my use case?).

$ echo hello > hello.txt
$ mknod tmp p
$ while true; do  cat /dev/null > tmp; done &
$ (inotifywait -m -e open ./tmp | while read f; do cat hello.txt > tmp; done) &


$ cat tmp
hello
$ cat tmp
hello
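A variation on the loop above, offered as a sketch rather than a tested solution: the shell opens a redirection target before it runs the command, so a writer parked on the FIFO only starts producing data once a reader opens the other end. For a single pipe that gives the "just in time" behaviour without inotifywait. The `produce` function is a hypothetical stand-in for the real retrieval step, and the marker file exists only to show when it ran. This assumes bash, and it still parks one blocked process per pipe, so it does not by itself address the 100,000-pipe case.

```shell
#!/usr/bin/env bash
dir=$(mktemp -d)
fifo="$dir/lazy.png"
mkfifo "$fifo"

# Hypothetical stand-in for the real fetch-and-convert step.
produce() {
    touch "$dir/produced"   # marker showing that production has started
    printf 'payload for %s\n' "$(basename "$1")"
}

# Parked writer: the redirection is opened first and blocks until a reader
# opens the FIFO; only after that does produce run.
( produce "$fifo" > "$fifo" ) &
writer=$!

# No reader yet, so produce should not have run.
sleep 0.5
if [ -e "$dir/produced" ]; then before_read=started; else before_read=idle; fi
echo "before any reader: $before_read"

# The reader's open releases the writer, which then produces the data.
content=$(cat "$fifo")
wait "$writer"
echo "reader got: $content"
rm -rf "$dir"
```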