Score:4

How to create an alias for a "cat" file usable in a path

pg flag

I have two huge files (150G each) and need to pass them to a tool that accepts only a single input file. I do not want to merge them, for several reasons, and I cannot use something like <(cat file1 file2) or myfile=$(cat file1 file2), because the script uses the path of the input file, not its content.

So I would need something like the following:

alias myfile = "cat file1 file2"

So that using the following command would work:

tool_x --file /path/myfile 

I already tried the alias command above, but it didn't work.

I just need to be able to treat the result of a "cat" command as an actual file that can be accessed through a path.

Is it possible to achieve something like that?

FedKad avatar
cn flag
That looks like an XY problem: https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem
Martin Thornton avatar
cn flag
The answer depends on what `tool_x` is. That `<(cat file1 file2)` is a named pipe and sometimes works. Did it give an error?
ashenflower avatar
pg flag
Hello @Raffa, thank you for your answer. Unfortunately, I would not know how to explain it better, or which example to provide. I just need to be able to treat the result of a "cat" command as an actual file that can be accessed through a path.
ashenflower avatar
pg flag
@MartinThornton Thank you for your answer. It raises an error because the second tool (called from within the main tool) tries to access the file using the path of the (supposed) file, rather than directly reading the output of `<(cat file1 file2)`.
Martin Thornton avatar
cn flag
What was the error, *exactly*? What else does the first tool do with the file? What is the second tool called? These all affect the answer. Not everything can use named pipes. You may just have to create a temporary file.
za flag
You can't have blanks around an assignment equals in bash, not in front and not behind.
be flag
`<(...)` is a process substitution, not (necessarily) a named pipe. It's *implemented* using either `/dev/fd` or a named pipe, and I seem to recall named pipes are used *only* if `/dev/fd` is not available.
Score:17
us flag

You could use a named pipe:

mkfifo /path/myfile
cat file1 file2 > /path/myfile &

Here, either the cat command has to be sent to the background, or you can run tool_x in another terminal, as cat will block until something starts reading from the pipe:

tool_x --file /path/myfile

This is essentially what process substitution is doing automatically for you.
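
A slightly fuller sketch of the same idea, with cleanup. Here `cat "$fifo"` stands in for `tool_x --file "$fifo"` (any program that opens the path and reads it sequentially); the scratch directory and the sample contents are assumptions made for illustration:

```shell
# Scratch directory holding two tiny stand-in inputs (hypothetical data).
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

printf 'read1\n' > "$workdir/file1"
printf 'read2\n' > "$workdir/file2"

# Create the named pipe; only its directory entry is stored on disk.
fifo="$workdir/myfile"
mkfifo "$fifo"

# Start the writer in the background; it blocks until a reader opens the pipe.
cat "$workdir/file1" "$workdir/file2" > "$fifo" &
writer_pid=$!

# Stand-in for: tool_x --file "$fifo"
result="$(cat "$fifo")"

# Reap the background writer once the reader has finished.
wait "$writer_pid"

printf '%s\n' "$result"
```

The pipe can be consumed only once per writer run: if the tool needs to read the input a second time, the `cat ... > "$fifo" &` line has to be started again.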

ashenflower avatar
pg flag
Thank you, this also works perfectly.
muru avatar
us flag
@ashenflower just checking - you said you didn't want to merge the two files (presumably because of the size?). Was that not a real requirement?
ashenflower avatar
pg flag
I do not want to merge them mainly because, apart from the tool I am using right now, the other tools I run on the same data need the files to be supplied separately. Merging them into one would force me to separate them again later, which is time-consuming. If you are familiar with bioinformatics, I am working with [paired-end reads](https://thesequencingcenter.com/knowledge-base/what-are-paired-end-reads/).
Eric Duminil avatar
us flag
Looks interesting. I've never used it. Myfile is never actually written, and it is just read line by line, when needed?
muru avatar
us flag
@EricDuminil effectively, yes. There's an in-memory buffer associated with the pipe. The writer can write until the buffer is full, and then it blocks. Something else reading from the buffer empties it and unblocks the writer. Lines don't really come into play.
Eric Duminil avatar
us flag
@muru: Cool, thanks. So the advantage is that only a few bytes are required on the disk (and not 300GB), but the file can only be read once?
muru avatar
us flag
@EricDuminil yes, just enough for the directory entry. Well, you can write to the pipe as long as someone's reading from it, but the problem is that you can't seek in the file (read to one position and jump to another one). The reading has to be sequential.
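
A script can check what kind of file it has been handed with the standard `test` operators `-p` (named pipe) and `-f` (regular file); a tool that insists on a regular file, or that needs to seek, will not cope with a FIFO. The paths below are scratch files created only for illustration, and `describe` is a hypothetical helper:

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

# One named pipe and one (empty) regular file to compare.
mkfifo "$workdir/pipe"
: > "$workdir/regular"

# Report the file type as a tool performing the same checks would see it.
describe() {
    if [ -p "$1" ]; then
        echo "fifo"
    elif [ -f "$1" ]; then
        echo "regular"
    else
        echo "other"
    fi
}

pipe_kind="$(describe "$workdir/pipe")"
regular_kind="$(describe "$workdir/regular")"
printf '%s %s\n' "$pipe_kind" "$regular_kind"
```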
Peter Cordes avatar
fr flag
@EricDuminil: It's exactly like a regular pipe such as you'd get from `cat | tool_x` (except for which file descriptors it's opened as), with the named pipe acting as a rendezvous point where two unrelated processes get file descriptors to the same pipe (a buffer in the kernel). They get them via `open(2)` system calls, instead of the `pipe(2)` system call that returns fds for both ends in one process, which then typically forks+execs.
Peter Cordes avatar
fr flag
@ashenflower: If you're using a filesystem that does reflinks / copy-on-write (like BTRFS), another way to go about this is `cp --reflink=always file1 merged` and then append a short `file2` like `cat file2 >> merged`. It will look like two fully separate files at the FS level, but on disk only the `file2` data will actually be duplicated. This is useful *if* being seekable is helpful, and file2 is short. (If you want to use different `file2`s, `dd` can overwrite in place, or `truncate` and append again.)
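
A sketch of that reflink approach. It uses `--reflink=auto` (GNU `cp`) rather than `--reflink=always`, so that it falls back to an ordinary copy on filesystems without copy-on-write support; with `always`, `cp` fails there instead. File names and contents are stand-ins:

```shell
workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT

printf 'read1\n' > "$workdir/file1"
printf 'read2\n' > "$workdir/file2"

# Clone file1; on a CoW filesystem (e.g. Btrfs) the data blocks are shared
# with the original rather than copied.
cp --reflink=auto "$workdir/file1" "$workdir/merged"

# Append file2; this is the part that is newly written.
cat "$workdir/file2" >> "$workdir/merged"

merged_content="$(cat "$workdir/merged")"
printf '%s\n' "$merged_content"
```

Unlike a named pipe, `merged` is a regular file, so the tool can seek in it and read it more than once.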
Score:4
jp flag

You can use a temporary file with mktemp like so:

myfile="$(mktemp)"
cat file1 file2 > "$myfile"
tool_x --file "$myfile"

Here $myfile will expand to an actual path like /tmp/tmp.Tg9Epuetsr.
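
A variant with cleanup, sketched under the assumption that the temporary directory has room for the merged copy (with two 150G inputs that is roughly 300G; GNU `mktemp` creates the file under `$TMPDIR` if set, otherwise `/tmp`). `grep -c ''` stands in for `tool_x --file`, and the input files are tiny stand-ins:

```shell
# Two small stand-in inputs (hypothetical data).
workdir="$(mktemp -d)"
printf 'read1\n' > "$workdir/file1"
printf 'read2\n' > "$workdir/file2"

# Create the temporary file and make sure it is removed when the script exits.
myfile="$(mktemp)"
trap 'rm -rf "$workdir" "$myfile"' EXIT

# Write the merged copy to disk.
cat "$workdir/file1" "$workdir/file2" > "$myfile"

# Stand-in for: tool_x --file "$myfile" (counts the lines in the file at that path)
line_count="$(grep -c '' "$myfile")"
printf '%s\n' "$line_count"
```

Because the result is a regular file, the tool can seek in it and read it repeatedly, at the cost of the extra disk space.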
