Score:5

pgrep returns extra processes when piped by other commands

cn flag

I ran into strange behaviour when using pgrep to find which shell processes are running the same script as the current one.

Here is the test script named test.sh

#!/bin/bash

full_res=`pgrep -a -l -f 'test\.sh'`
res=$(pgrep -a -l -f 'test\.sh' | cat)

echo "short result is $full_res"
echo "weird result is $res"

With output being

sh test.sh &
[1] 19992
➜  logs short result is 19992 sh test.sh
weird result is 19992 sh test.sh
19996 sh test.sh

[1]  + 19992 done       sh test.sh

I don't know where the 19996 sh test.sh comes from; it only appears when the output is piped to cat. I suspect it might be a bug in the pgrep implementation.

Looking forward to some reasonable explanation

Thx,

Balin

Score:9
jo flag

When you create a pipeline inside backticks or $(...), a subshell is created. It is a forked copy of the original bash shell you called, so it has the same command line.

At the point you're doing the pgrep what you actually have is this:

bash test.sh
  └─bash test.sh
      ├─ pgrep -f test.sh
      └─ cat

So pgrep is doing what you asked it to.
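The same effect can be seen without pgrep. This is a minimal sketch, assuming bash (it relies on `$BASHPID`); the variable names are only illustrative. `$$` keeps reporting the PID of the original shell, while `$BASHPID` reports the process that is actually running the code, so the two differ inside a command substitution that contains a pipe.

```bash
#!/bin/bash
# $$ is the PID of the original shell, even inside subshells.
main_pid=$$

# $BASHPID is the PID of the bash process that expands it. Inside
# $(... | cat) that is a forked copy of the script, which still carries
# the script's command line and is therefore matched by pgrep -f.
inner_pid=$(echo "$BASHPID" | cat)

echo "main shell:  $main_pid"
echo "forked copy: $inner_pid"
```

The two PIDs printed should differ, which is the extra process pgrep reports.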

You can simulate this behaviour like this.

#!/bin/bash
echo mypid $$
$(sleep 60 | sleep 60 | sleep 60)

Run the script in the background and, using the PID it printed, inspect it with pstree.

$ ./test.bash 
mypid 335153
^Z
[1]+  Stopped                 ./test.bash
$ bg
[1]+ ./test.bash &
$ pstree -p 335153
test.bash(335153)───test.bash(335154)─┬─sleep(335155)
                                      ├─sleep(335156)
                                      └─sleep(335157)
fo flag
I would think that the parent bash has one bash child that spawns pgrep and another, separate bash child that spawns cat. I think that when pgrep runs, the second bash child has not yet been created.
Matthew Ife avatar
jo flag
@glenn jackman that's not correct. To illustrate, I updated my answer.
ilkkachu avatar
us flag
The parts of a pipeline run in subshells, yes... but that should mean _two_ extra Bash processes for a two-part pipe, and there's only one even in that `sleep | sleep | sleep` case. That's because Bash optimizes subshells that run only one command by exec'ing the launched command over the shell process, so we don't see those. I _think_ the Bash process actually seen there is due to the subshell involved in the _command substitution_.
ilkkachu avatar
us flag
We don't see it in the original `res=$(pgrep)` case, again because of the optimization with exec. But it appears also with `res=$(pgrep; true)` since the second command there foils the optimization. So it's not just the pipe. You could try and see what the pstree looks like if you change that test case to `$(sleep 60; true)`?
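The optimization described in the comment above can be observed from the child's side. This is a sketch, assuming bash and a `sh` on the PATH; the variable names are illustrative. Whether the optimization applies can depend on the bash version and on traps being set, so no particular output is promised.

```bash
#!/bin/bash
# Each child prints the PID of its parent process.

# Single command: bash may exec sh directly in the command-substitution
# subshell, in which case the parent reported is the main shell ($$).
single=$(sh -c 'echo "$PPID"')

# A second command keeps the subshell alive, so sh is forked from it
# and typically reports the subshell's PID instead.
with_true=$(sh -c 'echo "$PPID"'; true)

echo "main shell:         $$"
echo "parent (single):    $single"
echo "parent (with true): $with_true"
```

If the first parent PID equals the main shell's PID and the second does not, the extra bash process only exists in the second case.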
U. Windl avatar
it flag
What the `sleep` example shows is that command substitution starts an extra shell, and that once a process has exec'd, its title changes to that of the program being run (thus not four `bash` processes). For the original question it means that `pgrep` finds the forked shell (the one used for the command substitution) before the child process did `exec`, I guess.
Matthew Ife avatar
jo flag
I should probably note that it is precisely because of the backticks that the subshell is created. I edited the answer to make that clearer.
fo flag
`exec`, of course. Thanks for the thorough answer.
Score:5
fo flag

From Pipelines in the bash manual:

Each command in a multi-command pipeline, where pipes are created, is executed in its own subshell, which is a separate process

Tangentially, this is why this won't work:

date | read theDate
echo "$theDate"

because the read command runs in a subshell, so the theDate variable is populated in the subshell, not in the current shell.
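A few common ways around this, sketched under the assumption of bash (process substitution and `lastpipe` are not POSIX sh features); `theDate` is the variable from the example above:

```bash
#!/bin/bash
# Option 1: command substitution assigns in the current shell.
theDate=$(date)
echo "$theDate"

# Option 2: process substitution keeps read in the current shell,
# because read is no longer part of a pipeline.
read -r theDate < <(date)
echo "$theDate"

# Option 3: with lastpipe, bash runs the last pipeline element in the
# current shell. It only takes effect when job control is off, which is
# the default in scripts.
shopt -s lastpipe
date | read -r theDate
echo "$theDate"
```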

Matthew Ife avatar
jo flag
Just to follow on from this: `pgrep -f` matches against the full command line, and the pattern here is `test.sh`. When the subshell is created, its command line is a copy of that of the original spawning process. So `$(this|that)` results in another process whose command line contains `test.sh` and that is a child of the original `test.sh` process. Hence pgrep matches both the parent and its subshell.
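One way to sidestep the extra match is to keep the pipe out of the command substitution and filter afterwards. This is only a sketch: `exclude_self` is a hypothetical helper name, and the approach relies on bash exec'ing a single command inside `$(...)`, which is an optimization rather than a guarantee.

```bash
#!/bin/bash
# Hypothetical helper: read PIDs (one per line) and drop this script's own PID.
exclude_self() {
    local pid
    while read -r pid; do
        [ "$pid" = "$$" ] || printf '%s\n' "$pid"
    done
}

# No pipe inside $(...): bash can usually exec pgrep directly, so no forked
# copy of the script is around to be matched.
pids=$(pgrep -f 'test\.sh')

# Filter in a second step, after pgrep has already finished.
others=$(printf '%s\n' "$pids" | exclude_self)
echo "other instances: ${others:-none}"
```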
U. Windl avatar
it flag
This "answer" is correct by itself, but doesn't contribute to answering the question IMHO.