Score:1

Unable to store the output of an awk command

I have been trying to print git clone progress in a more minimalistic way for my project.

Aim

Instead of printing the whole git clone output on screen, like this:

remote: Enumerating objects: 1845678, done.        
remote: Counting objects: 100% (503/503), done.        
remote: Compressing objects: 100% (79/79), done.        
Receiving objects:  28% (54112/1845678), 10.10 MiB | 2.00 MiB/s

I want to hide the lengthy lines of git output and only print the real-time progress of the clone in the following format:

Cloning [$percentage]

What I have got so far

git clone --progress https://somerepo 2>&1 |tee gitclone.file | tr \\r \\n | total="$(awk '/Receiving objects/{print $3}')" | echo "$total"

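Note: Since git clone writes its progress only to the stderr stream, I have redirected stderr to stdout. Even with the redirection I faced a few issues, so I added the --progress option to the git command (without it, git only reports progress when stderr is attached to a terminal).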

I wanted to store the output in a file (for debugging the script) without disturbing the stdout stream, so I used the tee command. Since git clone separates its progress updates with \r instead of \n, I replaced them with tr so the output can be processed line by line. For more info on this part, take a look at this question and its answer: Git produces output in realtime to a file but I'm unable to echo it out in realtime directly in a while loop

Then I pick the lines that contain the keywords Receiving objects and print/store the third field of each of those lines (the percentage).
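To check this part of the pipeline without cloning anything, you can feed it a hand-written sample of git's progress output. In the sketch below, sample_progress is a hypothetical stand-in for git: it emits \r-separated updates the way the real clone does, but the exact text is an assumption modelled on the output shown above.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for `git clone --progress`: progress updates are
# separated by carriage returns; only the final one ends with a newline.
sample_progress() {
    printf 'remote: Counting objects: 100%% (503/503), done.\n'
    printf 'Receiving objects:  28%% (54112/1845678), 10.10 MiB | 2.00 MiB/s\r'
    printf 'Receiving objects:  64%% (1181234/1845678), 20.20 MiB | 2.00 MiB/s\r'
    printf 'Receiving objects: 100%% (1845678/1845678), 30.30 MiB | 2.00 MiB/s, done.\n'
}

# Same filter as in the question: turn \r into \n, then print field 3
# ("28%", "64%", ...) of every "Receiving objects" line.
sample_progress | tr '\r' '\n' | awk '/Receiving objects/{print $3}'
```

Running it prints 28%, 64% and 100%, one per line, which shows the tr and awk stages themselves are fine.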

What is my problem

My command works fine if I don't store the output of awk and just print it on screen:

git clone --progress https://somerepo 2>&1 |tee gitclone.file | tr \\r \\n | awk '/Receiving objects/{print $3}'

But I am unable to store the awk output in a shell variable and echo it back:

git clone --progress https://somerepo 2>&1 |tee gitclone.file | tr \\r \\n | total="$(awk '/Receiving objects/{print $3}')" | echo "$total"

So what could be a possible solution for this issue?

Score:1
As the bash manual says:

Each command in a pipeline is executed as a separate process (i.e., in a subshell).
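A minimal way to reproduce this without cloning anything is to let printf stand in for git. The value 42% is made up, and the behaviour shown assumes bash's default settings (the lastpipe option turned off):

```shell
#!/usr/bin/env bash
# Each stage of a pipeline runs in its own subshell, so an assignment made
# in a pipeline stage is discarded when that stage exits.
total="unset"
printf '42%%\n' | total=$(cat)
echo "after pipeline: $total"       # still "unset"

# Grouping the assignment and the echo in the same stage works, because
# both run inside the same subshell...
printf '42%%\n' | { total=$(cat); echo "inside group: $total"; }
# ...but the parent shell still never sees the new value.
echo "after group: $total"          # still "unset"
```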

So the value saved in the total variable is lost when the subshell exits. You can see that it is only available inside that subshell if you group the assignment and the echo together:

git clone --progress https://somerepo |& tee gitclone.file \
| tr \\r \\n | { total="$(awk '/Receiving objects/{print $3}')" ; \
 echo "$total" ; }

Since the variable total is lost after the above command line (i.e., the pipeline) finishes, you should put the whole pipeline inside the command substitution parentheses instead:

total=$(git clone --progress https://somerepo |& tee gitclone.file | tr \\r \\n | awk '/Receiving objects/{print $3}')
echo "$total"
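Note that total then holds every percentage awk printed, one per line, not just the last one. The sketch below uses printf with made-up progress lines in place of the git and tee stages, so it runs offline; it shows what ends up in the variable and one way to keep only the final value:

```shell
#!/usr/bin/env bash
# printf stands in for `git clone --progress ... |& tee gitclone.file`.
total=$(printf 'Receiving objects:  10%% (1/10)\rReceiving objects:  50%% (5/10)\rReceiving objects: 100%% (10/10), done.\n' \
    | tr '\r' '\n' | awk '/Receiving objects/{print $3}')

printf '%s\n' "$total"      # three lines: 10%, 50%, 100%
last=${total##*$'\n'}       # drop everything up to the last newline
echo "final: $last"         # final: 100%
```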

However, if you want the pipeline (starting with the git command) to be run in the background, then you have to redirect awk's output to a file and later read that file. For example:

tmpfile=$(mktemp)
git ... >"$tmpfile" &
# ...
# Do other stuff...
# ...
wait # for background process to complete.
total=$(cat "$tmpfile")
rm "$tmpfile"
echo "$total"
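Here is a runnable version of that pattern. fake_clone is a hypothetical producer standing in for the git | tee | tr | awk pipeline (it just prints a few percentages slowly), and tail -n 1 is used instead of cat so that only the final percentage is kept. Fractional sleep values are accepted by GNU and BSD sleep but are not required by POSIX.

```shell
#!/usr/bin/env bash
# Hypothetical producer standing in for the real clone pipeline.
fake_clone() {
    for p in 25 50 75 100; do
        printf '%s%%\n' "$p"
        sleep 0.1
    done
}

tmpfile=$(mktemp)
fake_clone >"$tmpfile" &        # run in the background, output goes to the file
bg_pid=$!

echo "doing other work while PID $bg_pid runs..."

wait "$bg_pid"                  # block until the background job has finished
total=$(tail -n 1 "$tmpfile")   # keep only the last percentage written
rm -f "$tmpfile"
echo "$total"                   # 100%
```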

A hint: to pipe both stdout and stderr of the git command into tee, you can use bash's |& shorthand (equivalent to 2>&1 |), like this: git clone --progress https://somerepo |& tee gitclone.file | ...
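A quick way to convince yourself that the two forms are equivalent is a small function that writes to both streams; both_streams is a hypothetical stand-in for git clone, and sort is only there to make the comparison independent of output order. This assumes bash 4 or later, where |& is available:

```shell
#!/usr/bin/env bash
# A command that writes to both streams, standing in for git clone.
both_streams() {
    echo "to stdout"
    echo "to stderr" >&2
}

# `|&` pipes stderr together with stdout; it is shorthand for `2>&1 |`.
a=$(both_streams |& sort)
b=$(both_streams 2>&1 | sort)
[ "$a" = "$b" ] && echo "same output:" && printf '%s\n' "$a"
```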

Eswar Reddy:
Hello, thanks for your time and effort. But I am still not getting the output through the total variable. Since I need to run this operation in the background, I ran your suggested command in the background like this: total=$(git clone --progress https://somerepo 2>&1 |tee gitclone.file | tr \\r \\n | awk '/Receiving objects/{print $3}')& echo "$total" Is there any other way I could try?
muru:
@EswarReddy Remove the `&`: if you send it to the background, the whole thing runs in a subshell and the parent shell won't see changes to the variable.
Eswar Reddy:
Thanks for the suggestion, muru. I totally agree with you. But if I run this command in the foreground, I won't get the terminal back until the clone is completed, right? I need to do the clone and print the progress on the same terminal. So can I pipe into the echo command, like this: total=$(git clone --progress https://somerepo 2>&1 |tee gitclone.file | tr \\r \\n | awk '/Receiving objects/{print $3}')|echo "$total"
FedKad:
Your requirement is not clear: you need to run the command in the background, but obtain the value in `total` ___when___? You cannot do this as long as the command has not finished. Please edit your question and make it clearer.
Eswar Reddy:
Hello Fed, I have edited the Aim part of my question. Please have a look at it.
Score:0
I think that the problem is with git's output. It does not emit newlines while rewriting the "Receiving objects:" line; each update ends with a carriage return (\r) only.

You can tell this is the case by looking at the output of

GIT_FLUSH=1 git clone --progress "$repo" 2>&1 | cat -bu

You will not see any line numbers after the first occurrence of the "Receiving" line. Here is an example where I pipe the output into od to make the \r and \n visible:

0000200                   \n                       4  \t   R   e   c   e
0000220    i   v   i   n   g       o   b   j   e   c   t   s   :        
0000240        0   %       (   1   /   1   1   0   3   8   )  \r   R   e
0000260    c   e   i   v   i   n   g       o   b   j   e   c   t   s   :
0000300                0   %       (   4   9   /   1   1   0   3   8   )
0000320    ,       8   .   8   8       M   i   B       |       2   .   8
0000340    4       M   i   B   /   s  \r 

A program that reads input line by line (like awk) will not see those lines until git is finished.
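The same effect can be shown offline with a made-up three-update sample: a line-oriented tool such as wc -l counts only one line until the carriage returns are translated into newlines.

```shell
#!/usr/bin/env bash
# Three progress updates separated by \r, as git writes them, with a single
# trailing \n. (The sample text is made up.)
sample=$'Receiving objects:  10%\rReceiving objects:  50%\rReceiving objects: 100%\n'

# A line-oriented tool counts 1 line, because \r is not a line terminator...
printf '%s' "$sample" | wc -l
# ...and 3 lines once the carriage returns are translated into newlines.
printf '%s' "$sample" | tr '\r' '\n' | wc -l
```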

Eswar Reddy:
Yes. If possible, can you suggest a workaround?
neuhaus:
You could try changing RS (the input record separator in AWK) from newline to carriage return: `BEGIN { RS="\r" }`
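A sketch of that suggestion, producing the question's "Cloning [$percentage]" format. sample_progress and progress_awk are hypothetical names, and the sample stream is made up so the example runs offline. Because a record can still contain embedded newlines (for example a "remote:" line glued to the first update), the percentage is located with match() rather than $3. fflush() is supported by gawk, mawk and BusyBox awk, but this has not been checked against every awk implementation.

```shell
#!/usr/bin/env bash
# Made-up stand-in for git's progress stream (no network needed).
sample_progress() {
    printf 'remote: Counting objects: 100%% (503/503), done.\n'
    printf 'Receiving objects:  28%% (54112/1845678)\r'
    printf 'Receiving objects:  64%% (1181234/1845678)\r'
    printf 'Receiving objects: 100%% (1845678/1845678), done.\n'
}

# RS="\r" makes every carriage return end a record, so each progress update
# becomes its own record without needing tr.
progress_awk() {
    awk 'BEGIN { RS = "\r" }
         match($0, /Receiving objects: *[0-9]+%/) {
             pct = substr($0, RSTART, RLENGTH)
             sub(/^Receiving objects: */, "", pct)
             print "Cloning [" pct "]"
             fflush()   # push each line out immediately
         }'
}

sample_progress | progress_awk
```

This prints Cloning [28%], Cloning [64%] and Cloning [100%]. With a real clone, the sample function would be replaced by git clone --progress ... 2>&1.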
Score:0
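You've fundamentally got a pipeline buffering issue. When a program's output goes to a pipe rather than a terminal, the C standard I/O library it uses normally switches from line buffering to block buffering, so data moves through the pipeline in large chunks instead of line by line. Fortunately, there is a way to make each program in the pipeline behave as if it were writing to a terminal, so that it flushes every line.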

This is the program you need: https://manpages.ubuntu.com/manpages/bionic/man1/unbuffer.1.html.

It's installed by default in Ubuntu Desktop, I think, but if not:

sudo apt install expect

Then you can include the unbuffer command in your pipeline to solve the problem:

REPO_URL=https://something   # or git@something
unbuffer git clone --progress "$REPO_URL" 2>&1 | \
  unbuffer -p tr \\r \\n | \
  awk '/Receiving objects/{print $3}'

It prints 0%, 1%, ... 100%, and it does so as the progress progresses, not all at the end or in large chunks.
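To get the single-line "Cloning [$percentage]" display the question asks for, the percentages coming out of a pipeline like the one above can be fed to a small reader loop that redraws one status line with \r. show_progress is a hypothetical helper name; the commented "real use" pipeline assumes unbuffer (from the expect package) is installed, and adds fflush() to awk because awk's output now goes to a pipe rather than the terminal.

```shell
#!/usr/bin/env bash
# Reads "NN%" values, one per line, and redraws a single status line
# in the "Cloning [$percentage]" format.
show_progress() {
    local pct
    while IFS= read -r pct; do
        printf '\rCloning [%s]' "$pct"   # \r returns to column 0 and overwrites
    done
    printf '\n'                          # finish with a newline once input ends
}

# Real use (network needed; assumes unbuffer is installed):
#   unbuffer git clone --progress "$REPO_URL" 2>&1 |
#     unbuffer -p tr '\r' '\n' |
#     awk '/Receiving objects/{print $3; fflush()}' |
#     show_progress
# Offline demo with made-up values:
printf '10%%\n55%%\n100%%\n' | show_progress
```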
