
When piping curl output to grep, the regex behaves very strangely

Cas

Simple problem but really weird.

When I make a curl request and do ... | grep -Po "^\d+$", it returns nothing, even though there are 400+ results that should match. See below:

#example snippet of curl output
107
00:15:54,936 --> 00:15:56,646
Yeah, this is him.
We got him.

108
00:16:07,823 --> 00:16:11,869
So, how long
you been in South Florida?

109
00:16:11,953 --> 00:16:13,871
A while.
Before that?

110
00:16:17,166 --> 00:16:20,253
We know you're Brian O'Conner,
formerly of the LAPD.

111
00:16:21,128 --> 00:16:23,214
- You got the wrong guy.
- Really?

112
00:16:28,177 --> 00:16:29,929
How you doing, O'Conner?

That's part of the curl output (entries 107-112 here); the complete output goes up to 1000+ entries in this case. Now:

$ [curl request] | grep -Po "^\d+$"
[empty response]

$ [curl request] | grep -Po "^\d+"
[shit ton of results]

And I truly don't get it. I also tried adding -a to grep, but that didn't help either.

Why doesn't the $ work? And a better question: why does it make every hit invalid (aka nothing matches)?

EDIT: xxd output from curl snippet above

00000000: 3130 370d 0a30 303a 3135 3a35 342c 3933  107..00:15:54,93
00000010: 3620 2d2d 3e20 3030 3a31 353a 3536 2c36  6 --> 00:15:56,6
00000020: 3436 0d0a 5965 6168 2c20 7468 6973 2069  46..Yeah, this i
00000030: 7320 6869 6d2e 0d0a 5765 2067 6f74 2068  s him...We got h
00000040: 696d 2e0d 0a0d 0a31 3038 0d0a 3030 3a31  im.....108..00:1
00000050: 363a 3037 2c38 3233 202d 2d3e 2030 303a  6:07,823 --> 00:
00000060: 3136 3a31 312c 3836 390d 0a53 6f2c 2068  16:11,869..So, h
00000070: 6f77 206c 6f6e 670d 0a79 6f75 2062 6565  ow long..you bee
00000080: 6e20 696e 2053 6f75 7468 2046 6c6f 7269  n in South Flori
00000090: 6461 3f0d 0a0d 0a31 3039 0d0a 3030 3a31  da?....109..00:1
000000a0: 363a 3131 2c39 3533 202d 2d3e 2030 303a  6:11,953 --> 00:       
000000b0: 3136 3a31 332c 3837 310d 0a41 2077 6869  16:13,871..A whi
000000c0: 6c65 2e0d 0a42 6566 6f72 6520 7468 6174  le...Before that
000000d0: 3f0d 0a0d 0a31 3130 0d0a 3030 3a31 363a  ?....110..00:16:
000000e0: 3137 2c31 3636 202d 2d3e 2030 303a 3136  17,166 --> 00:16       
000000f0: 3a32 302c 3235 330d 0a57 6520 6b6e 6f77  :20,253..We know
00000100: 2079 6f75 2772 6520 4272 6961 6e20 4f27   you're Brian O'
00000110: 436f 6e6e 6572 2c0d 0a66 6f72 6d65 726c  Conner,..formerl
00000120: 7920 6f66 2074 6865 204c 4150 442e 0d0a  y of the LAPD...
00000130: 0d0a 3131 310d 0a30 303a 3136 3a32 312c  ..111..00:16:21,
00000140: 3132 3820 2d2d 3e20 3030 3a31 363a 3233  128 --> 00:16:23
00000150: 2c32 3134 0d0a 2d20 596f 7520 676f 7420  ,214..- You got
00000160: 7468 6520 7772 6f6e 6720 6775 792e 0d0a  the wrong guy...
00000170: 2d20 5265 616c 6c79 3f0d 0a0d 0a31 3132  - Really?....112
00000180: 0d0a 3030 3a31 363a 3238 2c31 3737 202d  ..00:16:28,177 -
00000190: 2d3e 2030 303a 3136 3a32 392c 3932 390d  -> 00:16:29,929.
000001a0: 0a48 6f77 2079 6f75 2064 6f69 6e67 2c20  .How you doing,
000001b0: 4f27 436f 6e6e 6572 3f0d 0a              O'Conner?..
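In the dump, every line ends in `0d 0a` (CR, LF) rather than a bare `0a`. A quick way to check for this without reading hex is sketched below. It is a minimal sketch assuming a POSIX shell and GNU `cat` (for `-A`); the `sample` function is a hypothetical stand-in for the real curl request that replays the first three lines of the dump.

```shell
#!/bin/sh
# Hypothetical stand-in for the curl request: the first three lines of the
# xxd dump above, each terminated by CR LF (0d 0a).
sample() {
  printf '107\r\n00:15:54,936 --> 00:15:56,646\r\nYeah, this is him.\r\n'
}

# GNU cat -A shows each CR as ^M and each line end as $.
sample | cat -A

# Count the lines that contain a carriage return; any non-zero count
# means the text has CRLF (DOS-style) line endings.
cr=$(printf '\r')
sample | grep -c "$cr"    # prints 3
```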
steeldriver
Is the curl output CRLF-terminated rather than LF-terminated? Try `grep -Po "^\d+\r$"`.
Cas
@steeldriver it does give a response, but the lines are all empty. So `^\d+$` gives nothing, `^\d+` gives results but not what I want, and `^\d+\r$` gives a lot of empty lines, similar to the output of `printf "\n\n\n\n\n\n..."`.
steeldriver
Well it's hard to diagnose without seeing the actual curl output - can you pipe a small section to `cat -A` or `xxd` and [edit] it into your question so that we can see it byte by byte?
Cas
When piping to `cat -A`, everything looks normal except that every line ends with `^M$`.
steeldriver
OK, so the problem **is** the carriage returns (that's what the `^M` represents). However, the `grep -Po` output gets messed up if you simply match the `\r$` ending as I originally suggested. I don't know why; you can confirm that it's outputting the right thing by piping the grep output through `cat -A`.
steeldriver
... OK, so it seems like there is an interaction between the `\r` and the color codes: it works for me if I use `grep --color=never -Po "^\d+\r$"`. However, a better solution is probably to convert the curl output to Unix-style `LF` line endings.
Cas
`grep --color=never -Po "^\d+\r$"` worked. However, I think there should be an easier solution, right? Isn't there something I could do in the curl command to alter the output there? Or pipe the output through a command, after which everything works normally? Or is this really it?
steeldriver

Your curl command's output has DOS-style CRLF line endings, so the lines you are seeking don't end with \d+; they end with \d+\r.
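The effect can be reproduced without curl by feeding grep a few CRLF-terminated lines built with `printf`. This is a minimal sketch assuming GNU grep built with PCRE support (needed for `-P`); `sample` is a hypothetical stand-in for the real request.

```shell
#!/bin/sh
# Hypothetical stand-in for "curl ...": CRLF-terminated subtitle lines,
# like the xxd dump in the question.
sample() {
  printf '107\r\n00:15:54,936 --> 00:15:56,646\r\nYeah, this is him.\r\n\r\n108\r\n'
}

# "^\d+$" needs the digits to be followed directly by the end of the line,
# but each line really ends in "\r", so nothing matches.
# (grep exits with status 1 when nothing matches; "|| true" ignores that.)
sample | grep -cP '^\d+$' || true    # prints 0

# Allowing the trailing CR lets both number lines match.
sample | grep -cP '^\d+\r$'          # prints 2
```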

You can change your grep command to grep -Po "^\d+\r$". This will match what you are looking for, but the output will include the carriage return characters. With colored output (i.e. when grep is aliased to grep --color=auto and the output goes to a terminal), the CR moves the cursor back to the start of the line, and the erase-to-end-of-line escape sequence that grep emits after each colored match then blanks it, so the output appears empty. If you're piping or redirecting the output, this may not be an issue. Otherwise, some options are:

  • pipe the curl output through tr to remove the carriage returns:

     curl ... | tr -d '\r' | grep -Po "^\d+$"
    
  • change the RE to match but not include the CR using a Perl lookahead

     curl ... | grep -Po "^\d+(?=\r$)"
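Both options can be checked side by side against simulated input. This is a minimal sketch assuming GNU grep with PCRE support and GNU `cat`; `fetch` is a hypothetical stand-in for `curl ...`. The last pipeline confirms the point about matching `\r$` directly: the match is correct, but the CR is part of the printed text, which `cat -A` shows as `^M`.

```shell
#!/bin/sh
# Hypothetical stand-in for "curl ...": CRLF-terminated subtitle lines.
fetch() {
  printf '107\r\n00:15:54,936 --> 00:15:56,646\r\nYeah, this is him.\r\n\r\n108\r\n00:16:07,823 --> 00:16:11,869\r\n'
}

# Option 1: delete every CR before grep sees the text.
fetch | tr -d '\r' | grep -Po '^\d+$'        # prints 107 and 108

# Option 2: require the CR via a lookahead, which is not part of the match.
fetch | grep -Po '^\d+(?=\r$)'               # prints 107 and 108

# For comparison: matching "\r$" directly keeps the CR in the output.
fetch | grep -Po '^\d+\r$' | cat -A          # prints 107^M$ and 108^M$
```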
    