Score:6

Head with a weird behavior

I downloaded a WARC file from Common Crawl on Ubuntu 18.04. After decompressing it with gzip, I tried to extract the beginning of the file using head. I first tried:

head -c 29 CC-MAIN-20210620114611-20210620144611-00436.warc

It produced the expected result, outputting the first 29 bytes of the file:

WARC/1.0
WARC-Type: warcinfo

But, if instead of 29, I use 30, it produces a result I was not expecting:

head -c 30 CC-MAIN-20210620114611-20210620144611-00436.warc

Output:

WARC/1.0

This is only the first 10 bytes of the file, not the first 30. If I use head -c 31, I get the expected result again. I have no idea whether this is a bug or whether there is a detail of how head works that I'm not aware of.

user7761803:
If you want to see exactly what's happening, pipe the output to hexdump, with something like `head -c 30 CC-MAIN-20210620114611-20210620144611-00436.warc | hexdump -Cv`
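To try this without the full Common Crawl file, here is a minimal sketch. It assumes a small stand-in file (`sample.warc` is a made-up name) whose contents mirror the first two header lines from the question with CRLF line endings, and it uses `od` from coreutils in place of `hexdump` in case the latter is not installed.

```shell
# Hypothetical stand-in for the real file: two header lines with CRLF endings.
printf 'WARC/1.0\r\nWARC-Type: warcinfo\r\n' > sample.warc

# Dump the first 30 bytes one character per column.
# od -c prints CR and LF as the escapes \r and \n instead of acting on them.
head -c 30 sample.warc | od -c
```

If the file really has CRLF endings, the last byte in the dump should be `\r` with no `\n` after it.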
Score:17

The head command is almost certainly outputting the requested number of bytes. What those bytes are, however, affects how they are displayed in your terminal.

Specifically, your gunzipped file almost certainly has DOS-style CRLF line endings, with a CR at byte 30 and LF at byte 31. When you do head -c29, the head output excludes both line ending bytes, and you see something like

yourname@computer:~$ head -c29 file.warc
WARC/1.0
WARC-Type: responseyourname@computer:~$

with your shell prompt following directly after the 29th byte. When you do head -c31, you capture both the CR and the LF, and the output looks like

yourname@computer:~$ head -c31 file.warc
WARC/1.0
WARC-Type: response
yourname@computer:~$

However, when you do head -c30, the output contains the terminating CR but not the LF that follows it. The cursor is sent back to column 0 but stays on the same line of the terminal, and your shell prompt then overwrites the text on that line:

yourname@computer:~$ head -c30 file.warc
WARC/1.0
yourname@computer:~$

If the line is longer than your prompt, you will see characters from the file peeking out beyond the end of the prompt. If your PS1 prompt were empty, you would have seen the full expected output.
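One way to see the text regardless of the prompt is to make the carriage return visible, or to strip it before it reaches the terminal. A minimal sketch, assuming a hypothetical stand-in file `sample.warc` with the same CRLF header lines as in the question:

```shell
# Hypothetical stand-in for the real file: two header lines with CRLF endings.
printf 'WARC/1.0\r\nWARC-Type: warcinfo\r\n' > sample.warc

# cat -v prints control characters visibly, so each CR shows up as ^M.
head -c 30 sample.warc | cat -v
echo    # finish the line so the prompt starts on a fresh one

# Alternatively, delete the CRs so the cursor is never sent back to column 0.
head -c 30 sample.warc | tr -d '\r'
echo
```

Either variant leaves the second header line on screen, because the terminal no longer receives a bare CR at the end of the output.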

Or, more practically, just use `head -c 30 filename ; echo`