Score:1

awk or sed command to replace line break plus text containing spaces

cn flag

An answer to another question suggests sed -i 's/original/replacement/g' file.txt to replace specific words in a text file. My starting situation looks like this:

        Item: PRF
        Type: File
        Item: AOX
        Type: Folder
        Item: DD4
        Type: File

My ending situation should look like this:

        Item: PRF^Type: File
        Item: AOX^Type: Folder
        Item: DD4^Type: File

Notes: (1) The Ask Ubuntu interface seems to suppress some of the leading spaces before Item: and Type:. There are in fact eight leading spaces. (2) I may have erred in using simplistic examples of Item. The items are actually partial Windows paths (lacking e.g., D:), some of which are quite long. A more accurate example would be Item: Folder\Some Folder\A file name.txt.

I've tried this, with and without double quotes:

sed -i 's/\n"        Type: "/\^"Type: "/g' file.txt

That gives me no errors, but also no changes. Also tried this:

awk '/ "        Item: " / { printf "%s", $0"^" } / "        Type: " / { gsub(/^[ \t]+/,"",$0); print $0 }' source.txt

I tried that to verify that I would be changing only those entries with eight blank spaces before "Item." That didn't work. Trying it with no spaces and no double quotes, as in the answer (below), also failed. Trying it with gawk -i inplace produced source.txt containing zero bytes.

My title initially specified sed. An answer proposing awk alerted me to that alternative, which (now that I'm looking at it) seems more capable. But I cannot figure out how to make it work.

muru avatar
us flag
"but also no changes" .. do you mean changes *in* the file? If you want in-place changes a la `sed -i`, you'd need to use GNU awk with the `-i inplace` option
cn flag
Ah. I thought one answer (below) was saying that GNU awk was the default in Ubuntu. Apparently I misunderstood that: https://askubuntu.com/a/1420570/80644. With `sudo apt install gawk` the `-i inplace` option did modify source.txt, though with undesirable results (see edited question, above).
muru avatar
us flag
I don't remember if it's the default or not, but anyway, the command in the answer works for me, but your post has some weirdness: `/ " Item: " /`, `/ " Type: " /` - these don't match anything in the input file you have shown, so nothing gets printed, so your input file is replaced with nothing.
Raffa avatar
jp flag
*"The items are actually partial Windows paths"* ... Was your input file edited on Windows at some point? ... If yes, then it might have `\r\n` line endings (*Windows-style newlines*), and you would need to run it through e.g. [`dos2unix file`](https://askubuntu.com/a/1183893) to correct that before processing it with either `sed` or `awk`
muru avatar
us flag
Also see: https://askubuntu.com/editing-help#code for how to code format properly (either indent by 4 spaces or wrap with triple-backticks)
cn flag
The Windows-style newline was the problem. To fix that, I opened the file in `gedit` and used Save As to change the line ending from Windows to Unix/Linux.
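
For anyone hitting the same symptom, here is a rough sketch of how the line endings could be checked and fixed from the shell rather than in gedit. The sample file and the variable names are made up for illustration; `dos2unix`, mentioned above, should do the same job as the `sed` step if it is installed.

```shell
# Build a small sample with Windows (CRLF) line endings (hypothetical data).
sample=$(mktemp)
printf '        Item: PRF\r\n        Type: File\r\n' > "$sample"

# Hold a literal carriage return in a variable so the patterns stay portable.
cr=$(printf '\r')

# Count lines that end in a carriage return; anything above 0 suggests CRLF endings.
crlf_lines=$(grep -c "$cr\$" "$sample")
echo "lines ending in CR: $crlf_lines"

# Strip one trailing carriage return from each line and write the result to a new file.
fixed=$(mktemp)
sed "s/$cr\$//" "$sample" > "$fixed"

# Re-check: grep -c prints 0 (and exits non-zero) when no carriage returns are left.
remaining=$(grep -c "$cr" "$fixed" || true)
echo "lines still containing CR: $remaining"
```

Writing to a second file first keeps the original untouched until the result has been inspected.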
Score:1
hr flag

By default, sed only loads one line at a time into its pattern space. You can use the N command to load another line.

In fact, your question is a variant of a well-known "one-liner" for joining lines based on the initial character(s) of the following line [1]:

40. Append a line to the previous if it starts with an equal sign "=".

sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D'

So given

$ cat file.txt
    Item: PRF
    Type: File
    Item: AOX
    Type: Folder
    Item: DD4
    Type: File

(which has 4 initial spaces), then

$ sed -E -e :a -e '$!N;s/\n {4}Type: (File|Folder)/^Type: \1/; ta' -e 'P;D' file.txt
    Item: PRF^Type: File
    Item: AOX^Type: Folder
    Item: DD4^Type: File

Add -i or -i.bak to edit the file in place once you are happy that it is doing the right thing.
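
To make the moving parts easier to follow, the same idea can be written as a multi-line script with comments. This is only a sketch: it assumes GNU sed, uses the eight leading spaces described in the question rather than four, and matches any value after Type: instead of only File or Folder. The temporary file and the variable names are invented for the demo.

```shell
# Sample input with eight leading spaces, as described in the question.
demo=$(mktemp)
printf '%s\n' \
  '        Item: PRF' \
  '        Type: File' \
  '        Item: AOX' \
  '        Type: Folder' > "$demo"

joined=$(sed -E '
  # Label that the conditional branch below jumps back to.
  :a
  # Unless this is the last line, append the next input line to the pattern space.
  $!N
  # If the appended line starts with eight spaces and "Type: ", join it using "^".
  s/\n {8}Type: /^Type: /
  # If the substitution happened, branch back to the label and try the next line too.
  ta
  # Otherwise print up to the first newline, delete that part, and restart the script.
  P
  D
' "$demo")

printf '%s\n' "$joined"
```

The source file is only read here, so it is safe to experiment before adding -i.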


Alternatively, you could use the following non-streaming ed editor script to match the Type: lines, substitute ^ for the leading spaces, then join to the preceding line, writing the result back to the same file:

g/^ \{4\}Type:/s//^Type:/\
-1,.j
wq

You can implement that as a non-interactive shell one-liner:

printf '%s\n' 'g/^ \{4\}Type:/s//^Type:/\' '-1,.j' 'wq' | ed -s file.txt

See The GNU ed line editor for details.


  [1] See, for example, Sed One-Liners Explained, Part I: File Spacing, Numbering and Text Conversion and Substitution
cn flag
In the `sed` command, what does :a do?
hr flag
@RayWoodcock `:a` sets a label for the conditional branch `ta`. See [Commands for sed gurus](https://www.gnu.org/software/sed/manual/sed.html#Programming-Commands)
Score:1
jp flag

I would use awk … It is a straightforward one-liner like so:

awk '/Item:/ { printf "%s", $0"^" } /Type:/ { gsub(/^[ \t]+/,"",$0); print $0 }' file

That is … if the line has Item: in it, then print it without a trailing newline (printf doesn't append a newline by default) but with the ^ character appended at the end … and if the line has Type: in it, then remove all leading whitespace and print it followed by a newline (print appends a newline by default).

The above command will not modify the original file; it will instead print the modified text to the terminal.

To edit the original file in-place, use the -i inplace option of GNU awk (it might be the default on Ubuntu ... check with awk -W version), or, if it is not, install gawk and then use it like so:

gawk -i inplace '/Item:/ { printf "%s", $0"^" } /Type:/ { gsub(/^[ \t]+/,"",$0); print $0 }' file
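
If gawk is not available, or if an empty result overwriting the file is a worry, a more cautious variant is to write to a temporary file and replace the original only when something was actually produced. The sketch below is an assumption-laden illustration, not a drop-in fix: it matches the literal eight leading spaces from the question, passes any other line through unchanged (an addition to the command above), and uses made-up file and variable names.

```shell
# Sample input, including a path with spaces and backslashes as in the question.
src=$(mktemp)
printf '%s\n' \
  '        Item: Folder\Some Folder\A file name.txt' \
  '        Type: File' \
  '        Item: DD4' \
  '        Type: Folder' > "$src"

tmp=$(mktemp)
awk '
  # Item lines: print without a newline and append "^".
  /^        Item: / { printf "%s", $0 "^"; next }
  # Type lines: strip leading blanks, then print with a newline.
  /^        Type: / { sub(/^[ \t]+/, ""); print; next }
  # Anything else is passed through untouched.
  { print }
' "$src" > "$tmp"

# Replace the original only if awk produced non-empty output.
if [ -s "$tmp" ]; then
  mv "$tmp" "$src"
fi

result=$(cat "$src")
printf '%s\n' "$result"
```

Using "%s" as the printf format keeps the backslashes in the paths from being treated as part of a format string.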
cn flag
This is a wilderness to me. (1) Would I be better advised to use printf "%s", $0 (see https://stackoverflow.com/a/46455937/711879)? (2) If printf $0 doesn't append a newline in the first part, why does it append a newline in the second part? (3) I think [ \t] refers to any occurrence of space or tab, but what do / and + characters do in gsub(/^[ \t]+/,"",$0)?
Raffa avatar
jp flag
(1) Yes, `printf "%s", $0"^"` would be a better safety measure … (2) It’s `print` (*not `printf`*) in the second part … (3) The surrounding `/` characters delimit a regular expression, and `+` matches one or more occurrences of the preceding bracket expression `[ \t]`
cn flag
Very helpful. Thank you. Clarification on print vs. printf: https://en.wikibooks.org/wiki/An_Awk_Primer/Output_with_print_and_printf. Follow-up question regarding + : doesn't gsub (as distinct from sub) already match multiple occurrences within the specified string - or is that defeated by ^ ? Anyway, I wasn't successful so far. Editing the question to update.
Raffa avatar
jp flag
@RayWoodcock "doesn't gsub (as distinct from sub) already match multiple occurrences within the specified string?" ... It does if you **don't** anchor the regex to the beginning of the line with `^` (*only **one** space or tab can ever sit at the very beginning of the line ... hence the `+`*) ... It's worth mentioning that in your case `sub(/^[ \t]+/,"",$0)` is an alternative option too ... Also, please notice that we only see your provided example input and expected output and write our answers to help you achieve just that ... We don't see the other context you see :-)
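
A small experiment may make the point about `^` and `+` concrete. This is just a sketch with an invented sample line and variable names; it runs the same input through three variants of the `gsub` call.

```shell
line='        Type: File'

# Anchored, with +: the whole run of leading blanks is a single match and is removed.
with_plus=$(printf '%s\n' "$line" | awk '{ gsub(/^[ \t]+/, ""); print }')

# Anchored, without +: only the first blank can sit at the start of the line,
# so just that one is removed, even though gsub is "global".
without_plus=$(printf '%s\n' "$line" | awk '{ gsub(/^[ \t]/, ""); print }')

# Unanchored: gsub removes every blank, including the one after "Type:".
unanchored=$(printf '%s\n' "$line" | awk '{ gsub(/[ \t]/, ""); print }')

printf '[%s]\n' "$with_plus" "$without_plus" "$unanchored"
```

The brackets in the final `printf` are only there to make any remaining leading spaces visible.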