Score:6

How to check for a specific string with linebreaks in a file with grep?


I have a string variable in a bash script file as follows:

string="

test1

test2

"

and I want to check whether a file test.txt contains this specific string, including the linebreaks. That is, the check should fail if the file only contains the following:

this is a test:
test1

test2
and another one

because the linebreaks above test1 and below test2 aren't present.

(The reason I want to check this is because I want to check whether a certain piece of code is in a source file, and if not, add it.)


The following doesn't work:

string="
    
    test1
    
    test2
    
    "
if ! grep -q string "test.txt"; then
    echo "$string" >> test.txt
fi

This correctly adds the string to the file, but it does it even if the string has already been added. Also, it performs correctly when I change the string to have no linebreaks.


EDIT:

The answers by @terdon and @steeldriver below work for the string example I wrote above, but they for some reason break for this more realistic example:

string="                                                                
                                                               
if [ -f ~/.script ]; then                            
        . ~/.script         
fi

"  
user56834 avatar
@Terrance, sorry disregard my previous comment. It actually still doesn't work, but the failure is the opposite: now it doesn't ever adjust the file, even if the string is not there in the first place. (So if I execute it 5 times, rather than ending up with 5 copies as I did with my original code, I end up with 0, whereas I should end up with 1).
terdon avatar
Well yes. That is a completely different situation, you're using all sorts of special characters. Please [edit] your question and add i) exactly what you are doing, which approach you are using; ii) how you are calling your script and iii) what error you get (telling us it breaks doesn't help us understand).
user56834 avatar
@terdon, sorry yes, my message wasn't very clear. i) I used both your approach and @steeldriver's. E.g. from your approach I only changed the definition of `string`. ii) I'm calling it with "bash substtest.sh", and iii) it doesn't give an error; rather, it adds the string text indefinitely if I call bash substtest.sh over and over again, instead of adding it just once.
terdon avatar
What command are you running that fails? How did you adapt my answer to fit your actual data? This is a completely different situation to your original question. The "string" you are looking for contains special characters. You would need something like `string='\n\nif \[ -f ~/.script \]; then\s*\n\s*\. ~/\.script\s*\nfi\n\n'`.
terdon avatar
See updated answer.
Score:6

The problem is that grep will run on each line, not the entire file. As long as the file is small enough to fit into memory (which should be the case in the vast majority of situations these days), you can use grep's -z flag to slurp the entire file:

-z, --null-data Treat input and output data as sequences of lines, each terminated by a zero byte (the ASCII NUL character) instead of a newline. Like the -Z or --null option, this option can be used with commands like sort -z to process arbitrary file names.

The next issue is that if you pass grep a pattern containing newlines, it will treat each line as a separate pattern and match any of them:

$ string="1
> 2"

$ seq 10 | grep "$string"
1
2
10

Which means that I am afraid you will have to express the pattern as a proper regular expression:

\n\ntest1\n\ntest2\n\n

However, this also means you need the -P flag to enable perl-compatible regular expressions so the \n will work.

I created these two files to demonstrate:

$ cat file1
this is a test:
test1

test2
and another one

$ cat file2
this is a test:

test1

test2

and another one

Using those two files and the information above, you can do:

$ grep -Pz '\n\ntest1\n\ntest2\n\n' file1
$ 

$ grep -Pz '\n\ntest1\n\ntest2\n\n' file2
this is a test:

test1

test2

and another one

Putting all this together gives us:

string='\n\ntest1\n\ntest2\n\n'
if ! grep -Pzq "$string" test.txt; then
    # %b expands the \n escapes stored in $string without using it as the format string
    printf '%b' "$string" >> test.txt
fi

Or, as suggested by @steeldriver in a comment, you can use a variable and convert the newlines to \n on the fly:

$ string="

    test1

    test2

    "
$ if ! grep -Pzq "${string//$'\n'/\\n}" test.txt; then
    # %s writes $string verbatim, even if it contains % or backslashes
    printf '%s' "$string" >> test.txt
fi

If your string contains special characters which have meanings in regular expressions, as you now show in your updated question, then that's a whole different situation. For the example you show, you would need something considerably more complicated. Like this:

searchString='\n\nif \[ -f ~/\.script \]; then\s*\n\s*\.\s+~/\.script\s*\nfi\n\n'
printString='
if [ -f ~/.script ]; then
   . ~/.script         
fi

'
if ! grep -Pzq "$searchString" test.txt; then     
    printf "%s" "$printString" >> test.txt 
fi
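
If hand-writing an escaped regular expression is too fragile, one alternative (not part of the answer above, just a sketch) is to skip grep and compare the text literally with bash's own pattern matching. The function name contains_literal, the variable names and the temporary file are made up for illustration. This assumes bash, a file small enough to hold in a variable, and no NUL bytes in the file:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if file $1 contains $2 as a literal substring.
contains_literal() {
    local file=$1 needle=$2 content
    # $(...) strips trailing newlines, so append a sentinel "x" and remove it again.
    content=$(cat -- "$file"; printf x)
    content=${content%x}
    # Quoting "$needle" inside [[ ]] makes it match literally, not as a glob.
    [[ $content == *"$needle"* ]]
}

block='

if [ -f ~/.script ]; then
        . ~/.script
fi

'

# Temporary file standing in for test.txt.
target=$(mktemp)
printf 'first line\n' > "$target"

# Run the check-and-append step three times; only the first run should append.
for _ in 1 2 3; do
    if ! contains_literal "$target" "$block"; then
        printf '%s' "$block" >> "$target"
    fi
done
```

Because nothing is interpreted as a regular expression, characters such as [ ] and . need no escaping. The trade-off is that whitespace must match exactly, whereas the \s* in the regex version tolerates trailing spaces.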
user56834 avatar
Thanks! I assume you mean `if ! grep -q -z "$string" "test.txt"; then`, i.e. with the -z added?
user56834 avatar
Actually, even adding the -z, the same problem persists for me as I stated in the comment to my original question: That is, with either `if ! grep -q -z "$string" "test.txt"; then` or `if ! grep -q "$string" "test.txt"; then` or `if ! grep -q -z "$string" test.txt; then`, it fails in a rather weird way:
terdon avatar
@user56834 whoops, yes. But this won't actually work with a variable. Give me a few minutes, I'm trying to figure out the problem.
Terrance avatar
Coolio! +1 Having spaces in the string as `string='\n\n test1\n\n test2\n\n'` works just as well. :)
terdon avatar
@user56834 please see updated answer.
terdon avatar
@steeldriver duh! Thanks, I could have sworn I did. But no, I just tested it in a terminal and forgot. Fixed now, thanks.
user56834 avatar
Sorry for delay, and thanks! Regarding the suggestion by steeldriver, weirdly I get an error: "substtest.sh 12: Bad substitution"
user56834 avatar
Oh nevermind, it seems that this is solved by executing the .sh script with bash instead of sh (dash). Not sure why. It works! great. (Although I don't get the "//$'\n'/\\n}" part. Is there a good explanation of this?)
terdon avatar
@user56834 `dash` and `sh` are _not_ `bash` and should not be considered synonymous. Dash is a minimal POSIX shell and lacks many of the features of the more sophisticated `bash` shell. Same goes for `sh`. As for the `"${string//$'\n'/\\n}"`, that's a (bash-specific) substitution. The general format is `${var//old/new}` which will replace all occurrences of `old` with `new` in the variable `$var`. Here, "old" is `$'\n'` which is a way of passing a newline to the shell.
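
To make the substitution described in the comment above concrete, here is a small sketch (bash only; the variable name pattern is just for illustration):

```shell
#!/usr/bin/env bash
string="
test1

test2
"
# $'\n' is a literal newline. With //, every occurrence is replaced;
# with a single /, only the first one would be.
# Inside double quotes, \\n becomes the two characters backslash and n.
pattern="${string//$'\n'/\\n}"

# Prints: \ntest1\n\ntest2\n
printf '%s\n' "$pattern"
```

The result is a single line in which each newline has become the two-character sequence \n, which grep -P then interprets as "match a newline".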
user56834 avatar
Actually, I just tried the same thing on a more complicated case, and this makes it break. See my original question.
Score:4

You might want to consider using pcregrep with the -M or --multiline option to allow matching of literal newlines:

   -M, --multiline
             Allow patterns to match more than one line. When this  option
             is given, patterns may usefully contain literal newline char‐
             acters and internal occurrences of ^ and  $  characters.

Ex. given

$ cat test.txt
this is a test:
test1

test2
and another one


    test1

    test2
    
    

and

$ cat test2.txt
this is a test:
test1

test2
and another one


    test3

    test4
    
    

with

$ string="

    test1

    test2

    "

then

$ pcregrep -qM "$string" test.txt && echo 'found' || echo 'not found'
found

$ pcregrep -qM "$string" test2.txt && echo 'found' || echo 'not found'
not found
user56834 avatar
Thanks, this works. Unfortunately, it fails for a more realistic example, which I've added to my question (just as terdon's answer fails for that example).
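@user56834 that's likely because `[ ... ]` denotes a character class in PCRE, so the brackets are not matched literally. Try replacing `"$string"` with `"\\Q${string}\\E"`, which makes PCRE treat everything between `\Q` and `\E` as literal text.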
user56834 avatar
a bit later reply but: Could you point me to a place where I can read about what \\Q and \\E do?
@user56834 try perldoc's [quotemeta](https://perldoc.perl.org/functions/quotemeta)
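
The \Q...\E quoting from the comments can also be combined with the grep -Pz approach from the other answer. The following is only a sketch under a few assumptions: GNU grep built with PCRE support, bash as the shell, and a string that does not itself contain the sequence \E. The variable names and the temporary file are made up for illustration. Each line of the string is wrapped in \Q...\E and the lines are joined with \n, so no literal newline is ever passed to grep:

```shell
#!/usr/bin/env bash
string='

if [ -f ~/.script ]; then
        . ~/.script
fi

'
# Every newline becomes \E\n\Q, and the whole thing is wrapped in \Q...\E.
# The result looks like \Q\E\n\Q\E\n\Qif [ -f ~/.script ]; then\E\n\Q...
pattern="\\Q${string//$'\n'/\\E\\n\\Q}\\E"

# Temporary file standing in for test.txt.
target=$(mktemp)
printf 'first line\n' > "$target"

# Run the check-and-append step three times; only the first run should append.
for _ in 1 2 3; do
    if ! grep -Pzq -- "$pattern" "$target"; then
        printf '%s' "$string" >> "$target"
    fi
done
```

Unlike the hand-written searchString, this matches whitespace exactly, so a copy of the block with different indentation or trailing spaces would count as "not found".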
Score:2

Searching for multiline patterns in a file might be easier with awk:

awk '/Start pattern/,/End pattern/' filename

Check this post for further details
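
As a rough illustration of the range pattern, applied to the block from the question (the file and variable names are made up, and the two regular expressions are assumptions about what marks the start and end of the block):

```shell
#!/usr/bin/env bash
# Temporary file standing in for the source file being checked.
sample=$(mktemp)
printf '%s\n' \
    'this is a test:' \
    'if [ -f ~/.script ]; then' \
    '        . ~/.script' \
    'fi' \
    'and another one' > "$sample"

# Print every line from one matching the start pattern
# through the next line matching the end pattern.
block=$(awk '/^if \[ -f/,/^fi$/' "$sample")
printf '%s\n' "$block"
```

Note that this selects a range of lines rather than testing for an exact string, so it does not by itself verify the blank lines around the block. An empty $block does tell you that the start pattern never matched.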
