You should generally avoid using generic text-parsing tools on structured data. Since you have a JSON file, it is safer and simpler to use a dedicated JSON parser. In your case, you want the array args held by the first element of the top-level array args, which is itself a child of the top-level hash $quer:
$ jq '."$quer"."args"[0]["args"]' file.json
[
"select\n db1.table1 as tab1,\n db1.table2 as tab2,\n db1.table3 as tab3\n from db1.table4 as tab4"
]
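If it helps, here is a minimal sketch of going one step further and asking jq for the raw string instead of the enclosing array. The file.json below is a hypothetical reconstruction based only on the path used above (your real file presumably has more in it), and the variable names query and first_table are mine.

```shell
# Hypothetical minimal file.json, reconstructed from the jq path used above;
# the real file presumably contains more keys.
cat > file.json <<'EOF'
{
  "$quer": {
    "args": [
      {
        "args": [
          "select\n db1.table1 as tab1,\n db1.table2 as tab2,\n db1.table3 as tab3\n from db1.table4 as tab4"
        ]
      }
    ]
  }
}
EOF

# -r prints the raw string, so the JSON escape \n becomes a real newline
query=$(jq -r '."$quer".args[0].args[0]' file.json)
printf '%s\n' "$query"

# The first line is now just "select", so the name is the first word of line 2
first_table=$(printf '%s\n' "$query" | awk 'NR==2{print $1}')
printf '%s\n' "$first_table"   # prints db1.table1
```

With -r the output is the decoded SQL text, so any later matching no longer has to deal with the surrounding quotes or the literal \n sequence.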
From here on, you no longer have structured data and need to resort to cruder methods. I don't know how you want to identify your target string, since you didn't explain that. So, depending on what you actually want, you could do one of the following:
Skip lines starting with [ or ] and then print the second word of the remaining lines:
$ jq '."$quer"."args"[0]["args"]' file.json | awk '/^[^][]/{print $2}'
db1.table1
Print the second word of the second line:
$ jq '."$quer"."args"[0]["args"]' file.json | awk 'NR==2{print $2}'
db1.table1
Print the longest stretch of non-whitespace characters after the string "select\n and any whitespace that follows it:
$ jq '."$quer"."args"[0]["args"]' file.json | grep -oP '"select\\n\s*\K\S*'
db1.table1
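As a variation on that last idea, the match can also be done inside jq, on the decoded string, using its capture function (available in jq 1.5 and later). This is only a sketch: the json variable holds a hypothetical compact sample reconstructed from the path above, and the group name tbl is arbitrary.

```shell
# Hypothetical sample input, reconstructed from the jq path used in this answer
json='{"$quer":{"args":[{"args":["select\n db1.table1 as tab1,\n db1.table2 as tab2\n from db1.table4 as tab4"]}]}}'

# capture() applies the regex to the decoded string and returns an object
# holding the named groups; \s also matches the newline after "select"
first_table=$(printf '%s\n' "$json" |
  jq -r '."$quer".args[0].args[0] | capture("^select\\s+(?<tbl>\\S+)") | .tbl')

printf '%s\n' "$first_table"   # prints db1.table1
```

This keeps everything in one tool, and it does not depend on how the JSON happens to be laid out in the file.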
If you explain exactly how we are supposed to know which string to extract, I can give you a more targeted answer.
For the sake of completeness: in your specific example, and I stress that this is not portable and is almost certain to fail if your input data change in any way, you can use simple text tools directly (grep -P and the \s and \S escapes in sed are GNU extensions):
$ grep -oP '"select\\n\s*\K\S*' file.json
db1.table1
$ awk '$1=="\"select\\n"{print $2}' file.json
db1.table1
$ sed -nE 's/.*"select\\n\s*(\S+).*/\1/p' file.json
db1.table1
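To illustrate how fragile this is, here is a small sketch that feeds the same data to the awk command twice, once pretty-printed and once compacted onto a single line. Both inputs are hypothetical samples reconstructed from the jq path above, and extract is just a helper name I made up.

```shell
# Hypothetical inputs: the same JSON data, pretty-printed and compact
pretty='{
  "$quer": {
    "args": [
      {
        "args": [
          "select\n db1.table1 as tab1,\n db1.table2 as tab2\n from db1.table4 as tab4"
        ]
      }
    ]
  }
}'
compact='{"$quer":{"args":[{"args":["select\n db1.table1 as tab1,\n db1.table2 as tab2\n from db1.table4 as tab4"]}]}}'

# The awk one-liner from above, wrapped in a helper that reads stdin
extract() {
  awk '$1=="\"select\\n"{print $2}'
}

from_pretty=$(printf '%s\n' "$pretty" | extract)
from_compact=$(printf '%s\n' "$compact" | extract)

printf 'pretty:  [%s]\n' "$from_pretty"    # pretty:  [db1.table1]
printf 'compact: [%s]\n' "$from_compact"   # compact: []
```

In the compact form the first whitespace-separated field is everything from the opening brace up to "select\n, so the comparison never matches and nothing is printed. A JSON parser sees the same data in both cases.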