So I have an HTML file that contains the following somewhere in the middle:
<span dir="ltr">http:(...).com</span>
I'm trying to extract the URL, but I'm having trouble doing so. Because that "ltr" is the only one in the HTML, I came up with this regex:
(?<=ltr">)(.*)(?=<\/span>)
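For reference, here is a minimal Python sketch of the same check, on the assumption that Ansible's regex_search filter is backed by Python's re module. The variable names (html, pattern, extracted) are only illustrative, and the sample string is the single line shown above with the URL left elided.

```python
import re

# Sample line from the question; the URL is elided in the original and kept as-is.
html = '<span dir="ltr">http:(...).com</span>'

# Same pattern as above, written as a raw string so the backslash is preserved.
pattern = r'(?<=ltr">)(.*)(?=<\/span>)'

# re.search returns the first match or None; group(0) is the text between
# the lookbehind and the lookahead.
match = re.search(pattern, html)
extracted = match.group(0) if match else None
print(extracted)  # prints: http:(...).com
```

Note that (.*) is greedy, so if more than one closing span tag appeared on the same line, the match would run to the last one; the grep variant further down uses the lazy .*? instead.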
Using regex101, I confirmed that the regex works. However, I think the way Ansible handles single and double quotes may be causing problems.
I'm trying it like this:
- set_fact:
    regex_test: " {{ htmlres.content | regex_search('(?<=ltr">)(.*)(?=<\/span>)') }}"
Here htmlres.content is the HTML content returned by an HTTP GET request made earlier in the same playbook.
However, running it fails with an error that points at this line:
- set_fact:
regex_pubdest: " {{ htmlres.content | regex_search('(?<=ltr">)(.*)(?=<\/span>)' }}"
^ here
Is there any way to work around this quoting issue with regex in Ansible? I've managed to get the desired output by doing something slightly different:
shell: grep -oP 'ltr">\K.*?(?=</span>)' /dir/htmlcontent.txt
The problem is that this only works when reading from a file, and I'm trying to avoid saving htmlres.content to a file before running a regex over it. I've tried replacing the file path in the grep command with "{{ htmlres.content }}", but that causes Ansible to fail because of the quotes.
Any ideas?
Thank you!