Score:0

wget a link with google redirect


I have a page full of download links, but they all go through a Google redirect of the form https://www.google.com/url?q=http://www.$$$/*.pdf&....

I can download each file directly from http://www.$$$/*.pdf, but there are 50+ files. Is there any way to avoid doing this by hand? Can wget do it? I tried, but it only downloads the links as they are, under www.google.com/.

Any help would be appreciated.

Score:0

You could use grep -P to extract the real links (here the file links holds the page's redirect URLs) and pass them to wget as an input file (-i) using process substitution:

wget -i <(grep -Po '[?&]q=\K[^&]*' links)

However, the embedded URL is probably URL-encoded, in which case you need an extra step to decode it:

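# Decode from the arguments if given, otherwise from stdin: "+" becomes a space, %XX becomes the matching byte
urldecode() { local str; if [ $# -eq 0 ]; then str=$(cat); else str="$*"; fi; str=${str//+/ }; printf '%b\n' "${str//%/\\x}"; }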
wget -i <(grep -Po '[?&]q=\K[^&]*' links | urldecode)

Or with Python 3's urllib.parse.unquote:

wget -i <(python3 -c '
import re
from urllib.parse import unquote
with open("links") as f:
    for line in f:
        m = re.search(r"[?&]q=([^&]*)", line)
        if m:  # skip lines without a q= parameter
            print(unquote(m.group(1)))
')

(Of course, you could also use Python to replace the wget part.)
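
As the parenthetical above suggests, the whole job can be done in Python. The following is only a minimal sketch, not a tested replacement for wget: the function names real_url and download_all are made up for illustration, and it assumes the redirect links sit one per line in a file called links, as in the commands above. It uses the standard library's parse_qs, which also percent-decodes the q value, so no separate unquote step is needed.

```python
import os
import urllib.request
from urllib.parse import urlsplit, parse_qs


def real_url(redirect_url):
    """Return the decoded target of a Google redirect link, or None.

    Hypothetical helper: assumes the target is carried in the q= parameter.
    """
    query = urlsplit(redirect_url.strip()).query
    # parse_qs splits the query string and percent-decodes each value
    targets = parse_qs(query).get("q")
    return targets[0] if targets else None


def download_all(links_file="links", dest_dir="."):
    """Download every redirect target listed in links_file (one URL per line)."""
    with open(links_file) as f:
        for line in f:
            url = real_url(line)
            if not url:
                continue  # line had no q= parameter
            # Name the local file after the last path component of the URL
            name = os.path.basename(urlsplit(url).path) or "index.html"
            urllib.request.urlretrieve(url, os.path.join(dest_dir, name))
```

Calling download_all("links") would then fetch each file into the current directory. Unlike wget, this sketch does no retries or resuming, so it is probably only worth it if you need further processing in Python anyway.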
