Score:0

wget does not recurse when piping the output to stdout


I want to download webpages recursively and pipe the output to a filter. I am using:

wget -qm -O- http://mywebsite.com/initialpath.php | ./filter

But wget stops downloading after the first page and waits for input instead of parsing the webpage and downloading the linked files. It works if I save the output to a file with -O filename, but I want to process the webpages on the fly with a filter.

How can I achieve this?

Are you sure the `./filter` does not block here?
chqrlie
I am sure... I studied the source code for `wget` and found the explanation.
Score:1

It does not seem possible to achieve my goal with current versions of wget.

After studying the source code for wget version 1.18, I came to these conclusions:

  • wget cannot recurse if it does not store the downloaded files, at least temporarily, as it does for --spider.

  • When passed -O filename, it keeps appending to filename and reparses the whole file after each download, loading it completely in memory (or mapping it). This is very cumbersome and inefficient.

  • When passed -O-, it pipes the downloaded file to stdout and then attempts to reload - to look for more URLs to fetch, which causes stdin to be read for this purpose. This is a side effect of the implementation.

I wrote a patch to add a more sensible piping option, relying on --spider to download html and css files for recursive operation and piping only these files before they are removed. I will publish the patch when it is reasonably tested and documented.
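Without such a patch, one possible workaround is to let wget store the mirrored files in a temporary directory and stream them to the filter afterwards. The sketch below is only an illustration under that assumption: the function names stream_tree and mirror_and_filter are hypothetical, and the filter receives the pages after the whole mirror has finished rather than on the fly.

```shell
#!/bin/sh
# Workaround sketch (hypothetical helper names, not part of wget).

# stream_tree DIR: write every regular file under DIR to stdout,
# in sorted path order.
stream_tree() {
    find "$1" -type f | sort | while IFS= read -r f; do
        cat "$f"
    done
}

# mirror_and_filter URL FILTER: mirror URL into a temporary directory,
# pipe the saved files to FILTER, then remove the directory.
mirror_and_filter() {
    tmpdir=$(mktemp -d) || return 1
    # -q: quiet, -m: mirror (recursive), -P: directory to save files under
    wget -qm -P "$tmpdir" "$1"
    stream_tree "$tmpdir" | "$2"
    rm -rf "$tmpdir"
}
```

With these definitions loaded, the original command would become something like: mirror_and_filter http://mywebsite.com/initialpath.php ./filter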
