Trying to mirror my website with wget, but a nofollow attribute is found and I cannot download anything beyond index.html


I am running a WordPress site on an Ubuntu 20.04 based LEMP server. I have the pagespeed plugin enabled, and to force it to cache my website I am using wget from a different box to mirror the site. However, when I run wget from the second box, it stops downloading after the first page (index.html) with the error:

nofollow attribute found in /tmp/ramdisk/ Will not follow any links on this page

Below is the wget command I am using and the output it returns:

wget -m -p -E -k -P /tmp/ramdisk/
--2022-05-17 16:41:40--
Resolving ( 1**.2*.1**.*
Connecting to (|1**.2*.1**.*|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘/tmp/ramdisk/’
    [   <=>   ] 130.71K   210KB/s    in 0.6s

Last-modified header missing -- time-stamps turned off.
2022-05-17 16:41:42 (210 KB/s) - ‘/tmp/ramdisk/’ saved [133848]

nofollow attribute found in /tmp/ramdisk/ Will not follow any links on this page
FINISHED --2022-05-17 16:41:42--
Total wall clock time: 2.0s
Downloaded: 1 files, 131K in 0.6s (210 KB/s)
Converting links in /tmp/ramdisk/ 135.
Converted links in 1 files in 0.004 seconds.

How can I go about finding the nofollow attributes and removing them so wget will fully download my website?


As documented here, you can tell wget to ignore the nofollow attribute by adding the parameter -e robots=off.
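
A minimal sketch of the command from the question with that parameter added. The URL https://example.com/ is a placeholder, since the real address is not shown in the question, and the wrapper name mirror_site is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the mirror command from the question.
# -e robots=off tells wget to ignore robots.txt and the robots "nofollow" meta tag.
mirror_site() {
    url=$1                      # site to mirror (placeholder in the usage below)
    dest=${2:-/tmp/ramdisk/}    # download directory, as used in the question
    wget -e robots=off -m -p -E -k -P "$dest" "$url"
}

# Usage (not run automatically):
#   mirror_site https://example.com/
```

This only changes how wget behaves; the nofollow directive is still served by the site, so other crawlers will keep honoring it.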

DanRan
But a few days ago, I didn't have to add this to my wget command. So what did I change on my server that created the nofollow links?
in flag
How could we know?
DanRan
That's exactly what I'm asking. How could we know, actually? Is there a way to search for those links in my WordPress directory or something?

I figured this out.

I had to log into my WordPress installation via the web interface and go to Settings > Reading > Search engine visibility. On that page I had to uncheck the option "Discourage search engines from indexing this site" (labelled "It is up to search engines to honor this request."). With that option checked, WordPress marks its pages with a robots nofollow directive, which wget obeys. After I unchecked it, I could successfully mirror my site using the original command, wget -m -p -E -k -P /tmp/ramdisk/

Screenshot: WordPress - Search Engine Visibility - Discourage search engines from indexing this site
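
To check whether a saved page still carries the directive, one option is to search it for the robots meta tag. This is a rough sketch: the function name find_robots_meta and the file path in the usage line are hypothetical, and it assumes the tag sits on a single line of the HTML.

```shell
#!/bin/sh
# Print any <meta name="robots" ...> tags found in an HTML file.
# Assumption: each tag is on one line; grep will not match a tag split across lines.
find_robots_meta() {
    grep -i -o -E "<meta[^>]*name=[\"']robots[\"'][^>]*>" "$1"
}

# Usage (path is a placeholder for wherever wget saved the front page):
#   find_robots_meta /tmp/ramdisk/example.com/index.html
```

If the output contains nofollow, wget will stop at that page unless robots=off is set. If nothing is printed, the function returns a non-zero status and wget should follow the links normally.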


