Score:0

wget failed: Connection timed out

bal

I have the following command to copy a website. When it tried to hit sun.com, the connection timed out.

I would like wget to exclude sun.com so that it can proceed to the next host.

Existing Issue

$ wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows http://pt.jikos.cz/garfield/
...
.
2021-08-09 03:28:28 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]

2021-08-09 03:28:30 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]
...


Location: https://packages.debian.org/robots.txt [following]
--2021-08-09 03:28:33--  https://packages.debian.org/robots.txt
Connecting to packages.debian.org (packages.debian.org)|128.0.10.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 24 [text/plain]
Saving to: ‘packages.debian.org/robots.txt’

packages.debian.org 100%[===================>]      24  --.-KB/s    in 0s

2021-08-09 03:28:34 (19.1 MB/s) - ‘packages.debian.org/robots.txt’ saved [24/24]

Loading robots.txt; please ignore errors.
--2021-08-09 03:28:34--  http://wwws.sun.com/robots.txt
Resolving wwws.sun.com (wwws.sun.com)... 137.254.16.75
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:28:56--  (try: 2)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:29:19--  (try: 3)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:29:43--  (try: 4)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:30:08--  (try: 5)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:30:34--  (try: 6)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

--2021-08-09 03:31:01--  (try: 7)  http://wwws.sun.com/robots.txt
Connecting to wwws.sun.com (wwws.sun.com)|137.254.16.75|:80... failed: Connection timed out.
Retrying.

I expected wget to save the whole website without timing out; when a host does time out, wget should skip that connection and move on.

Score:2
bob

Please read the fine manual about the "risks" of using the --span-hosts (-H) option and how to limit those by adding restrictions:
https://www.gnu.org/software/wget/manual/wget.html#Spanning-Hosts

The --span-hosts or -H option turns on host spanning, thus allowing Wget’s recursive run to visit any host referenced by a link. Unless sufficient recursion-limiting criteria are applied, these foreign hosts will typically link to yet more hosts, and so on until Wget ends up sucking up much more data than you have intended.

...

Limit spanning to certain domains -D
The -D option allows you to specify the domains that will be followed, thus limiting the recursion only to the hosts that belong to these domains.

...

Keep download off certain domains --exclude-domains
If there are domains you want to exclude specifically, you can do it with --exclude-domains, which accepts the same type of arguments of -D, but will exclude all the listed domains.
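For example, to keep the original command but skip the unreachable Sun hosts, --exclude-domains should do it. This is a sketch based on the log above (sun.com also matches the wwws.sun.com subdomain); adjust the list to your needs:

$ wget --recursive --page-requisites --adjust-extension --span-hosts --convert-links --restrict-file-names=windows --exclude-domains sun.com http://pt.jikos.cz/garfield/

Alternatively, whitelist only the domains you want followed, e.g. -D jikos.cz,debian.org. Independently of the domain filtering, adding --timeout=10 --tries=2 keeps any remaining dead host from stalling the run, since wget retries up to 20 times by default.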
