Score:0

How to debug Squid ERR_DNS_FAIL

I am managing several web proxies running Squid 4.10 on Ubuntu 20.04 LTS in locations distributed worldwide. One of them has developed a nasty habit of occasionally failing to access a web page. Instead, the user receives an error page saying:

Hmmm... can't reach this page
It looks like the webpage at <URL> might be having issues,
or it may have moved permanently to a new web address.
ERR_TUNNEL_CONNECTION_FAILED

After adding %err_code/%err_detail to the end of the relevant logformat, as recommended in this mailing list post, the Squid access.log entries for the failing accesses look like this:

1635169354.239    171 10.72.1.103 NONE/503 0 CONNECT ad.360yield.com:443 - HIER_NONE/- - ERR_DNS_FAIL/-

The Squid status is NONE/503, and the error code and detail are always ERR_DNS_FAIL/-. The timestamp, client IP address and requested URL vary, of course.

Each occurrence of the problem affects a single FQDN or a very small number of FQDNs, often all from the same organisation (e.g. lm.licenses.adobe.com and cc-api-data.adobe.io, both from Adobe). All other accesses continue to work normally. An occurrence typically lasts between five and ten minutes. During that time, all clients trying to access that FQDN are affected. Before and after that, the same FQDN works without a problem. There is no discernible pattern in which FQDNs are affected.
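To see which FQDNs are affected and for how long, the failing entries can be grouped by host. The following is only a rough Python sketch: it assumes the 11-field log layout shown above (URL in the seventh field, %err_code/%err_detail last), and the function names `dns_failures` and `summarize` are made up for illustration. It does not handle IPv6 literals.

```python
from collections import defaultdict

def dns_failures(lines):
    """Group ERR_DNS_FAIL entries by requested host.

    Assumes the layout shown above: whitespace-separated fields, with
    the URL in field 7 and %err_code/%err_detail as the last field.
    """
    failures = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) < 11 or not fields[-1].startswith("ERR_DNS_FAIL"):
            continue
        timestamp = float(fields[0])
        target = fields[6]
        # Drop an optional scheme, path and port to get the bare host name.
        host = target.split("://", 1)[-1].split("/", 1)[0].rsplit(":", 1)[0]
        failures[host].append(timestamp)
    return failures

def summarize(failures):
    """Return (host, count, first_ts, last_ts) tuples, most failures first."""
    rows = [(h, len(ts), min(ts), max(ts)) for h, ts in failures.items()]
    return sorted(rows, key=lambda row: -row[1])
```

Feeding it the lines of the access log (for example `summarize(dns_failures(open("/var/log/squid/access.log")))`, where the path is an assumption about the local setup) gives one row per affected host with the first and last failure timestamps.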

Some of the occurrences are accompanied by a message like:

2021/10/25 15:42:34 kid1| ipcacheParse No Address records in response to 'ad.360yield.com'

in /var/log/squid/cache.log but in the majority of cases nothing is logged there.

How can I find out what goes wrong there?

Score:0

Raising the debug level for DNS lookups (debug section 78) to 6 by putting the directive

debug_options ALL,1 78,6

into /etc/squid/squid.conf makes Squid record in /var/log/squid/cache.log which nameserver was used for the failed queries, for example:

2021/10/26 16:16:43.088 kid1| 78,3| dns_internal.cc(1369) idnsRead: idnsRead: FD 17: received 32 bytes from 127.0.0.1:53
2021/10/26 16:16:43.088 kid1| 78,3| dns_internal.cc(1176) idnsGrokReply: idnsGrokReply: QID 0x376f, 0 answers

The failures can then be further investigated on that nameserver.
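With debugging enabled, cache.log grows quickly, so it can help to tally the empty replies per nameserver. This is a sketch only, written against the two line formats shown above. It assumes that each idnsGrokReply line belongs to the most recent idnsRead line, and the function name `empty_replies_by_server` is hypothetical. Note that a reply with zero answers is not always a failure (for example, an AAAA query for an IPv4-only host), so the counts are a hint rather than proof.

```python
import re
from collections import Counter

# Patterns modelled on the cache.log excerpt above.
READ_RE = re.compile(r"idnsRead: FD \d+: received \d+ bytes from (\S+)")
REPLY_RE = re.compile(r"idnsGrokReply: QID (0x[0-9a-fA-F]+), (\d+) answers")

def empty_replies_by_server(lines):
    """Count replies with 0 answers, keyed by the nameserver that sent them.

    Assumption: an idnsGrokReply line refers to the idnsRead line
    immediately preceding it, as in the excerpt.
    """
    counts = Counter()
    last_server = None
    for line in lines:
        match = READ_RE.search(line)
        if match:
            last_server = match.group(1)
            continue
        match = REPLY_RE.search(line)
        if match and last_server is not None and int(match.group(2)) == 0:
            counts[last_server] += 1
    return counts
```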

In my case, this pointed to a dnsmasq DNS proxy server running on the same machine. Enabling query logging on dnsmasq revealed that one of the four configured external nameservers was responsible for the failures. Queries that got sent to that nameserver failed, while queries sent to one of the other three succeeded. So the solution was to drop the faulty external nameserver from the configuration.
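A comparison like this can also be made by querying each upstream nameserver directly for a failing name. The sketch below is not what I used; it is a minimal standard-library illustration that sends one A query over UDP and reports the response code and answer count. The function names are made up, and it ignores TCP fallback, truncation and retries.

```python
import socket
import struct

def build_query(name, qid=0x1234, qtype=1):
    """Build a minimal DNS query packet (qtype 1 = A record, class IN)."""
    # Header: ID, flags (recursion desired), 1 question, no other records.
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def parse_header(reply):
    """Return (qid, rcode, answer_count) from the 12-byte DNS header."""
    qid, flags, _qd, ancount, _ns, _ar = struct.unpack("!HHHHHH", reply[:12])
    return qid, flags & 0x000F, ancount

def probe(server, name, timeout=3.0):
    """Send one A query for `name` to `server` and summarize the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (server, 53))
            reply, _ = sock.recvfrom(512)
        except OSError as exc:  # covers timeouts and unreachable servers
            return f"{server}: no reply ({exc})"
    _qid, rcode, ancount = parse_header(reply)
    return f"{server}: rcode={rcode}, answers={ancount}"
```

Calling `probe(server, "ad.360yield.com")` for each configured upstream address (substitute the real ones) while an outage is in progress should make the odd one out stand out: a non-zero rcode, zero answers, or no reply at all.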
