Score:1

Still having DNS forwarding issue


I asked this question a while back, and it got moved to chat because it drew a lot of subjective opinions.

Original message here for reference: https://chat.stackexchange.com/rooms/139176/discussion-on-question-by-sabre-dns-forwarding-issue

I also found a seemingly similar question, also unanswered: "Conditional Forwarding intermittent failures".

So I figured I would boil it down to the basic information and try again, with log files for demonstration.

The core question is: why would a DNS forwarder, without reporting any errors, selectively fail for one host at random and then resume normal operation later? The details are as follows.


Edit: The issue happened again this morning (the day after posting). The logs show that when the incident occurred, one query was handled correctly; then, less than a second later, the server asked its WAN forwarder instead of using its cache or the LAN conditional forwarder. It cached the external IP and failed everyone from that point forward until we cleared the cache. The first query after that followed the conditional forwarder and cached the correct IP. What makes this even more mysterious: if the answer was already cached, the server should not have asked for a new IP at all.


I have two DNS servers, two domains, both on LAN, both DC and DNS for their respective AD domains.

One domain is .local, so it cannot be queried from public DNS; the other is .ORG. The .ORG zone is split between hosts on the LAN and hosts on the internet. In this scenario we only care about intercepting queries for LAN hosts and letting public DNS deal with the rest. So LAN hosts are handled by the local server, and anything else (not a LAN host) goes out the next forwarder, which is OpenDNS (ultimately our SOA is at GoDaddy). I have learned this is referred to as split-brain DNS, and it is apparently a normal thing for a hybrid DNS scenario like mine.

So if you ask the DNS server for A.local where one of the hosts on B.org is, it should, and almost always does, ask the B.org DNS server (and the query should never leave the LAN unless no matching hostname is found there).

I included a picture of the host that is failing to forward so we do not go down the "there is no such things as DNS forwarding" path again.

DNS config

What is happening is that, at random, a host on the .org domain does not resolve because the .local DNS server does not ask the DNS server for the .org domain; that is, it never tries the conditional forwarder. I have now confirmed this with simultaneous packet captures on both hosts: the path goes A.local => OpenDNS, not A.local => conditional forwarder => B.org.

When it fails, the .local server does not even try to send to the .org server, and the .org server is confirmed never to receive any request.

If you query the .org server directly rather than through the forwarder (NSLOOKUP), it works fine: the host is there, and I can see its DNS record. The forwarder also works fine during this time for other hosts on the .org domain. And the host that has these failures is not consistently the same one.
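The direct-versus-forwarded comparison can also be scripted, so it can run on a schedule and catch the failure when it happens. This is only a minimal sketch using the Python standard library, not anything from the original setup: the function names are made up for illustration, the parser handles only plain A answers, and the address of the .local server is a placeholder you would have to fill in (only 10.1.1.250 comes from the logs).

```python
import secrets
import socket
import struct


def encode_name(name):
    """Encode 'a.b.org' as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, xid):
    """Build a recursive A/IN query (flags 0x0100 = RD set, as in the logs)."""
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def skip_name(msg, pos):
    """Return the offset just past a (possibly compressed) name."""
    while True:
        length = msg[pos]
        if length & 0xC0 == 0xC0:  # compression pointer, e.g. C00C
            return pos + 2
        pos += 1
        if length == 0:
            return pos
        pos += length


def parse_a_records(msg):
    """Return the IPv4 addresses found in the answer section."""
    qdcount, ancount = struct.unpack("!HH", msg[4:8])
    pos = 12
    for _ in range(qdcount):
        pos = skip_name(msg, pos) + 4  # skip QTYPE and QCLASS
    addrs = []
    for _ in range(ancount):
        pos = skip_name(msg, pos)
        rtype, _rclass, _ttl, rdlen = struct.unpack("!HHIH", msg[pos:pos + 10])
        pos += 10
        if rtype == 1 and rdlen == 4:
            addrs.append(socket.inet_ntoa(msg[pos:pos + 4]))
        pos += rdlen
    return addrs


def query_a(server, name, timeout=2.0):
    """Send one UDP query to `server` and return its A answers (no retries)."""
    xid = secrets.randbelow(0x10000)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, xid), (server, 53))
        data, _ = sock.recvfrom(4096)
    return parse_a_records(data)


# Hypothetical usage (LOCAL_DNS_IP is a placeholder, not from the post):
#   direct = query_a("10.1.1.250", "myhost.mydomain.org")
#   via_forwarder = query_a(LOCAL_DNS_IP, "myhost.mydomain.org")
#   if direct != via_forwarder: log the time and dump the server cache
```

A query built this way for a name with 3-, 4- and 3-character labels is 30 bytes, which matches the "Msg length = 0x001e (30)" in the logs below.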

This happens off and on, very infrequently and at random, with no change in configuration, and normal operation resumes later, again with no change in configuration.

Log files are attached (from DNS debug logging on the .local DNS server, where the failure occurs), showing both the correct chain and the incorrect one. The IP 10.1.1.250 is the DNS server for the .org domain; 10.1.0.16 is the IP of the client requesting the host resolution.

LogCorrect

  • Request from client
  • Request to .org DNS server (Forward/10.1.1.250)
  • Response from .org DNS server to .local DNS server
  • Response to client with information obtained via forward.

LogFailed

  • Request from client
  • Response from .local DNS server, directing it to external DNS (Forwarder never asked)

Hopefully those details will keep it in the question realm, not a chat :-)

Thank you.

LogCorrect:

10/12/2022 9:48:12 AM 0758 PACKET  000000883A640200 UDP Rcv 10.1.0.16       001e   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A640200
  Socket = 592
  Remote addr 10.1.0.16, port 57756
  Time Query=2147027, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x001e (30)
  Message:
    XID       0x001e
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

10/12/2022 9:48:12 AM 0758 PACKET  000000883A4581A0 UDP Snd 10.1.1.250      d94e   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP question info at 000000883A4581A0
  Socket = 10476
  Remote addr 10.1.1.250, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0029 (41)
  Message:
    XID       0xd94e
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x001e, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    32768
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 80 DO

10/12/2022 9:48:12 AM 0758 PACKET  000000883E98E210 UDP Rcv 10.1.1.250      d94e R Q [8085 A DR  NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883E98E210
  Socket = 10476
  Remote addr 10.1.1.250, port 53
  Time Query=2147027, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0039 (57)
  Message:
    XID       0xd94e
    Flags     0x8580
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        1
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    1
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C00C](3)myhost(4)mydomain(3)org(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    1200
      DLEN   4
      DATA   10.1.1.218
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x002e, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    32768
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 80 DO

10/12/2022 9:48:12 AM 0758 PACKET  000000883A640200 UDP Snd 10.1.0.16       001e R Q [8081   DR  NOERROR] A      (3)myhost(4)mydomain(3)org(0)
UDP response info at 000000883A640200
  Socket = 592
  Remote addr 10.1.0.16, port 57756
  Time Query=2147027, Queued=2147027, Expire=2147032
  Buf length = 0x0200 (512)
  Msg length = 0x002e (46)
  Message:
    XID       0x001e
    Flags     0x8180
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    1
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)org(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C00C](3)myhost(4)mydomain(3)org(0)"
      TYPE   A  (1)
      CLASS  1
      TTL    1199
      DLEN   4
      DATA   10.1.1.218
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

LogFailed

10/12/2022 9:39:38 AM 0748 PACKET  000000883EE821D0 UDP Rcv 10.1.0.16       3858   Q [0001   D   NOERROR] A      (3)myhost(4)mydomain(3)ORG(0)
UDP question info at 000000883EE821D0
  Socket = 592
  Remote addr 10.1.0.16, port 62365
  Time Query=2146514, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x001e (30)
  Message:
    XID       0x3858
    Flags     0x0100
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)ORG(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

10/12/2022 9:39:38 AM 0748 PACKET  000000883EE821D0 UDP Snd 10.1.0.16       3858 R Q [8081   DR  NOERROR] A      (3)myhost(4)mydomain(3)ORG(0)
UDP response info at 000000883EE821D0
  Socket = 592
  Remote addr 10.1.0.16, port 62365
  Time Query=2146514, Queued=0, Expire=0
  Buf length = 0x0200 (512)
  Msg length = 0x0067 (103)
  Message:
    XID       0x3858
    Flags     0x8180
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   1
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(3)myhost(4)mydomain(3)ORG(0)"
      QTYPE   A (1)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
    Offset = 0x001e, RR count = 0
    Name      "[C010](4)mydomain(3)ORG(0)"
      TYPE   SOA  (6)
      CLASS  1
      TTL    466
      DLEN   61
      DATA   
        PrimaryServer: (6)pdns07(13)domaincontrol(3)com(0)
        Administrator: (3)dns(5)jomax(3)net(0)
        SerialNo     = 2022083000
        Refresh      = 28800
        Retry        = 7200
        Expire       = 604800
        MinimumTTL   = 600
    ADDITIONAL SECTION:
      empty
Patrick Mevzek avatar
"so we do not go down the "there is no such things as DNS forwarding"" There is still no DNS "forwarding". There are DNS servers that can be configured to forward queries for some zones to other nameservers. It is not a feature of the protocol (DNS), but of some servers. It is important to understand the distinction.
Sabre avatar
I understand that, but there IS DNS forwarding in this context; it is clearly named and presented as such. It is analogous to saying there is no PDF in your email because the SMTP protocol does not mention PDF, only binary data. There is an argument to be made at the protocol level, but that is specifically not the context of the question; the context is the implementation of DNS in a Microsoft DNS server, so this is an argument about semantics. I get what you are saying, I just do not understand why it matters here or what it contributes.
cn flag
Do the local DNS servers use forwarders, have they been configured not to use root servers, and has the %systemroot%\system32\dns\cache.dns file been deleted?
Sabre avatar
Both use forwarders, and both are configured not to use root hints if the forwarders are unavailable. They forward through OpenDNS. So DNS queries for any host on the LAN go to one of them, depending on which domain you are on. If you are on one domain and looking for a host on the other, the query goes through the conditional forwarder to the other domain's DC/DNS; all other traffic goes through the OpenDNS forwarders. That file has not been removed, but it should be ignored because of that setting, unless I misunderstand. They are just DC DNS servers that also serve internet lookups.
gapsf avatar
How are IP addresses assigned to the interfaces on A and B: static or DHCP?
gapsf avatar
Try disabling IPv6 in the registry and on the interfaces of both servers. https://learn.microsoft.com/en-US/troubleshoot/windows-server/networking/configure-ipv6-in-windows
Sabre avatar
Static on both DNS servers; they are AD controllers for their respective domains. Disabling IPv6 sounds promising. We do have it disabled (unchecked) at the adapter properties level. Would the registry change have any additional effect other than making it system-wide for any new adapter? There is only one adapter, and there will only ever be one. The only IPv6 address in ipconfig is the ISATAP one.
Score:1

This appears to be a cached, non-authoritative referral response to a request for a record that does not exist (or did not exist at the time the response was cached). That can be a normal response.

You would need to hunt through earlier logs to see what occurred during the previous outbound request from the conditional forwarder for myhost.mydomain.org. I suspect the sequence is: a working cached entry becomes stale and is deleted from the cache; a request comes in; the conditional forwarder attempts to forward it; the authoritative server does not respond; the forwarder then does a regular lookup through its forwarders or root hints instead, and caches that result.

It's cached because you don't see any outgoing packet from the conditional forwarder. It's non-authoritative because the DNS server is operating as a conditional forwarder, so it does not hold an authoritative copy of the zone. I call it a referral because the response has no answer records, only an authority record (here a single SOA).
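To check the authoritative/non-authoritative part yourself, the "Flags" word printed in each log entry can be decoded bit by bit. The following is a small illustrative sketch in Python (the function name is mine); the bit positions are the standard DNS header layout from RFC 1035, with AD and CD as later defined for DNSSEC.

```python
def decode_flags(flags):
    """Split the 16-bit DNS header flags word into its named fields."""
    return {
        "QR": (flags >> 15) & 1,        # 0 = question, 1 = response
        "OPCODE": (flags >> 11) & 0xF,  # 0 = standard query
        "AA": (flags >> 10) & 1,        # authoritative answer
        "TC": (flags >> 9) & 1,         # truncated
        "RD": (flags >> 8) & 1,         # recursion desired
        "RA": (flags >> 7) & 1,         # recursion available
        "Z": (flags >> 6) & 1,          # reserved
        "AD": (flags >> 5) & 1,         # authentic data
        "CD": (flags >> 4) & 1,         # checking disabled
        "RCODE": flags & 0xF,           # 0 = NOERROR, 3 = name error
    }


# Flags values taken from the logs in the question.
for label, value in [
    ("reply from 10.1.1.250", 0x8580),
    ("reply to the client", 0x8180),
    ("client query", 0x0100),
]:
    print(label, decode_flags(value))
```

The reply from 10.1.1.250 (0x8580) has AA set, while both replies sent to the client (0x8180) do not, which is what you would expect from a server answering on behalf of a zone it does not host.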

RFC 1034 describes this response using an example in which the hostname has been mistyped; however, that is not the only scenario in which it can happen. It can also occur if, for example, the record has been deleted.

RFC 1034 DOMAIN NAMES - CONCEPTS AND FACILITIES

6.2.5. QNAME=SIR-NIC.ARPA, QTYPE=A

If a user mistyped a host name, we might see this type of query.

C.ISI.EDU would answer it with:

Header OPCODE=SQUERY, RESPONSE, AA, RCODE=NE
Question QNAME=SIR-NIC.ARPA., QCLASS=IN, QTYPE=A
Answer <empty>
Authority . SOA SRI-NIC.ARPA. HOSTMASTER.SRI-NIC.ARPA. 870611 1800 300 604800 86400
Additional <empty>

This response states that the name does not exist. This condition is signalled in the response code (RCODE) section of the header.

The SOA RR in the authority section is the optional negative caching information which allows the resolver using this response to assume that the name will not exist for the SOA MINIMUM (86400) seconds.

The referral response expects that the client will follow up with the authoritative server to get an authoritative answer. In this case, it appears the client is being directed to the external records rather than the internal ones.
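For sorting log entries into these response shapes, here is a rough sketch in Python. It follows my reading of RFC 2308 (sections 2.2 and 5), which refines the RFC 1034 text quoted above; the function names are hypothetical and the rules are simplified. Note that under those rules a NOERROR reply with an empty answer section and an SOA in the authority section, like the one in LogFailed, is classed as "NODATA" rather than as a name error or an NS-based referral.

```python
NOERROR = 0
NXDOMAIN = 3  # "name error", written RCODE=NE in RFC 1034


def classify_response(rcode, answer_count, authority_types):
    """Roughly classify a DNS reply from its header counts and authority RRs."""
    if rcode == NXDOMAIN:
        return "NXDOMAIN"
    if rcode != NOERROR:
        return "ERROR"
    if answer_count > 0:
        return "ANSWER"
    if "SOA" in authority_types:
        return "NODATA"    # negative reply carrying caching information
    if "NS" in authority_types:
        return "REFERRAL"  # delegation to other name servers
    return "NODATA"        # empty NOERROR reply with no NS records


def negative_cache_ttl(soa_ttl, soa_minimum):
    """How long a negative reply may be cached: min(SOA TTL, SOA MINIMUM)."""
    return min(soa_ttl, soa_minimum)


# Values from the failed reply in the question: NOERROR, ACOUNT 0, one SOA.
print(classify_response(NOERROR, 0, ["SOA"]))
print(negative_cache_ttl(466, 600))
```

If the SOA was originally received with a TTL of 600 (an assumption; the log does not show the original value), a remaining TTL of 466 would be consistent with an entry that had already been sitting in the cache for a couple of minutes.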

In a general sense, if the record has not been mistyped, deleted, or otherwise affected, then your configuration is correct and should have worked. This could be a defect; however, I am not inclined to believe that is the case. Take a very close look at other traffic and logs, and I suspect you will find that DNS is working as it is supposed to, even if that is not how you expect it to work.

That said, while conditional forwarders are a valid solution, they have some weaknesses. In particular, they require the two DNS servers to communicate at the moment an (uncached) query needs to be answered. If the server is down, the LAN is down, or some other communication failure occurs, the DNS query fails. Note that DNS tries UDP first, which is not a reliable protocol: it usually works, but it does not guarantee that data will reach its destination. The conditional forwarder is non-authoritative for the zone, so it cannot directly answer with a negative response. The conditional forwarder is also limited to the servers entered in the forwarding configuration, which may be fewer than those listed in the zone's NS records.

As an alternative, are your domains joined as a single forest? From your naming it appears not, however if so consider using Active Directory Integrated DNS zones so that all DNS servers in the forest are authoritative and contain complete replicated zones for all the domains via Active Directory. This can have side effects where a conditional forwarder is more "real-time" as it talks to the authoritative server, while an Active Directory replica will be slightly behind due to AD replication delays.

If you do not have a single forest, consider using older-style zone transfer replicas (secondary zones). As with AD above, this makes the DNS server authoritative and gives it a copy of the zone file. This can be more resilient, as the server is no longer dependent on communicating with the authoritative server(s) at the moment the query needs to be answered.

Sabre avatar
Thank you for the comprehensive answer. The situation is temporary, and yes, they are independent of one another. Since the number of hosts that need the forwarding is low, we reserved some IP space, set them static, and put a forward lookup zone directly in the DNS servers. But I still wanted to know why we had to do that. Where you said "Take a very close look at other traffic and logs": that is what drove me to ask this question to begin with. I have, exhaustively. At the point in time the server requests from its public forwarder, a packet capture on both DNS servers shows no attempt...
Sabre avatar
...Other DNS queries through the forwarder, for other hosts on the same domain *and* for the victim, occur without issue within 1-2 ms before (after the incorrect lookup is cached), as do other queries. The hosts are in continuous use, and no network issues of any kind are observed, logged, or can be detected, even at the packet level. They are two VMs on the same ESXi instance, on the same virtual switch. Is it possible this occurs in some race condition *between* the cache expiring and the next correct query, internally in the DNS server?
Doug avatar
Best I can say at this point would be "maybe". The conditional forwarder should work as you have described it, yet what you are seeing tells us "something" happened. It would be a deep dive to find what, as you have already seen.
Sabre avatar
Good deal, and a fair response. Since you have intelligently engaged with the question and offered expanded suggestions, I will take that. At the very least it confirms I am not crazy: this *should* work as configured. The configuration does not represent some abject failure to comprehend "the way it should be", and the behavior is indeed a transient anomaly rather than a definable administration error. I did not expect a resolution for something this fleeting and specific so much as peer confirmation. Thank you very much for your input.