Score:2

None of my DNS zones make it to the secondary, what is wrong?

TL;DR You may want to quickly jump to the answer to know what happened and not spend time reading the whole question.


I have a tool (ipmgr) to generate my zones (I have to manage about 35 of them, so that makes it easier). All the zones are generated the same way, but they all misbehave: the secondary DNS server never gets a copy of the zones. It somehow refuses the copy.

IMPORTANT NOTE: Everything used to work just fine. It just decided to stop working on March 22nd, somehow...

Primary DNS (Master)

Logs

Logs of the loading of the main zone (with the ns1/ns2 IPs) on the primary:

25-Mar-2023 01:27:28.745 zoneload: debug 1: zone m2osw.com/IN: starting load
25-Mar-2023 01:27:28.745 general: debug 1: zone_startload: zone m2osw.com/IN: enter
25-Mar-2023 01:27:28.745 zoneload: debug 1: zone m2osw.com/IN: journal rollforward completed successfully: no journal
25-Mar-2023 01:27:28.745 zoneload: debug 1: zone m2osw.com/IN: loaded; checking validity
25-Mar-2023 01:27:28.745 general: debug 1: dns_zone_verifydb: zone m2osw.com/IN: enter
25-Mar-2023 01:27:28.745 general: debug 1: zone_settimer: zone m2osw.com/IN: enter
25-Mar-2023 01:27:28.745 zoneload: info: zone m2osw.com/IN: loaded serial 248
25-Mar-2023 01:27:28.749 general: debug 1: dns_zone_maintenance: zone m2osw.com/IN: enter
25-Mar-2023 01:27:28.749 general: debug 1: zone_settimer: zone m2osw.com/IN: enter
25-Mar-2023 01:27:28.757 notify: info: zone m2osw.com/IN: sending notifies (serial 248)

The loading of the flaky zone on primary looks the same:
(Note: I've now determined that all zones are "flaky", as in they do not transfer, so it makes sense that all the logs look alike.)

25-Mar-2023 01:27:28.745 zoneload: debug 1: zone best-gamblers.games/IN: starting load
25-Mar-2023 01:27:28.745 general: debug 1: zone_startload: zone best-gamblers.games/IN: enter
25-Mar-2023 01:27:28.745 zoneload: debug 1: zone best-gamblers.games/IN: journal rollforward completed successfully: no journal
25-Mar-2023 01:27:28.745 zoneload: debug 1: zone best-gamblers.games/IN: loaded; checking validity
25-Mar-2023 01:27:28.745 general: debug 1: dns_zone_verifydb: zone best-gamblers.games/IN: enter
25-Mar-2023 01:27:28.745 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter
25-Mar-2023 01:27:28.745 zoneload: info: zone best-gamblers.games/IN: loaded serial 233
25-Mar-2023 01:27:28.749 general: debug 1: dns_zone_maintenance: zone best-gamblers.games/IN: enter
25-Mar-2023 01:27:28.749 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter
25-Mar-2023 01:27:28.753 notify: info: zone best-gamblers.games/IN: sending notifies (serial 233)

I don't see any errors in the primary logs for any of my zones.

Zone Files

The main zone has the ns1 & ns2 definitions among other things:

$ORIGIN .
$TTL 3600
m2osw.com IN SOA ns1.m2osw.com. hostmaster.m2osw.com. (248 10800 180 1209600 300)
        NS ns1.m2osw.com.
        NS ns2.m2osw.com.
        MX 10   mail.m2osw.com.
        A       165.232.146.181
$ORIGIN m2osw.com.
mail    A       165.232.146.181
ns1     A       165.232.146.181
ns2     A       96.67.192.225
www     A       165.232.146.181
... more TXT / A records ...

And here is the one that fails:

$ORIGIN .
$TTL 3600
best-gamblers.games IN SOA ns1.m2osw.com. hostmaster.m2osw.com. (233 10800 180 1209600 300)
        NS ns1.m2osw.com.
        NS ns2.m2osw.com.
        A       165.232.146.181
$ORIGIN best-gamblers.games.
www     A       165.232.146.181

The named.conf includes a file which includes:

zone "m2osw.com" {
  type primary;
  file "/var/lib/bind/m2osw.com.zone";
  allow-transfer { trusted-servers; };
  check-names warn;
  max-journal-size 2M;
};

And the failing zone is defined like so:

zone "best-gamblers.games" {
  type primary;
  file "/var/lib/bind/best-gamblers.games.zone";
  allow-transfer { trusted-servers; };
  check-names warn;
  max-journal-size 2M;
};

Secondary DNS (Slave)

Zone Settings

The secondary has one file with zone references that looks like this:

zone "m2osw.com" {
  type secondary;
  primaries { list-of-primaries; };
  allow-transfer { none; };
  file "/var/cache/bind/m2osw.com.zone";
};
zone "best-gamblers.games" {
  type secondary;
  primaries { list-of-primaries; };
  allow-transfer { none; };
  file "/var/cache/bind/best-gamblers.games.zone";
};

The "/var/cache/bind/m2osw.com.zone" file (and about 20 others) get created as expected. All of these work just fine. As I mentioned above, I use a tool to create all the files so it's not like it's going to be any different except for the zone name and corresponding file... the rest is the same for all of them and as we can see above, the two zones I present are exactly the same!

The "/var/cache/bind/best-gamblers.games.zone" file does not get created at all. As a test, I tried to remove the '-' in the filename, but that did not help at all.

Logs

Just like the primary logs, I can't find any errors in the secondary logs and they look alike for the primary domain (which works):

24-Mar-2023 18:47:29.074 general: debug 1: zone m2osw.com/IN: starting load
24-Mar-2023 18:47:29.075 general: debug 1: zone m2osw.com/IN: journal rollforward completed successfully: no journal
24-Mar-2023 18:47:29.075 general: debug 1: zone m2osw.com/IN: loaded; checking validity
24-Mar-2023 18:47:29.075 general: debug 1: zone_settimer: zone m2osw.com/IN: enter
24-Mar-2023 18:47:29.075 general: info: zone m2osw.com/IN: loaded serial 241
24-Mar-2023 18:47:29.077 general: debug 1: dns_zone_maintenance: zone m2osw.com/IN: enter
24-Mar-2023 18:47:29.077 general: debug 1: zone_settimer: zone m2osw.com/IN: enter
24-Mar-2023 18:47:29.086 notify: info: zone m2osw.com/IN: sending notifies (serial 241)
24-Mar-2023 18:52:22.075 general: debug 1: zone_timer: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.075 general: debug 1: zone_maintenance: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.075 general: debug 1: queue_soa_query: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.075 general: debug 1: zone_settimer: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.075 general: debug 1: soa_query: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.095 general: debug 1: refresh_callback: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.095 general: info: zone m2osw.com/IN: refresh: non-authoritative answer from master 165.232.146.181#53 (source 0.0.0.0#0)
24-Mar-2023 18:52:22.095 general: debug 1: queue_soa_query: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.575 general: debug 1: soa_query: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.575 general: debug 1: cancel_refresh: zone m2osw.com/IN: enter
24-Mar-2023 18:52:22.576 general: debug 1: zone_settimer: zone m2osw.com/IN: enter

and the other domain (which fails):

24-Mar-2023 18:47:29.075 general: debug 1: zone best-gamblers.games/IN: no master file
24-Mar-2023 18:47:29.075 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.076 general: debug 1: dns_zone_maintenance: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.076 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.083 general: debug 1: zone_timer: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.083 general: debug 1: zone_maintenance: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.083 general: debug 1: queue_soa_query: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.084 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.089 general: debug 1: soa_query: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.116 general: debug 1: refresh_callback: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.116 general: info: zone best-gamblers.games/IN: refresh: non-authoritative answer from master 165.232.146.181#53 (source 0.0.0.0#0)
24-Mar-2023 18:47:29.116 general: debug 1: queue_soa_query: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.583 general: debug 1: soa_query: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.583 general: debug 1: cancel_refresh: zone best-gamblers.games/IN: enter
24-Mar-2023 18:47:29.583 general: debug 1: zone_settimer: zone best-gamblers.games/IN: enter

Issue

As mentioned above the best-gamblers.games.zone never gets created in the /var/cache/bind folder.

The cancel_refresh could sound like a good reason for doing so, except that if this is the case, then it should be an error, not a debug message. Also the messages are the same for both zones.

However, it looks like zone m2osw.com was not updated on the secondary DNS since it says it loads version 241 and the primary is at version 248. To prove that, I updated the main zone to include volcan.m2osw.com and sure enough, after the necessary amount of time, I can find the new name on NS1 and NS2 says it doesn't know about it. So NS2 does not update anything. I, of course, restarted my bind9 service many times. That does not help at all.
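The serial mismatch described above (241 on the secondary, 248 on the primary) can also be checked directly from the SOA records. This is only a sketch: the helper names `soa_serial` and `secondary_is_stale` are made up for illustration, and the plain integer comparison ignores serial number wraparound.

```shell
# Extract the serial from `dig +short ... SOA` output read on stdin.
# The short SOA form is: mname rname serial refresh retry expire minimum
soa_serial() {
  awk 'NF >= 7 { print $3; exit }'
}

# Succeeds when the primary serial ($1) is greater than the secondary one ($2).
# Assumption: serials are small plain integers, so no wraparound handling.
secondary_is_stale() {
  [ "$1" -gt "$2" ]
}

# Possible usage (queries each name server directly, IPs taken from the zone file):
#   p=$(dig +short @165.232.146.181 m2osw.com SOA | soa_serial)
#   s=$(dig +short @96.67.192.225 m2osw.com SOA | soa_serial)
#   secondary_is_stale "$p" "$s" && echo "secondary is behind ($s < $p)"
```

If the secondary keeps reporting an older serial well past the refresh interval (10800 seconds in these zones), the transfer itself is not happening.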

So I think that the main issue is that my primary DNS does not set the "aa" flag when I query it from my secondary DNS. I tried from another server to which I have access and it does show me the "aa" flag on both, primary & secondary. So I think I'm good on that one.

What else could prevent the secondary from accepting those entries? Is the "aa" flag the issue? If so, how do I fix it on the secondary server?


Example of a dig from the secondary DNS server to the primary, no "aa" flag:

$ dig @ns1.m2osw.com m2osw.com

; <<>> DiG 9.11.3-1ubuntu1.18-Ubuntu <<>> @ns1.m2osw.com m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 50582
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;m2osw.com.         IN  A

;; ANSWER SECTION:
m2osw.com.      3600    IN  A   165.232.146.181

;; Query time: 24 msec
;; SERVER: 165.232.146.181#53(165.232.146.181)
;; WHEN: Wed Mar 22 17:30:34 PDT 2023
;; MSG SIZE  rcvd: 54

Here is the same dig command from my 3rd server (without BIND installed), the "aa" flag is present:

$ dig @ns1.m2osw.com m2osw.com

; <<>> DiG 9.11.3-1ubuntu1.18-Ubuntu <<>> @ns1.m2osw.com m2osw.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 40081
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 913b39f9d8226a8901000000641e51000a3e9c772af12db9 (good)
;; QUESTION SECTION:
;m2osw.com.         IN  A

;; ANSWER SECTION:
m2osw.com.      3600    IN  A   165.232.146.181

;; Query time: 1 msec
;; SERVER: 165.232.146.181#53(165.232.146.181)
;; WHEN: Sat Mar 25 01:40:16 UTC 2023
;; MSG SIZE  rcvd: 82
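The difference between the two outputs above is in the `;; flags:` line, so the check can be scripted. This is a rough sketch, with `has_aa_flag` being a made-up helper name; it only parses the text that dig prints and assumes the default (non `+short`) output format.

```shell
# Succeeds when the first ";; flags:" line of dig output on stdin lists "aa".
has_aa_flag() {
  sed -n 's/^;; flags:\([^;]*\);.*/\1/p' | head -n 1 | grep -qw 'aa'
}

# Possible usage, run from each vantage point (+norecurse asks the server
# for its own data only):
#   dig +norecurse @165.232.146.181 m2osw.com SOA | has_aa_flag \
#     && echo "authoritative answer" \
#     || echo "non-authoritative answer"
```

A reply from the same IP address that carries `ra` but not `aa` at one location and `aa` at another suggests that the two queries were not answered by the same software.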

Note that since I have valid DNS files in the cache of the secondary, I think that it was working just fine before. I don't recall changing anything in the settings, so I really don't see why it would all of a sudden stop working...


Update

As it felt like the update did not occur for other domains, I hid the existing cache after stopping bind:

$ sudo systemctl stop bind9
$ sudo mv /var/cache/bind /var/cache/bind-hidden

Then I created the cache folder again:

$ sudo mkdir /var/cache/bind
$ sudo chgrp bind /var/cache/bind
$ sudo chmod 775 /var/cache/bind

I restarted bind:

$ sudo systemctl start bind9

and looked into the folder:

$ ls /var/cache/bind
managed-keys.bind  managed-keys.bind.jnl

and as we can see it created two files. So there are no read/write permission issues (and yes, the original folder is also root:bind and 775). I could see the transfer signals in the logs (like above) so I'm sure that new zone files should have appeared, but nothing at all.
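To double-check the ownership and mode claims rather than eyeball them, something like the following could be used. It is a sketch that assumes GNU `stat` (as shipped on Ubuntu); `dir_mode` and `dir_owner_group` are made-up helper names.

```shell
# Print the octal permission bits of a path, e.g. 775.
dir_mode() {
  stat -c '%a' "$1"
}

# Print owner and group of a path as "owner:group", e.g. root:bind.
dir_owner_group() {
  stat -c '%U:%G' "$1"
}

# Possible usage on the secondary:
#   dir_mode /var/cache/bind          # expected here: 775
#   dir_owner_group /var/cache/bind   # expected here: root:bind
```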

This clearly proves that the secondary refuses all files from the primary. Probably because it sees the primary as non-authoritative?

dig AXFR ... Output

I ran the following commands from the secondary:

$ dig afxr @ns1.m2osw.com best-gamblers.games

; <<>> DiG 9.11.3-1ubuntu1.18-Ubuntu <<>> afxr @ns1.m2osw.com best-gamblers.games
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60945
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;afxr.              IN  A

;; Query time: 5 msec
;; SERVER: 165.232.146.181#53(165.232.146.181)
;; WHEN: Mon Mar 27 21:05:04 PDT 2023
;; MSG SIZE  rcvd: 33

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 31521
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;best-gamblers.games.       IN  A

;; ANSWER SECTION:
best-gamblers.games.    3600    IN  A   165.232.146.181

;; Query time: 57 msec
;; SERVER: 127.0.0.53#53(127.0.0.53)
;; WHEN: Mon Mar 27 21:05:04 PDT 2023
;; MSG SIZE  rcvd: 64

Not sure whether this is success or not, but it looked exactly the same as from the 3rd party server so my take is that it's a failure even though the domain name appears in the ANSWER SECTION.

I repeated that directly on the primary and it shows an AUTHORITY SECTION:

$ dig axfr @ns1.m2osw.com best-gamblers.games

; <<>> DiG 9.18.12-0ubuntu0.22.04.1-Ubuntu <<>> axfr @ns1.m2osw.com best-gamblers.games
; (1 server found)
;; global options: +cmd
; Transfer failed.
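
For comparison, a transfer request has to spell the type as `axfr`. With any other word, dig treats it as a name to look up, which is what the `;afxr. IN A` question section in the first output shows. A successful transfer prints every record of the zone, starting and ending with the SOA record. The sketch below checks dig output for that shape; `axfr_looks_complete` is a made-up helper name and the check is only a heuristic on the printed text.

```shell
# Succeeds when dig AXFR output on stdin looks like a complete transfer:
# no "Transfer failed" marker and at least two SOA records
# (a full zone transfer begins and ends with the zone's SOA).
axfr_looks_complete() {
  awk '
    /Transfer failed/ { failed = 1 }
    $1 !~ /^;/ && $3 == "IN" && $4 == "SOA" { soa++ }
    END { exit !(failed == 0 && soa >= 2) }
  '
}

# Possible usage from the secondary (its IP must be allowed by the
# allow-transfer ACL on the primary):
#   dig @165.232.146.181 best-gamblers.games axfr | axfr_looks_complete \
#     && echo "transfer works" \
#     || echo "transfer failed or incomplete"
```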
Patrick Mevzek avatar
cn flag
Not clear how the zones get updated (edit zonefile? DNS update?) but remember to change the SOA serial, otherwise changes won't be seen. Among other things, also check network flows, including TCP/53 between the primary and the secondaries.
cn flag
@PatrickMevzek My tool (ipmgr) automatically does that for me. However, just in case, I tried manually like 7 or 8 times to change some values and the slave has not received anything since March 22, 2023. Now I'm wondering whether the latest BIND9 update would have something to do with this. But I would imagine many more people having an issue if that was the case...
Patrick Mevzek avatar
cn flag
You have to split the problem in two: does the primary see the zone change and reload the zone (easy to test with `dig`) and then does it notify the secondaries (optional but speeds up convergence). And then, do secondaries get the notification and attempt the transfer or not, and do they poll regularly the primary for updates.
cn flag
@PatrickMevzek Did the logs I put in my question not show that the primary does send notifications to the secondary and that the secondary receives said messages?
Patrick Mevzek avatar
cn flag
Maybe, but sorry, it is a bit too long, hence I am just commenting to give tips, not delving into a full specific answer. Check that the IP addresses the secondaries have configured as primaries match where the notify comes from. But again, zone transfer can work even without notifies, just slower. I didn't see which bind version you are using; you can always try another one. Does `dig AXFR` from the secondary work manually?
cn flag
@PatrickMevzek Okay. I added the output of the AXFR but I'm not too sure whether it worked or not. What should the output be if it worked? I have output for the secondary first and then I tried on the primary. The difference is that there is an AUTHORITY in the primary, but it's from a root, not the primary. No such thing in the secondary.
Patrick Mevzek avatar
cn flag
"What should the output be if it worked?" The full zone content.
cn flag
If the transfer is expected to return all the `A` and `TXT` fields, then it clearly doesn't work. I have a partial sample of both zones under the **Zone Files** section.
djdomi avatar
za flag
How do you basically create a new zone file? If bind is used, it's well documented how a zone can be pushed to a secondary.
cn flag
Oh. I tried `AFXR` instead of `AXFR` which fails manually as well. So that would explain. I'm on it again since I upgraded to 22.04 and thus eliminated the possibility that 18.04's bind was the issue.
cn flag
@djdomi How important is how I created the files?! (I mentioned it, see ipmgr on the very first line.) I have examples in the question (3rd to 6th code samples), isn't that sufficient to show what has been working for ages for me?
cn flag
@djdomi Another important aspect, I looked at the code, and clearly if the answer from the primary is non-authoritative, the results are ignored by the secondary. Somehow, that's what's happening. I do not understand why the results would be marked as non-authoritative, though. I'm wondering whether my ISP could be doing something to the UDP packets (!?). If I query from anywhere except my secondary (which is my server at home with a static IP), the reply is clearly authoritative... (line 14194 of lib/dns/zone.c)
cn flag
@djdomi For reference, I found [this thread](https://forums.businesshelp.comcast.com/conversations/domain-namesstatic-ip/transparent-dns-proxying-started-after-a-modem-swap/5fe0a629c5375f08cd95b75b?page=2) which implies that Comcast may actually be the culprit. Which would make sense since my Linux setup has not changed (however, I got a new Comcast bundle, and maybe that new shit does that to me!) I'll have to contact them.
djdomi avatar
za flag
I'm not familiar with Comcast in the US since I am from overseas, but as far as I know Comcast is for private purposes only, which would make the whole question off-topic.
cn flag
@djdomi We have Comcast Business in the US. I have a static IP through that and have been using it for years without any problem. But it looks like the last time they upgraded my account, they also messed up port 53...
Score:1
cn flag

The setup was fine.

The issue came from the new firewall at Comcast, which is in front of my server at home. Comcast admits this in Article 30.4 of their Business Services Customer Terms and Conditions V. 41:

Customer’s non-Comcast applications and services that use TCP/UDP port 53 (i) may not be compatible with the SecurityEdge Service, which may result in such non-Comcast applications and services not functioning properly, and (ii) may affect certain Comcast Services (including Business Internet)

I finally decided to create a small Droplet with DigitalOcean (the smallest is just $4/mo. at the moment) and that worked as soon as I fixed my firewall/setup to allow that new computer as the secondary (i.e. changed the IP address here and there).

That's it.

anx avatar
fr flag
Wow, they even admit this openly. I'll add the authoritative source, then.
cn flag
@anx Oh! That's great. Thank you for checking further into this and full confirmation.