Getting SERVFAIL / NOTAUTH on Zone Transfer - ISC BIND 9

I have two BIND servers running BIND 9:

BIND 9.11.36-RedHat-9.11.36-3.el8 (Extended Support Version) <id:68dbd5b>
running on Linux x86_64 4.18.0-372.9.1.el8.x86_64 #1 SMP Tue May 10 08:57:35 EDT 2022
built by make with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--program-prefix=' '--disable-dependency-tracking' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-python=/usr/libexec/platform-python' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--enable-filter-aaaa' '--with-pic' '--disable-static' '--includedir=/usr/include/bind9' '--with-tuning=large' '--with-libidn2' '--enable-openssl-hash' '--with-geoip2' '--enable-native-pkcs11' '--with-pkcs11=/usr/lib64/pkcs11/libsofthsm2.so' '--with-dlopen=yes' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-dlz-bdb=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-lmdb=no' '--with-libjson' '--enable-dnstap' '--with-cmocka' '--enable-fixed-rrset' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-full-report' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fexceptions -fstack-protector-strong -grecord-gcc-switches -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1 -m64 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection' 'LDFLAGS=-Wl,-z,relro -Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld' 'CPPFLAGS= -DDIG_SIGCHASE' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
compiled by GCC 8.5.0 20210514 (Red Hat 8.5.0-10)
compiled with OpenSSL version: OpenSSL 1.1.1k  FIPS 25 Mar 2021
linked to OpenSSL version: OpenSSL 1.1.1k  FIPS 25 Mar 2021
compiled with libxml2 version: 2.9.7
linked to libxml2 version: 20907
compiled with libjson-c version: 0.13.1
linked to libjson-c version: 0.13.1
compiled with zlib version: 1.2.11
linked to zlib version: 1.2.11
linked to maxminddb version: 1.2.0
compiled with protobuf-c version: 1.3.0
linked to protobuf-c version: 1.3.0
threads support is enabled

default paths:
  named configuration:  /etc/named.conf
  rndc configuration:   /etc/rndc.conf
  DNSSEC root key:      /etc/bind.keys
  nsupdate session key: /var/run/named/session.key
  named PID file:       /var/run/named/named.pid
  named lock file:      /var/run/named/named.lock
  geoip-directory:      /usr/share/GeoIP

The master server is at 172.16.19.243 and the secondary at 172.16.19.251. They can ping each other and port 53 (UDP and TCP) is open on both. Both used to work, but some new code was pushed in our automation and both lost network access for around two hours. It is possible the configuration was changed.

The secondary shows no zone files in /etc/named/. Zone transfers fail:

DNS-Secondary named[546308]: general: info: zone 19.16.172.in-addr.arpa/IN: refresh: unexpected rcode (SERVFAIL) from master 172.16.19.251#53 (source 0.0.0.0#0)
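
One detail in the log line above: it names 172.16.19.251 as the master, which is the address given earlier for the secondary itself. The following is only a minimal sketch for pulling the master address out of such refresh lines so it can be compared with the intended primary. The regular expression is an assumption based on this single log line, and `parse_refresh_failure` and `EXPECTED_PRIMARY` are hypothetical names, not part of BIND.

```python
import re

# Pattern modelled on the one "refresh: unexpected rcode" line shown above.
REFRESH_RE = re.compile(
    r"zone (?P<zone>\S+)/IN: refresh: unexpected rcode \((?P<rcode>\w+)\) "
    r"from master (?P<master>[0-9a-fA-F.:]+)#(?P<port>\d+)"
)

def parse_refresh_failure(line):
    """Return (zone, rcode, master_ip) from a refresh failure line, or None."""
    m = REFRESH_RE.search(line)
    if m is None:
        return None
    return m.group("zone"), m.group("rcode"), m.group("master")

line = (
    "DNS-Secondary named[546308]: general: info: zone 19.16.172.in-addr.arpa/IN: "
    "refresh: unexpected rcode (SERVFAIL) from master 172.16.19.251#53 (source 0.0.0.0#0)"
)

# The master's address as stated in the question text.
EXPECTED_PRIMARY = "172.16.19.243"

zone, rcode, master = parse_refresh_failure(line)
if master != EXPECTED_PRIMARY:
    print(f"{zone}: secondary is asking {master}, not {EXPECTED_PRIMARY} ({rcode})")
```

If the addresses differ, the zone's master list in the secondary's named.conf may be worth checking first.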

/var/log/named/zone_transfers on the primary shows:

xfer-out: info: client @0x7f48600ebf90 69.61.12.108#47302 (ns4.mydomain.example): bad zone transfer request: 'ns4.mydomain.example/IN': non-authoritative zone (NOTAUTH)
... 3 days later outage occurs, but no logs appear ...
... a few hours after the outage and repeating to present day ...
notify: info: zone mydomain.example/IN: sending notifies (serial 2022051909)
notify: info: zone 19.16.172.in-addr.arpa/IN: sending notifies (serial 2022051909)
notify: info: zone 16.16.172.in-addr.arpa/IN: sending notifies (serial 2022051909)
notify: info: zone 17.16.172.in-addr.arpa/IN: sending notifies (serial 2022051909)
notify: info: zone 18.16.172.in-addr.arpa/IN: sending notifies (serial 2022051909)

The problem is not resolved by running rndc retransfer mydomain.example. Requesting AXFR with dig also fails:

dig -t axfr mydomain.example 172.16.19.243

; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> -t axfr mydomain.example 172.16.19.243
;; global options: +cmd
; Transfer failed.
; Transfer failed.
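
A note on the command above: dig only treats an argument as the server to query when it is prefixed with `@`. Without the prefix, `172.16.19.243` is likely taken as a second name to look up through the default resolver, which would be consistent with `Transfer failed.` being printed twice. A minimal sketch of building the intended command follows; `build_axfr_command` is a hypothetical helper, and nothing is executed here, the command is only printed.

```python
import shlex

def build_axfr_command(server, zone):
    """Build a dig argument list that requests an AXFR of `zone` from `server`."""
    # The "@" prefix is what makes dig treat the argument as the server to query.
    return ["dig", f"@{server}", zone, "AXFR"]

cmd = build_axfr_command("172.16.19.243", "mydomain.example")
# Print a copy-pasteable shell command; running it is left to the operator.
print(shlex.join(cmd))
```

Running the printed command from the secondary exercises the same path the secondary's own transfer uses, so the result is subject to the master's `allow-transfer` list.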

Querying A and PTR records on the master from the internet works. The same queries against the secondary now fail:

dig @172.16.19.251 191.19.16.172.in-addr.arpa ptr

; <<>> DiG 9.18.2 <<>> @172.16.19.251 191.19.16.172.in-addr.arpa ptr
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 57626
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: 204e72e23787aef415f9ec7562866219e93a158c23f1f323 (good)
;; QUESTION SECTION:
;191.19.16.172.in-addr.arpa.    IN      PTR

;; Query time: 48 msec
;; SERVER: 172.16.19.251#53(172.16.19.251) (UDP)
;; WHEN: Thu May 19 10:28:39 CDT 2022
;; MSG SIZE  rcvd: 83

The /etc/named.conf of the master is shown below:

options {
        allow-query {
          none;
        };
        allow-transfer {
          none;
        };
        recursion no;

        auth-nxdomain no;    # conform to RFC1035
        minimal-responses yes;
        minimal-any yes;
        dnssec-enable yes;
        dnssec-validation yes;
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";

//System Zones
zone "mydomain.example" IN {
  type master;
  file "/etc/named/mydomain.example.db";
  allow-query {any;
  };
  allow-transfer {
    localhost;
    172.16.19.243;
  };
  notify yes;
};

zone "16.16.172.in-addr.arpa" IN {
  type master;
  file "/etc/named/16.16.172.in-addr.arpa.rev";
  allow-query {any;
  };
  allow-transfer {
    localhost;
    172.16.19.243;
  };
  notify yes;
};
// Zones for 17 - 19 are included in the config with the *exact* same format. Programmatically generated - if there's
// a typo here, then there is in all. No zone transfers work.
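
Since the configuration is generated, one thing that can be checked mechanically is whether the secondary's address (172.16.19.251, per the description earlier) appears in each zone's `allow-transfer` list. In the configuration as pasted, the list contains `localhost` and 172.16.19.243, which is the address given for the master itself. The sketch below is a rough check under stated assumptions: it only understands literal entries (no CIDR ranges, named ACLs, or TSIG keys), it assumes `allow-transfer` blocks contain no nested braces, and `transfer_acls` is a hypothetical helper, not a BIND tool.

```python
import re

ZONE_RE = re.compile(r'zone\s+"([^"]+)"')
ACL_RE = re.compile(r'allow-transfer\s*\{([^}]*)\}')

def transfer_acls(conf_text):
    """Map each zone name to the literal entries of its allow-transfer list."""
    zones = list(ZONE_RE.finditer(conf_text))
    acls = {}
    for i, z in enumerate(zones):
        # A zone's text is taken to run until the next zone statement starts.
        end = zones[i + 1].start() if i + 1 < len(zones) else len(conf_text)
        acl = ACL_RE.search(conf_text, z.end(), end)
        if acl:
            acls[z.group(1)] = [e.strip() for e in acl.group(1).split(";") if e.strip()]
    return acls

# Zone statement copied from the master's named.conf shown above.
sample = '''
zone "mydomain.example" IN {
  type master;
  file "/etc/named/mydomain.example.db";
  allow-query {any;
  };
  allow-transfer {
    localhost;
    172.16.19.243;
  };
  notify yes;
};
'''

SECONDARY = "172.16.19.251"  # secondary's address as stated in the question

for zone, entries in transfer_acls(sample).items():
    status = "listed" if SECONDARY in entries else "NOT listed"
    print(f"{zone}: {SECONDARY} is {status} in allow-transfer {entries}")
```

If the pasted configuration matches what is deployed, a secondary that is missing from this list would have its transfer requests refused by the master.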

/etc/named/16.16.172.in-addr.arpa.rev on the master is as follows:

$TTL 86400

@ IN SOA ns3.mydomain.example. admin.mydomain.example. (
                                                2022051917 ;Serial
                                                3600 ;Refresh
                                                1800 ;Retry
                                                604800 ;Expire
                                                86400 ;Minimum TTL
)

;; All Zone NS Records
@ IN NS ns3.mydomain.example.
@ IN NS ns4.mydomain.example.

;; All Zone PTR Records

* IN PTR HDN-UIDO

Again, no DNS lookups for any record work on the secondary, but all work on the master. No zones transfer from the master to the secondary. All zones and configurations are generated programmatically, so if there is an error in one zone, it will be present in all. No other errors of note have been found in the logs. There are no SELinux denials on either server. Permissions of /etc/named/ are 0770 root:named system_u:object_r:named_conf_t:s0 on both servers. Removing all .jnl files did not help (there was only one on the master, and not in /etc/named).

What could be the cause? Thank you.

EDIT 5/19
I confirmed that both servers have 53/UDP and 53/TCP open to each other.

From the secondary:

dig @172.16.19.243 +tcp 200.18.16.172.in-addr.arpa ptr

; <<>> DiG 9.11.36-RedHat-9.11.36-3.el8 <<>> @172.16.19.243 +tcp 200.18.16.172.in-addr.arpa ptr
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6246
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: a9c777930fc7d210a2b794de6286b88d1d5f01b2a729a2f5 (good)
;; QUESTION SECTION:
;200.18.16.172.in-addr.arpa.    IN      PTR

;; ANSWER SECTION:
200.18.16.172.in-addr.arpa. 86400 IN    PTR     EHB-DYN.18.16.172.in-addr.arpa.

;; Query time: 0 msec
;; SERVER: 172.16.19.243#53(172.16.19.243)
;; WHEN: Thu May 19 16:37:15 CDT 2022
;; MSG SIZE  rcvd: 110

named-checkzone was used to check all zones. The only issue it reported was missing a record for NS4, which was also pointed out in the comments. I made note of this and changed it on the servers, but the change didn't make it into this question when I wrote it. In any case, it did not resolve the issue. named-checkconf has been run on both servers and both returned status 0.

The slave server has multiple addresses, but it uses the correct (shown) address to query the master, as confirmed with a packet capture on the master.

The A record has been removed from the reverse zone configuration file. The file snippet above is accurate.

Patrick Mevzek
Did you remember to open TCP/53 between the two? AXFR uses TCP, not UDP. Also, `ping` is not an adequate tool for troubleshooting DNS issues; `dig` is. You can use its `+tcp` option to force TCP and see whether regular queries work. Other than that, `NOTAUTH` appears for queries on zones the server is not authoritative for, and `SERVFAIL` should produce details in the log files. PS: please don't obfuscate badly. Use `example.com` or the `.example` TLD if you need a placeholder (but questions with real data are always better and elicit better answers).
Patrick Mevzek
"Removing all .jnl files did not help": of course, don't do things like that randomly under a running BIND instance. Look at `rndc` and its `freeze` and `thaw` commands. Also use `named-checkconf` and `named-checkzone` to assess the validity of your files. There is at least one duplicate line in your reverse zone (`NS ns3` appears twice), and you can't have `ns3 A` in the reverse zone; it makes no sense (as defined, it means the name is `ns3.16.16.172.in-addr.arpa`, which is certainly not what you intend).
Nikita Kipriyanov
Actually, a proper way to remove a .jnl file is to run `rndc sync -clean <zone-name>`.
Nikita Kipriyanov
So, where's the config of the slave? Also, probably the slave has several IP addresses and it uses a wrong address to make outgoing connections to master?
Nikita Kipriyanov
Also, your zone looks strange. If the NS server names all belong to this zone, all of them should have A records in this zone: not only ns3, but ns4 too. Also, out of curiosity, what do you want to achieve with a wildcard PTR record in the "forward" zone?
vpseg
Thank you @PatrickMevzek and Nikita Kipriyanov, I did my best to follow what you both said (despite your advice about the A records seeming to conflict) and updated the question.
Nikita Kipriyanov
I misread the question; I did not notice you are configuring a reverse zone. However, your `named.conf` refers to the file `/etc/named/16.16.172.in-addr.arpa.rev`, but you show the content of `/etc/named/16.16.172.in-addr.arpa` (without the `.rev` suffix). Is that a mistake in the question or in the configuration? Also, you show problems for the `19.16.172.in-addr.arpa` zone, but the configuration and contents of `16.16.172.in-addr.arpa`. I understand the zones "are similar", but could you nevertheless make things self-contained? You might spot a mistake while doing that.