I have a rather simple setup where I have two computers:
Computer A. has a normal NTP setup and uses the standard Internet sources (As per Ubuntu) to determine time. It also allows for query on IP 10.0.2.0/24
:
restrict 10.0.2.0 mask 255.255.255.0 nomodify notrap
Computer B. has a normal NTP setup, except for all the sources are changed to use 10.0.2.1
(which is Computer A).
Once in a while, Computer A gets a Kiss-of-Death signal from one of its source. As a result, it totally kills Computer B's NTP (i.e. it looks like the KoD is transmitted directly).
Is there a way to know the state of an NTP server in terms of whether it will just be sending KoD message or not? (also, how do I get out of that situation? When I looked at it, all the IP addresses shown in the log were not used by the server?! so I don't understand why it insists on sending KoD to its client).
I've found two things so far:
ntpq
I can run ntpq
like so:
ntpq -pn
When the NTP server works, I can see an asterisk in front of the IP address the computer is happy about. In my case, all of the status flags (first column +
, -
, *
, #
, etc.) all disappear. As far as I know, that means the NTP service is not happy and no synchronization is being performed.
Here is an example when it still works (i.e. there are flags in the very first column):
remote refid st t when poll reach delay offset jitter
==============================================================================
10.0.2.255 .BCST. 16 B - 64 0 0.000 0.000 0.000
#51.77.203.211 134.59.1.5 3 u 4 64 1 171.248 -743.64 691.917
+72.5.72.15 216.218.254.202 2 u 2 64 1 19.223 -778.34 686.200
+159.69.25.180 192.53.103.103 2 u 3 64 1 237.733 -775.41 701.376
+173.0.48.220 43.77.130.254 2 u 2 64 1 35.489 -778.85 669.187
38.229.56.9 172.16.21.35 2 u 31 64 1 153.976 -268.90 122.557
+137.190.2.4 .PPS. 1 u 31 64 1 93.797 -253.69 116.289
+150.136.0.232 185.125.206.71 3 u 35 64 1 95.667 -178.19 114.912
94.154.96.7 194.29.130.252 2 u 31 64 1 237.560 -231.88 107.230
+162.159.200.123 10.4.1.175 3 u 34 64 1 16.246 -199.68 115.561
*216.218.254.202 .CDMA. 1 u 35 64 1 52.906 -193.84 131.148
91.189.91.157 132.163.96.1 2 u 45 64 1 87.772 -5.716 0.000
+204.2.134.163 44.24.199.34 3 u 34 64 1 16.711 -199.12 116.777
+74.6.168.73 208.71.46.33 2 u 35 64 1 69.772 -189.21 128.119
91.189.89.199 17.253.34.123 2 u 45 64 1 165.471 -3.708 0.000
+216.229.0.49 216.218.192.202 2 u 35 64 1 71.437 -178.94 97.505
91.189.89.198 17.253.34.123 2 u 44 64 1 172.852 -17.899 0.000
ntpdate -q <ip>
The ntpdate
command will actually tell me whether the NTP is accepting packets. This is because it gives an error message if not:
$ sudo ntpdate -q 10.0.2.1
server 10.0.2.1, stratum 4, offset 5.194725, delay 0.02652
21 Jul 15:22:48 ntpdate[13086]: no server suitable for synchronization found
This happens after a little while when my main server loses the *
status on the one server it was first happy to synchronize with...
Now... I still need to understand what I have to do to fix this issue...
This may be helpful, here are the logs about a restart from a full reboot:
Jul 21 18:29:13 vm-ve-ctl kernel: [ 434.275481] audit: type=1400 audit(1626917353.636:43): apparmor="DENIED" operation="open" profile="/usr/sbin/ntp
d" name="/snap/bin/" pid=3896 comm="ntpd" requested_mask="r" denied_mask="r" fsuid=0 ouid=0
Jul 21 18:29:13 vm-ve-ctl ntpd[3896]: ntpd [email protected] (1): Starting
Jul 21 18:29:13 vm-ve-ctl ntpd[3896]: Command line: /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 126:129
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: proto: precision = 0.190 usec (-22)
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Cannot open logfile /var/log/ntp.log: Permission denied
Jul 21 18:29:13 vm-ve-ctl kernel: [ 434.291490] audit: type=1400 audit(1626917353.652:44): apparmor="DENIED" operation="capable" profile="/usr/sbin/
ntpd" pid=3901 comm="ntpd" capability=1 capname="dac_override"
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): good hash signature
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: leapsecond file ('/usr/share/zoneinfo/leap-seconds.list'): loaded, expire=2021-12-28T00:00:00Z last=2017-01-01T
00:00:00Z ofs=37
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen and drop on 0 v6wildcard [::]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 2 lo 127.0.0.1:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 3 enp0s3 192.168.2.120:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 4 enp0s8 10.0.2.1:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 5 lo [::1]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 6 enp0s3 [fe80::a00:27ff:fe25:38ff%2]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listen normally on 7 enp0s8 [fe80::a00:27ff:fe35:c30b%3]:123
Jul 21 18:29:13 vm-ve-ctl ntpd[3901]: Listening on routing socket on fd #24 for interface updates
Jul 21 18:29:14 vm-ve-ctl ntpd[3901]: Soliciting pool server 51.77.203.211
Jul 21 18:29:15 vm-ve-ctl ntpd[3901]: Soliciting pool server 159.69.25.180
Jul 21 18:29:15 vm-ve-ctl ntpd[3901]: Soliciting pool server 72.5.72.15
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 198.251.86.68
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 173.0.48.220
Jul 21 18:29:16 vm-ve-ctl ntpd[3901]: Soliciting pool server 38.229.56.9
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 150.136.0.232
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 94.154.96.7
Jul 21 18:29:17 vm-ve-ctl ntpd[3901]: Soliciting pool server 137.190.2.4
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 162.159.200.123
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 216.218.254.202
Jul 21 18:29:18 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.91.157
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.89.199
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 74.6.168.73
Jul 21 18:29:19 vm-ve-ctl ntpd[3901]: Soliciting pool server 204.2.134.163
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 91.189.89.198
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 216.229.0.49
Jul 21 18:29:20 vm-ve-ctl ntpd[3901]: Soliciting pool server 2604:ed40:1000:1711:d862:f5ff:fe4e:41c4
Jul 21 18:29:21 vm-ve-ctl ntpd[3901]: receive: Unexpected origin timestamp 0xe4a34871.ac57f05d does not match aorg 0000000000.00000000 from [email protected] xmt 0xe4a34871.65648c54
I do not know exactly when it starts going bad. I've also seen the following which I thought could have something to do with it (i.e. when that happens, the corresponding IP is removed from the list!), but it is already bad now and no such error occurred in my last run.
Jul 21 18:08:57 vm-ve-ctl ntpd[9764]: 92.243.6.5 local addr 192.168.2.120 -> <null>
Note: the 192.168.2.120 is the IP of the failing computer. It is a VirtualBox. It has been working for months... though, maybe something changed which makes it unhappy.
I found this post about an issue with the ... -> <null>
message. However, I think we have a newer version on Ubuntu 18.04:
SUSE minimum recommended version: ntp-4.2.8p7-11.1
Ubuntu 18.04 version: 1:4.2.8p10+dfsg-5ubuntu7.3
Just in case, I tried to connect the VM to the host and I still get a huge offset and jitter. What could have changed?!
remote refid st t when poll reach delay offset jitter
==============================================================================
10.0.2.10 .POOL. 16 p - 64 0 0.000 0.000 0.000
10.0.2.10 132.163.97.6 2 u 54 64 3 0.457 -5254.2 3917.68
As asked by Paul Gear, here is the output of ntpq with additional details:
$ ntpq -ncrv
associd=0 status=0028 leap_none, sync_unspec, 2 events, no_sys_peer,
version="ntpd [email protected] (1)", processor="x86_64",
system="Linux/4.15.0-151-generic", leap=00, stratum=4, precision=-23,
rootdelay=17.930, rootdisp=5019.260, refid=173.255.215.209,
reftime=e4a44f7a.1c2ec778 Thu, Jul 22 2021 13:11:38.110,
clock=e4a45030.c8a4b259 Thu, Jul 22 2021 13:14:40.783, peer=0, tc=6,
mintc=3, offset=-109.527915, frequency=-1.707, sys_jitter=0.000000,
clk_jitter=38.724, clk_wander=0.000, tai=37, leapsec=201701010000,
expire=202112280000
Here is the list of available clocks and the one currently used:
$ grep . /sys/devices/system/clocksource/clocksource*/[ac]*clocksource
/sys/devices/system/clocksource/clocksource0/available_clocksource:kvm-clock tsc acpi_pm
/sys/devices/system/clocksource/clocksource0/current_clocksource:kvm-clock
And finally, the dmesg
output about the clocksource selection process:
$ dmesg | grep clocksource
[ 0.000000] clocksource: kvm-clock: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[ 0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[ 0.283117] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[ 1.161844] clocksource: Switched to clocksource kvm-clock
[ 1.208316] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[ 2.329228] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x1db81a3240f, max_idle_ns: 440795250379 ns