I have the following problem:
Whenever my server is exposed to the internet, it becomes unreachable after a while. The time until that happens varies. When it occurs I can still ping the server from my local network, but I cannot reach it in my browser via its dyndns address or its IP address, and I cannot ssh into it either. The only fix I have found so far is restarting the machine.
The problem usually starts occurring after a day or so, and it usually happens overnight. I have looked at the logs and came to the following conclusion:
The Network Manager (NetworkManager.service) crashes and does not restart until reboot:
Jan 22 02:25:52 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:07.391683481+01:00" level=warning msg="Health check for container efbdf23db0420835ce07b90a9bd14870ebd08f281c00e4dc2a82258a5ae8f656
Jan 22 02:26:32 mussingerwebserver kernel: [UFW BLOCK] IN=enp1s0 OUT= MAC=01:00:5e:00:00:fb:b0:35:b5:df:07:d2:08:00 SRC=192.168.178.66 DST=224.0.0.251 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=17286 PROTO=2
Jan 22 02:25:33 mussingerwebserver CRON[16536]: (www-data) CMD (php7.2 -f /var/www/nextcloud/cron.php)
Jan 22 02:25:47 mussingerwebserver NetworkManager[1229]: <info> [1642814746.7425] manager: NetworkManager state is now CONNECTED_GLOBAL
Jan 22 02:26:45 mussingerwebserver whoopsie[1951]: [02:25:52] Could not get the list of active connections: Timeout was reached
Jan 22 02:26:45 mussingerwebserver whoopsie[1951]: [02:26:02] Cannot reach: https://daisy.ubuntu.com
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538581342+01:00" level=warning msg="Health check for container f3b354722bbd1660aa6840ddce338cfd0dfa231e387b65137ee1039827a9a721
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538881010+01:00" level=warning msg="Health check for container 7c7475150e0df49c4a7af8417d15a3f3a263b47ebcff7674da998b63684fd9c7
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.672603919+01:00" level=warning msg="Health check for container c050c5961997817472be6a1ecbb39898244b58c932b0314ecb436d11c0640dff
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538587478+01:00" level=warning msg="Health check for container fb75e0eeb20a372aeb64ea6604139232a5154ca2273ee214a4891ed17ef2e805
The NetworkManager entries show that it fails around the same time (the server kept running the whole time, but NetworkManager wrote nothing to the log between 02:25:47 and the reboot):
Jan 22 00:46:08 mussingerwebserver NetworkManager[1229]: <info> [1642808768.036
Jan 22 00:50:39 mussingerwebserver NetworkManager[1229]: <info> [1642809039.276
Jan 22 01:56:08 mussingerwebserver NetworkManager[1229]: <info> [1642812968.055
Jan 22 01:56:08 mussingerwebserver NetworkManager[1229]: <info> [1642812968.055
Jan 22 02:00:39 mussingerwebserver NetworkManager[1229]: <info> [1642813239.265
Jan 22 02:21:08 mussingerwebserver NetworkManager[1229]: <info> [1642814468.045
Jan 22 02:21:08 mussingerwebserver NetworkManager[1229]: <info> [1642814468.045
Jan 22 02:25:47 mussingerwebserver NetworkManager[1229]: <info> [1642814746.742
-- Reboot --
Jan 22 09:37:42 mussingerwebserver systemd[1]: Starting Network Manager...
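To make the silent period easier to spot, here is a minimal Python sketch that scans syslog-style lines and reports long pauses between consecutive entries. It is only an illustration: the function names `parse_stamp` and `silent_gaps`, the hard-coded year, and the 60 minute threshold are my own assumptions, and the sample lines are copied from the excerpt above (message text is cut off there as well). It also assumes an English locale so that `%b` matches `Jan`.

```python
from datetime import datetime

# Sample entries copied from the excerpt above; the messages are truncated there too.
LOG = """\
Jan 22 02:00:39 mussingerwebserver NetworkManager[1229]: <info> [1642813239.265
Jan 22 02:21:08 mussingerwebserver NetworkManager[1229]: <info> [1642814468.045
Jan 22 02:25:47 mussingerwebserver NetworkManager[1229]: <info> [1642814746.742
-- Reboot --
Jan 22 09:37:42 mussingerwebserver systemd[1]: Starting Network Manager...
"""


def parse_stamp(line, year=2022):
    """Return the syslog timestamp at the start of a line, or None if there is none.

    Syslog lines carry no year, so the year is an assumption passed in by the caller.
    """
    try:
        return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")
    except ValueError:
        return None  # e.g. the "-- Reboot --" marker


def silent_gaps(text, min_minutes=60):
    """Yield (start, end, minutes) for every pause longer than min_minutes."""
    stamps = [s for s in map(parse_stamp, text.splitlines()) if s]
    for prev, cur in zip(stamps, stamps[1:]):
        minutes = (cur - prev).total_seconds() / 60
        if minutes > min_minutes:
            yield prev, cur, minutes


for start, end, minutes in silent_gaps(LOG):
    print(f"no entries from {start:%H:%M:%S} to {end:%H:%M:%S} ({minutes:.0f} min)")
# prints: no entries from 02:25:47 to 09:37:42 (432 min)
```

Running the same check over the full log of every service, not only NetworkManager, would show whether NetworkManager alone went quiet or whether everything stopped logging at that time.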
Recently I blocked the server from connecting to the internet for a couple of weeks, and it did not behave that way a single time. A few days ago I exposed it again and the problem started recurring.
The system:
- old desktop PC running Ubuntu 18.04.6 LTS
- has Nextcloud, Unifi Network manager, InfluxDB and netdata installed
- Nextcloud is exposed to the internet via duckdns (on a Fritzbox) with a Let's Encrypt SSL certificate
- the machine is connected via ethernet
I have tried:
I tried to locate the apport log in /var/crash. This directory has no crash report from the incident. I was, however, able to make apport generate a crash report by experimenting and issuing an unsupported command. I first thought apport might lack the permissions to write to /var/crash, but that does not seem to be the case.
I have also checked whether restart on failure is enabled, and it looks like it is.
/lib/systemd/system/NetworkManager.service states:
Restart=on-failure
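For completeness, this is roughly how the restart settings can be pulled out of a unit file programmatically. It is a minimal sketch: `restart_settings` is a hypothetical helper of mine, and the `SAMPLE` text is a made-up stand-in in which only the `Restart=on-failure` line is taken from my actual file.

```python
def restart_settings(unit_text):
    """Collect restart-related keys from the [Service] section of a unit file."""
    wanted = {"Restart", "RestartSec"}
    section, found = None, {}
    for raw in unit_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]  # remember which section we are in
        elif section == "Service" and "=" in line:
            key, value = (part.strip() for part in line.split("=", 1))
            if key in wanted:
                found[key] = value
    return found


# Hypothetical stand-in for the unit file; only the Restart line is from the real one.
SAMPLE = """\
[Unit]
Description=Network Manager

[Service]
Restart=on-failure
"""

print(restart_settings(SAMPLE))
# prints: {'Restart': 'on-failure'}
```

Note that the file under /lib/systemd/system is only the shipped default. Overrides in /etc/systemd/system would take precedence, so the same check would have to be run on those files as well.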
At this point I do not know what to look for anymore and hope someone can help with the following questions:
- Why does NetworkManager crash?
- Is that the root of my problem or is it merely a symptom?
- Why does apport fail to generate an error log? An error is clearly occurring, why else would whoopsie be triggered?
- Why does NetworkManager not restart even though its configuration specifies that it should?
I hope somebody can help me out.