NetworkManager randomly crashes resulting in lost connection to Internet and local network

I have the following problem: whenever my server is exposed to the internet, it becomes unreachable after a varying period of time. When that happens I can still ping the server from my local network, but I cannot reach it in my browser via its DynDNS address or its IP address, and I cannot SSH into it either. The only fix I have found so far is restarting the machine.
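
To get an exact time for when the server stops answering (for comparison with the journal), a small probe can run on another machine in the LAN and record the moment SSH and HTTPS stop accepting TCP connections. This is a minimal Python sketch, not a finished tool: the ports (22 for SSH, 443 for HTTPS) and the 60-second interval are assumptions, it only tests TCP (ping is left to the usual `ping` command), and the server address has to be filled in.

```python
import socket
import time
from datetime import datetime


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name lookup failed, ...
        return False


def probe_once(host: str, ports, timeout: float = 3.0) -> dict:
    """Map each port to whether it currently accepts TCP connections."""
    return {port: port_open(host, port, timeout) for port in ports}


def monitor(host: str, ports=(22, 443), interval: float = 60.0) -> None:
    """Print a timestamped line every time the set of reachable ports changes.

    Runs until interrupted. Assumption: SSH listens on 22 and HTTPS on 443.
    """
    previous = None
    while True:
        state = probe_once(host, ports)
        if state != previous:
            print(datetime.now().isoformat(timespec="seconds"), state, flush=True)
            previous = state
        time.sleep(interval)
```

Left running overnight as `monitor("192.168.178.x")` (with the server's real LAN address), the first line showing `{22: False, 443: False}` gives a timestamp to look up in the server's journal.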

The problem usually starts after a day or so, and it usually happens overnight. I have looked at the logs and have come to the following conclusion:

NetworkManager (NetworkManager.service) crashes and does not restart until I reboot:

Jan 22 02:25:52 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:07.391683481+01:00" level=warning msg="Health check for container efbdf23db0420835ce07b90a9bd14870ebd08f281c00e4dc2a82258a5ae8f656 
Jan 22 02:26:32 mussingerwebserver kernel: [UFW BLOCK] IN=enp1s0 OUT= MAC=01:00:5e:00:00:fb:b0:35:b5:df:07:d2:08:00 SRC=192.168.178.66 DST=224.0.0.251 LEN=32 TOS=0x00 PREC=0x00 TTL=1 ID=17286 PROTO=2 
Jan 22 02:25:33 mussingerwebserver CRON[16536]: (www-data) CMD (php7.2 -f /var/www/nextcloud/cron.php)
Jan 22 02:25:47 mussingerwebserver NetworkManager[1229]: <info>  [1642814746.7425] manager: NetworkManager state is now CONNECTED_GLOBAL
Jan 22 02:26:45 mussingerwebserver whoopsie[1951]: [02:25:52] Could not get the list of active connections: Timeout was reached
Jan 22 02:26:45 mussingerwebserver whoopsie[1951]: [02:26:02] Cannot reach: https://daisy.ubuntu.com
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538581342+01:00" level=warning msg="Health check for container f3b354722bbd1660aa6840ddce338cfd0dfa231e387b65137ee1039827a9a721 
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538881010+01:00" level=warning msg="Health check for container 7c7475150e0df49c4a7af8417d15a3f3a263b47ebcff7674da998b63684fd9c7 
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.672603919+01:00" level=warning msg="Health check for container c050c5961997817472be6a1ecbb39898244b58c932b0314ecb436d11c0640dff 
Jan 22 02:27:06 mussingerwebserver dockerd[2161]: time="2022-01-22T02:25:17.538587478+01:00" level=warning msg="Health check for container fb75e0eeb20a372aeb64ea6604139232a5154ca2273ee214a4891ed17ef2e805 

This clearly shows that NetworkManager fails at around the same time (the server kept running the whole time, but NetworkManager did not write anything more to the log):

Jan 22 00:46:08 mussingerwebserver NetworkManager[1229]: <info>  [1642808768.036
Jan 22 00:50:39 mussingerwebserver NetworkManager[1229]: <info>  [1642809039.276
Jan 22 01:56:08 mussingerwebserver NetworkManager[1229]: <info>  [1642812968.055
Jan 22 01:56:08 mussingerwebserver NetworkManager[1229]: <info>  [1642812968.055
Jan 22 02:00:39 mussingerwebserver NetworkManager[1229]: <info>  [1642813239.265
Jan 22 02:21:08 mussingerwebserver NetworkManager[1229]: <info>  [1642814468.045
Jan 22 02:21:08 mussingerwebserver NetworkManager[1229]: <info>  [1642814468.045
Jan 22 02:25:47 mussingerwebserver NetworkManager[1229]: <info>  [1642814746.742
-- Reboot --
Jan 22 09:37:42 mussingerwebserver systemd[1]: Starting Network Manager...
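
The length of that silence can be computed from the excerpt itself: NetworkManager prefixes each message with a Unix epoch timestamp in brackets, so the last message before `-- Reboot --` can be decoded and compared with the first line of the next boot. A minimal Python sketch, assuming the server's zone is UTC+1 in January (as the `+01:00` in the dockerd lines suggests) and supplying the year by hand, because the syslog prefix omits it:

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: server clock is CET (UTC+1) in January, matching the dockerd "+01:00".
CET = timezone(timedelta(hours=1))

# NetworkManager writes "<level>  [<epoch seconds>.<fraction>" after its PID.
NM_EPOCH = re.compile(r"NetworkManager\[\d+\]: <\w+>\s+\[(\d+\.\d+)")


def nm_times(lines):
    """Decode the bracketed epoch stamp of every NetworkManager line."""
    return [datetime.fromtimestamp(float(m.group(1)), tz=CET)
            for m in map(NM_EPOCH.search, lines) if m]


def silence_before_reboot(lines, year=2022):
    """Return (last NM message, first line after '-- Reboot --', gap between them).

    Raises StopIteration if there is no '-- Reboot --' line. `year` is supplied by
    the caller because syslog-style prefixes such as "Jan 22 09:37:42" carry none.
    """
    idx = next(i for i, line in enumerate(lines) if line.strip() == "-- Reboot --")
    last_nm = nm_times(lines[:idx])[-1]
    prefix = " ".join(lines[idx + 1].split()[:3])  # e.g. "Jan 22 09:37:42"
    after = datetime.strptime(f"{year} {prefix}", "%Y %b %d %H:%M:%S").replace(tzinfo=CET)
    return last_nm, after, after - last_nm
```

Applied to the last two NetworkManager lines above plus the `-- Reboot --` and `systemd[1]` lines, it decodes `[1642814746.742` to 02:25:46 local time (within a second of the journal prefix 02:25:47) and reports a gap of a little over 7 hours 11 minutes until the next boot. The silence on its own only shows that NetworkManager stopped logging; it does not say whether the process exited or hung.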

Recently I blocked the server from connecting to the internet for a couple of weeks, and the problem did not occur a single time. A few days ago I exposed it again, and the problem started recurring.

The system:

  • an old desktop PC running Ubuntu 18.04.6 LTS
  • installed: Nextcloud, UniFi Network Manager, InfluxDB and netdata
  • Nextcloud is exposed to the internet via DuckDNS (on a Fritzbox) with a Let's Encrypt SSL certificate
  • the machine is connected via Ethernet

What I have tried: I looked for an apport crash report in /var/crash. The directory contains no report from the incident, although I was able to make apport generate a report by messing around and issuing an unsupported command. I first thought apport might lack the permissions needed to write to /var/crash, but that does not seem to be the case.

I have also checked whether restart on failure is enabled, and it appears to be: /lib/systemd/system/NetworkManager.service contains:

Restart=on-failure
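
For the last question it helps to know when `Restart=on-failure` actually fires. According to the systemd.service(5) man page, it restarts a service whose main process exits with a non-zero code, is killed by a signal other than SIGHUP, SIGINT, SIGTERM or SIGPIPE, times out during an operation, or trips a configured watchdog (`WatchdogSec=`); a process that hangs but stays alive triggers none of these. The function below is a simplified Python model of that rule, not systemd's code: it ignores `SuccessExitStatus=`, `RestartPreventExitStatus=`, start-rate limiting and an explicit `systemctl stop`.

```python
import signal
from typing import Optional

# Signals that systemd treats as a "clean" exit for non-oneshot services.
CLEAN_SIGNALS = {signal.SIGHUP, signal.SIGINT, signal.SIGTERM, signal.SIGPIPE}


def on_failure_restarts(still_running: bool,
                        exit_code: Optional[int] = None,
                        killed_by: Optional[signal.Signals] = None,
                        timed_out: bool = False,
                        watchdog_fired: bool = False) -> bool:
    """Rough model of whether Restart=on-failure would restart the service."""
    if timed_out or watchdog_fired:
        return True
    if still_running:
        # A hung-but-alive main process never exits, so nothing triggers a restart.
        return False
    if killed_by is not None:
        # Crashes (SIGSEGV, SIGABRT) and SIGKILL, e.g. from the OOM killer, count as failure.
        return killed_by not in CLEAN_SIGNALS
    return exit_code not in (None, 0)
```

After the next outage, `systemctl status NetworkManager` shows which case applies: if it still reports `active (running)`, the process never exited, so `Restart=on-failure` had nothing to act on.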

At this point I do not know what to look for anymore and hope someone can help with the following questions:

  • Why does NetworkManager crash?
  • Is that the root of my problem or is it merely a symptom?
  • Why does apport fail to generate an error log? An error is clearly occurring; why else would whoopsie be triggered?
  • Why does NetworkManager not restart even though its configuration specifies that it should?

I hope somebody can help me out.
