Score:0

GCP VM (using cloud NAT) loses internet connection


We have a fairly simple setup: a VM on GCP without a public IP address. To reach the internet, we use Cloud NAT with the basic configuration (see attached image):

enter image description here

The problem we have is that the VM loses the internet connection:

  1. We cannot access it using SSH.
  2. According to the syslog, the VM cannot reach the GCE metadata server (OSConfigAgent[514]: 2023-03-10T15:49:41.8034Z OSConfigAgent Error main.go:231: network error when requesting metadata, make sure your instance has an active network and can reach the metadata server: Get http://169.254.169.254/computeMetadata/v1/?recursive=true&alt=json&wait_for_change=true&last_etag=2a783d496d54f634&timeout_sec=60: dial tcp 169.254.169.254:80: connect: network is unreachable)

The only fix so far is to restart the VM, after which the network works again. Once the failure occurs, the second log message repeats continuously. It is preceded by these log entries:

  1. systemd-networkd[501671]: ens4: Could not set DHCPv4 address: Connection timed out
  2. systemd-networkd[501671]: ens4: Failed
  3. kernel: [1118386.615077] systemd invoked oom-killer: gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
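
When a syslog is long, it helps to pull out just these event types and see which came first. The following is a minimal sketch, assuming the log lines contain the same substrings as the excerpts above. The function name `find_events` and the labels in `PATTERNS` are illustrative and not part of any GCP or systemd tooling.

```python
import re

# Substrings copied from the log excerpts above; the labels are our own.
PATTERNS = {
    "oom": re.compile(r"invoked oom-killer"),
    "dhcp": re.compile(r"Could not set DHCPv4 address"),
    "metadata": re.compile(r"network error when requesting metadata"),
}


def find_events(lines):
    """Return (line_index, label) for every line matching a known pattern."""
    events = []
    for index, line in enumerate(lines):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                events.append((index, label))
    return events


if __name__ == "__main__":
    sample = [
        "systemd-networkd[501671]: ens4: Could not set DHCPv4 address: Connection timed out",
        "systemd-networkd[501671]: ens4: Failed",
        "kernel: [1118386.615077] systemd invoked oom-killer: "
        "gfp_mask=0x1100cca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0",
    ]
    for index, label in find_events(sample):
        print(index, label)
```

Running it over the full syslog (for example, the lines of `/var/log/syslog`) shows whether OOM-killer invocations cluster around the DHCP and metadata failures.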

Initially we suspected that the problem might be related to Cloud NAT, but we have no evidence for that: the NAT logs (errors and transactions) show no significant errors.

The goal of this question is to find a way to avoid this situation, or to handle it automatically without manual intervention. Please let me know if additional information is required.

Abhijith Chitrapu
Is the VM running out of memory? Please check whether the logs show any OOM killer messages.
Score:2

Your system is sized too small. Notice the message: systemd invoked oom-killer.

That is causing the network to fail: Could not set DHCPv4 address.

Solution: either improve the applications running on the instance to use less memory or select an instance size that can handle the workload.
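
To judge whether an instance size can handle the workload, it helps to watch available memory before the OOM killer fires. The following is a minimal sketch, assuming a Linux guest that exposes `MemTotal` and `MemAvailable` in `/proc/meminfo`. The function names are hypothetical, and any alert threshold is something you would pick for your own workload.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo content into a dict of field name -> numeric value."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_fraction(text):
    """Fraction of total memory that the kernel estimates is still available."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"]


def read_available_fraction(path="/proc/meminfo"):
    """Read the live value from the running system (Linux only)."""
    with open(path) as handle:
        return available_fraction(handle.read())
```

Calling `read_available_fraction()` periodically (for example, from a cron job or a systemd timer) and alerting when the result falls below a chosen threshold gives early warning that the instance is undersized.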

Giorgi Jambazishvili
Thanks. One thing: after we upgraded the system, the application seems to consume the newly freed resources as well (mainly the Chromium instances). Is there some way to automatically let those instances (or any other application) know that there are not enough resources for them? Or, alternatively, how can we instruct the OOM killer not to kill important services?
John Hanley
@GiorgiJambazishvili - Recommend creating a new question. Remember, one question per post.
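
On the OOM killer question raised in the comments: Linux exposes a per-process adjustment in `/proc/<pid>/oom_score_adj`, ranging from -1000 (never selected) to 1000 (most likely to be selected). The following is a minimal sketch of setting it. The helper names are hypothetical, and lowering a value generally requires elevated privileges (`CAP_SYS_RESOURCE`).

```python
OOM_SCORE_ADJ_MIN = -1000  # process is exempt from the OOM killer
OOM_SCORE_ADJ_MAX = 1000   # process is the preferred victim


def oom_score_adj_path(pid):
    """Path of the kernel's OOM adjustment file for a given process ID."""
    return f"/proc/{pid}/oom_score_adj"


def set_oom_score_adj(pid, value):
    """Write a new OOM score adjustment for the process with the given PID."""
    if not OOM_SCORE_ADJ_MIN <= value <= OOM_SCORE_ADJ_MAX:
        raise ValueError(f"oom_score_adj must be within [-1000, 1000], got {value}")
    with open(oom_score_adj_path(pid), "w") as handle:
        handle.write(str(value))
```

For example, `set_oom_score_adj(pid, 500)` on each Chromium process makes it a more likely victim than the network daemon. For services managed by systemd, the `OOMScoreAdjust=` unit setting applies the same value without a script.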