Google GCE VM - how to kill VMs if startup script fails

Danielle M.

9/5/23, 4:00 PM

We launch workloads in GCE using Managed Instance Groups (MIG), which oversee the lifecycle and health of these VMs.

New VMs are provisioned with a startup script (bash), which, on rare occasions, fails in some way. However, the VM is still able to start, launch its workload, and pass its health checks.

Is there some setting in GCE / MIGs that says "if the init script does not execute successfully, kill the VM and recreate it"?

I could shut down if an error is trapped, e.g.:

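# ... (earlier part of the startup script) ...

# Log the failure and power off the VM.
exception() {
  echo 'startup script error; shutting down!'
  shutdown -h now
}

# Call exception() when a command fails. Commands tested by if, while,
# && or || do not trigger ERR, and without set -E the trap is not
# inherited by shell functions, command substitutions, or subshells.
trap 'exception' ERR
# ... (rest of the startup script) ...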

But I was hoping there was a more managed option.
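A minimal sketch of a stricter variant of the trap approach, assuming bash: the provisioning steps run in a subshell under set -Eeuo pipefail, so the first failing command aborts them, and the failure is then handled in one place. The names provision, run_startup and FAIL_ACTION are hypothetical. FAIL_ACTION defaults to the question's shutdown -h now and, like the question, relies on the MIG replacing a VM that powers itself off.

```shell
#!/usr/bin/env bash
# Sketch only: provision, run_startup and FAIL_ACTION are hypothetical names.
# FAIL_ACTION defaults to the question's "shutdown -h now".
FAIL_ACTION="${FAIL_ACTION:-shutdown -h now}"

# Placeholder for the real provisioning steps (hypothetical).
provision() {
  echo "provisioning..."
}

run_startup() {
  # Strict mode applies only inside this subshell: -e stops at the first
  # failing command, -E lets the ERR trap fire inside functions, -u rejects
  # unset variables, and pipefail fails a pipeline if any stage fails.
  # Do not use the subshell as an if/&&/|| condition: bash ignores -e there.
  # The outer script deliberately does not use set -e.
  (
    set -Eeuo pipefail
    trap 'echo "failed: $BASH_COMMAND (exit $?)" >&2' ERR
    provision
  )
  local status=$?

  if (( status != 0 )); then
    echo "startup script failed (exit ${status}); running: ${FAIL_ACTION}" >&2
    # Unquoted on purpose so "shutdown -h now" splits into command + args.
    ${FAIL_ACTION}
  fi
  return "$status"
}
```

In a real startup script, provision would hold the existing steps and the script would end with a plain call to run_startup. Because the steps run in a subshell, variables they set are not visible afterwards, which is usually acceptable for provisioning work whose effects land on disk.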

google-compute-engine

Leo

9/6/23, 3:21 PM

I would like to see the error message from your MIG log, because this could be an issue with the initial delay. I suggest reviewing how the health check and autohealing policy are configured in your MIG. There are some probes and settings you can adjust, such as the MIG's --initial-delay. This setting delays autohealing so that it does not prematurely recreate a VM that is still starting up, which could help with your startup script issue: a VM sometimes needs more time to finish running its startup script. A longer delay also helps when the network is slow, because some startup script issues are caused by connectivity problems with the metadata server. To avoid this, you can increase the initial delay in your MIG's autohealing policy. You can view your health check with the following command:

gcloud compute health-checks describe <health check name>

You can set the health check and initial delay on your MIG with the update command, as shown in the following example:

gcloud compute instance-groups managed update my-mig \
        --health-check example-check \
        --initial-delay 300 \
        --zone us-east1-b

In this example, the initial delay is set to 300 seconds (5 minutes). The Compute Engine documentation on setting up health checking and autohealing for MIGs has more information.

You can also check the state of your instances at any time with this command:

gcloud compute instance-groups managed list-instances your-instance-group

NAME              ZONE                  STATUS   HEALTH_STATE  ACTION  INSTANCE_TEMPLATE                            VERSION_NAME  LAST_ERROR
igm-with-hc-fvz6  europe-west1          RUNNING  HEALTHY       NONE    my-template
igm-with-hc-gtz3  europe-west1          RUNNING  HEALTHY       NONE    my-template

Danielle M.

9/9/23, 2:33 PM

Hi @Leo! The issue isn't really with the health checks; it's with the startup script. I need to tell the MIG that this VM failed to provision and needs to be recreated.
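One way to surface a failed startup script to the MIG, sketched under assumptions: have the script write a marker file only after every step has succeeded, and have the health-check endpoint report healthy only when that marker exists. Combined with the autohealing policy and --initial-delay discussed above, a VM whose startup script failed would keep failing its health check and would then be recreated by autohealing. READY_MARKER, mark_startup_ok and health_status are hypothetical names, the default marker path is an assumption, and how the endpoint calls health_status depends on what serves it.

```shell
#!/usr/bin/env bash
# Sketch only: READY_MARKER, mark_startup_ok and health_status are hypothetical.
# The default marker path is an assumption.
READY_MARKER="${READY_MARKER:-/run/startup-script-ok}"

# Call as the very last step of the startup script, so it only runs after
# every earlier step succeeded (for example under set -e).
mark_startup_ok() {
  date -u +%Y-%m-%dT%H:%M:%SZ > "$READY_MARKER"
}

# Print the HTTP status the health-check endpoint should return:
# 200 once the startup script has finished successfully, 503 otherwise.
health_status() {
  if [[ -f "$READY_MARKER" ]]; then
    echo 200
  else
    echo 503
  fi
}
```

Keeping the signal in a file decouples the startup script from whatever serves the health check, and the check stays unhealthy by default until provisioning has actually completed.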
