
How to fix a node in docker swarm?

I have a 4-node cluster in AWS in which 2 nodes keep getting disconnected. Sometimes rebooting the affected nodes works, and sometimes I need to reboot all the nodes in the cluster to get them all back.

[ec2-user@ip-172-31-7-235 ~]$ docker node ls
ID                            HOSTNAME                                      STATUS    AVAILABILITY   MANAGER STATUS   ENGINE VERSION
xhei85m3mjp6wikz81phl01sx *   ip-172-31-7-235.us-west-2.compute.internal    Ready     Active         Leader           20.10.4
a63wole6vosq1t5s25wib8ggu     ip-172-31-36-138.us-west-2.compute.internal   Down      Active                          19.03.13-ce
guw26oul1i2fb60f5shud8xif     ip-172-31-47-112.us-west-2.compute.internal   Ready     Active         Reachable        19.03.13-ce
ex996ixxqo3s0mcig1zfzankg     ip-172-31-47-251.us-west-2.compute.internal   Ready     Active                          19.03.13-ce

And the output of the inspect command:

[ec2-user@ip-172-31-7-235 ~]$ docker node inspect ip-172-31-36-138.us-west-2.compute.internal
[
    {
        "ID": "a63wole6vosq1t5s25wib8ggu",
        "Version": {
            "Index": 212444
        },
        "CreatedAt": "2021-02-10T13:25:54.271879167Z",
        "UpdatedAt": "2021-07-23T07:36:17.078000983Z",
        "Spec": {
            "Labels": {},
            "Role": "worker",
            "Availability": "active"
        },
        "Description": {
            "Hostname": "ip-172-31-36-138.us-west-2.compute.internal",
            "Platform": {
                "Architecture": "x86_64",
                "OS": "linux"
            },
            "Resources": {
                "NanoCPUs": 2000000000,
                "MemoryBytes": 8362287104
            },
            "Engine": {
                "EngineVersion": "19.03.13-ce",
                "Plugins": [
                    {
                        "Type": "Log",
                        "Name": "awslogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "fluentd"
                    },
                    {
                        "Type": "Log",
                        "Name": "gcplogs"
                    },
                    {
                        "Type": "Log",
                        "Name": "gelf"
                    },
                    {
                        "Type": "Log",
                        "Name": "journald"
                    },
                    {
                        "Type": "Log",
                        "Name": "json-file"
                    },
                    {
                        "Type": "Log",
                        "Name": "local"
                    },
                    {
                        "Type": "Log",
                        "Name": "logentries"
                    },
                    {
                        "Type": "Log",
                        "Name": "splunk"
                    },
                    {
                        "Type": "Log",
                        "Name": "syslog"
                    },
                    {
                        "Type": "Network",
                        "Name": "bridge"
                    },
                    {
                        "Type": "Network",
                        "Name": "host"
                    },
                    {
                        "Type": "Network",
                        "Name": "ipvlan"
                    },
                    {
                        "Type": "Network",
                        "Name": "macvlan"
                    },
                    {
                        "Type": "Network",
                        "Name": "null"
                    },
                    {
                        "Type": "Network",
                        "Name": "overlay"
                    },
                    {
                        "Type": "Volume",
                        "Name": "local"
                    }
                ]
            },
            "TLSInfo": {
                "TrustRoot": "-----BEGIN CERTIFICATE-----\nMIIBajCCARCgAwIBAgIUCi5JL30BEEaYOmlbrp9A+Rivul0wCgYIKoZIzj0EAwIw\nEzERMA8GA1UEAxMIc3dhcm0tY2EwHhcNMjEwMjEwMTMwMjAwWhcNNDEwMjA1MTMw\nMjAwWjATMREwDwYDVQQDEwhzd2FybS1jYTBZMBMGByqGSM49AgEGCCqGSM49AwEH\nA0IABFqgXKora10w8BODSxg9O4N9UveYhsitjwz+pHSi/6BB0j7YBu+4RADv4ZjK\nitIYTCLZZKbOx9saQ2YeB8sBxFajQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNVHRMB\nAf8EBTADAQH/MB0GA1UdDgQWBBTETORYsVN1OwUTjtYJHSJtGx55QzAKBggqhkjO\nPQQDAgNIADBFAiEA7qNRnsq0LUFenYODEah4Rku1YYpHBCHIid4W4Hy7MVcCICQF\n9BTfuQsAp5uQ72ycyWQfyQziFzbG+Sb/zQ8NzCRf\n-----END CERTIFICATE-----\n",
                "CertIssuerSubject": "MBMxETAPBgNVBAMTCHN3YXJtLWNh",
                "CertIssuerPublicKey": "MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEWqBcqitrXTDwE4NLGD07g31S95iGyK2PDP6kdKL/oEHSPtgG77hEAO/hmMqK0hhMItlkps7H2xpDZh4HywHEVg=="
            }
        },
        "Status": {
            "State": "down",
            "Message": "heartbeat failure for node in \"unknown\" state",
            "Addr": "172.31.36.138"
        }
    }
]

Please suggest how to trace and fix this issue. It comes back even after replacing the node with a new one.

Baodi Di: Are you using spot instances or on-demand? Make sure your security group allows all of the relevant ports/protocols.
uday: On-demand instances only. And I have kept the ports completely open.
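
The port question raised in the comments can also be checked from the nodes themselves rather than from the security group alone. The following is only a minimal sketch, not a confirmed fix: the helper names `not_ready_nodes` and `tcp_port_open` are hypothetical, and it assumes a netcat build that supports `-z` and `-w`. Swarm uses 2377/tcp for cluster management, 7946/tcp and 7946/udp for node-to-node communication, and 4789/udp for overlay traffic; the helper below can only confirm the TCP ones.

```shell
#!/bin/sh
# Hypothetical helpers for narrowing down a "heartbeat failure" on a worker.

# Print the hostnames of nodes whose status is not "Ready".
# Expects "<hostname> <status>" pairs on stdin, one node per line.
not_ready_nodes() {
    awk '$2 != "Ready" { print $1 }'
}

# Succeed if a TCP connection to host $1 on port $2 opens within 3 seconds.
# Assumes a netcat that supports -z (scan only) and -w (timeout).
# UDP ports (7946/udp, 4789/udp) cannot be verified reliably this way.
tcp_port_open() {
    nc -z -w 3 "$1" "$2"
}

# Example use on a manager:
#   docker node ls --format '{{.Hostname}} {{.Status}}' | not_ready_nodes
#
# Example use on the worker that shows as Down, against a manager address:
#   tcp_port_open 172.31.7.235 2377 && echo "management port reachable"
#   tcp_port_open 172.31.7.235 7946 && echo "node communication port reachable"
```

If the ports are reachable, the Docker daemon log on the affected worker around the time it dropped out is the next place to look (for example `journalctl -u docker` on a systemd-based host, which is an assumption about this AMI).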