I have a problem: one of the replicas is stuck in a Pending state.
Problem: After another deployment, one of the new replicas got stuck, even though I have an empty node that satisfies all the necessary requirements.
The Deployment contains nodeSelector and affinity requirements:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - vision-api-extract
        topologyKey: "kubernetes.io/hostname"
  nodeSelector:
    insttype: gpu
and there are 3 nodes with the proper label:
ip-10-0-11-16.ec2.internal Ready <none> 114d v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=g3.4xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1b,insttype=gpu,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-11-16,kubernetes.io/os=linux,node.kubernetes.io/instance-type=g3.4xlarge,topology.ebs.csi.aws.com/zone=us-east-1b,topology.kubernetes.io/region=us-east-1,topology.kubernetes.io/zone=us-east-1b
ip-10-0-11-206.ec2.internal Ready <none> 342d v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=g3.4xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1b,insttype=gpu,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-11-206,kubernetes.io/os=linux,node.kubernetes.io/instance-type=g3.4xlarge,topology.ebs.csi.aws.com/zone=us-east-1b,topology.kubernetes.io/region=us-east-1,topology.kubernetes.io/zone=us-east-1b
ip-10-0-11-44.ec2.internal Ready <none> 114d v1.18.3 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=g3.4xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-1,failure-domain.beta.kubernetes.io/zone=us-east-1b,insttype=gpu,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-11-44,kubernetes.io/os=linux,node.kubernetes.io/instance-type=g3.4xlarge,topology.ebs.csi.aws.com/zone=us-east-1b,topology.kubernetes.io/region=us-east-1,topology.kubernetes.io/zone=us-east-1b
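A minimal sketch of how the two constraints above combine, written as a toy model in Python rather than the real scheduler logic. The function names are hypothetical, the extra node `other-node` and the placement of the existing `vision-api-extract` pods are assumptions made for illustration (they are not taken from the cluster), and taints, resource requests, and the anti-affinity rules of already running pods are ignored.

```python
from typing import Dict, List

Labels = Dict[str, str]


def matches_node_selector(node_labels: Labels, node_selector: Labels) -> bool:
    """True when the node carries every key/value pair of the nodeSelector."""
    return all(node_labels.get(k) == v for k, v in node_selector.items())


def has_conflicting_pod(pods_on_node: List[Labels], key: str, values: List[str]) -> bool:
    """True when a pod on the node matches the anti-affinity labelSelector (operator In)."""
    return any(pod.get(key) in values for pod in pods_on_node)


def feasible_nodes(
    nodes: Dict[str, Labels],
    pods_by_node: Dict[str, List[Labels]],
    node_selector: Labels,
    key: str,
    values: List[str],
) -> List[str]:
    # With topologyKey kubernetes.io/hostname each node is its own topology
    # domain, so the anti-affinity check is done per node.
    return [
        name
        for name, labels in nodes.items()
        if matches_node_selector(labels, node_selector)
        and not has_conflicting_pod(pods_by_node.get(name, []), key, values)
    ]


# Node names and the insttype label come from the listing above;
# "other-node" is a hypothetical stand-in for the non-GPU nodes.
nodes = {
    "ip-10-0-11-16.ec2.internal": {"insttype": "gpu"},
    "ip-10-0-11-206.ec2.internal": {"insttype": "gpu"},
    "ip-10-0-11-44.ec2.internal": {"insttype": "gpu"},
    "other-node": {},
}

# ASSUMPTION: hypothetical placement of the already running replicas.
pods_by_node = {
    "ip-10-0-11-16.ec2.internal": [{"app": "vision-api-extract"}],
    "ip-10-0-11-206.ec2.internal": [{"app": "vision-api-extract"}],
}

candidates = feasible_nodes(
    nodes, pods_by_node, {"insttype": "gpu"}, "app", ["vision-api-extract"]
)
print(candidates)
```

Under these assumptions only the empty node remains as a candidate, which is the behaviour I expected from the scheduler.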
And here is the description of the pending pod:
Warning FailedScheduling <unknown> default-scheduler 0/13 nodes are available: 1 node(s) didn't match pod affinity/anti-affinity, 1 node(s) didn't satisfy existing pods anti-affinity rules, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate, 10 node(s) didn't match node selector.
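To make the event easier to read, the message can be split into one count per rejection reason. This is a rough sketch: `parse_failed_scheduling` is a hypothetical helper, not part of kubectl or any Kubernetes client, and the regular expression assumes the message format shown above.

```python
import re
from typing import Dict, Tuple

# The event message exactly as reported for the pending pod.
MESSAGE = (
    "0/13 nodes are available: "
    "1 node(s) didn't match pod affinity/anti-affinity, "
    "1 node(s) didn't satisfy existing pods anti-affinity rules, "
    "1 node(s) had taint {node.kubernetes.io/disk-pressure: }, "
    "that the pod didn't tolerate, "
    "10 node(s) didn't match node selector."
)


def parse_failed_scheduling(message: str) -> Tuple[int, int, Dict[str, int]]:
    """Return (available, total, {reason: node count}) for a FailedScheduling message."""
    header = re.match(r"(\d+)/(\d+) nodes are available: (.*)", message)
    if header is None:
        raise ValueError("unexpected FailedScheduling message format")
    available, total, rest = int(header.group(1)), int(header.group(2)), header.group(3)
    # A reason ends where the next "<n> node(s) " entry starts or at the end of
    # the message; this keeps the comma inside the taint reason intact.
    pattern = re.compile(r"(\d+) node\(s\) (.+?)(?=, \d+ node\(s\) |\.?$)")
    reasons = {m.group(2): int(m.group(1)) for m in pattern.finditer(rest)}
    return available, total, reasons


available, total, reasons = parse_failed_scheduling(MESSAGE)
for reason, count in reasons.items():
    print(f"{count:>2}  {reason}")
```

The counts add up to all 13 nodes: 10 were rejected by the node selector, and the remaining 3 were each rejected for a different reason (pod anti-affinity, existing pods' anti-affinity rules, and the disk-pressure taint).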
And the description of the empty node as well:
Name: ip-10-0-11-44.ec2.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=g3.4xlarge
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=us-east-1
failure-domain.beta.kubernetes.io/zone=us-east-1b
insttype=gpu
kubernetes.io/arch=amd64
kubernetes.io/hostname=ip-10-0-11-44
kubernetes.io/os=linux
node.kubernetes.io/instance-type=g3.4xlarge
topology.ebs.csi.aws.com/zone=us-east-1b
topology.kubernetes.io/region=us-east-1
topology.kubernetes.io/zone=us-east-1b
Annotations: csi.volume.kubernetes.io/nodeid: {"ebs.csi.aws.com":"i-00919faca1e45926f","efs.csi.aws.com":"i-00919faca1e45926f"}
flannel.alpha.coreos.com/backend-data: {"VtepMAC":"ce:02:a2:a2:5e:a7"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 10.0.11.44
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 26 Mar 2021 08:54:41 +0000
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: ip-10-0-11-44.ec2.internal
AcquireTime: <unset>
RenewTime: Sun, 18 Jul 2021 11:52:59 +0000
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Sun, 18 Jul 2021 11:51:26 +0000 Sat, 17 Jul 2021 14:00:36 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Sun, 18 Jul 2021 11:51:26 +0000 Sat, 17 Jul 2021 14:00:36 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Sun, 18 Jul 2021 11:51:26 +0000 Sat, 17 Jul 2021 14:00:36 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Sun, 18 Jul 2021 11:51:26 +0000 Sat, 17 Jul 2021 14:00:38 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Addresses:
InternalIP: 10.0.11.44
Hostname: ip-10-0-11-44.ec2.internal
InternalDNS: ip-10-0-11-44.ec2.internal
Capacity:
attachable-volumes-aws-ebs: 39
cpu: 16
ephemeral-storage: 60923672Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 125709124Ki
pods: 110
Allocatable:
attachable-volumes-aws-ebs: 39
cpu: 16
ephemeral-storage: 56147256023
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 125606724Ki
pods: 110
System Info:
Machine ID: 94c328b1fcaf4999b5de9f749ac998b8
System UUID: ec2c3806-d842-c53f-e93f-cf9059701bdd
Boot ID: 469aa16e-80f3-470b-9451-06078a78fa96
Kernel Version: 5.4.0-1051-aws
OS Image: Ubuntu 18.04.4 LTS
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://18.9.7
Kubelet Version: v1.18.3
Kube-Proxy Version: v1.18.3
PodCIDR: 10.244.8.0/24
PodCIDRs: 10.244.8.0/24
ProviderID: aws:///us-east-1b/i-00919faca1e45926f
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system ebs-csi-controller-5b64f64f64-x97ng 0 (0%) 0 (0%) 0 (0%) 0 (0%) 24d
kube-system ebs-csi-node-2rwm4 0 (0%) 0 (0%) 0 (0%) 0 (0%) 114d
kube-system efs-csi-node-9dhb2 0 (0%) 0 (0%) 0 (0%) 0 (0%) 114d
kube-system kube-flannel-ds-amd64-9xkjg 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 114d
kube-system kube-proxy-nrjmh 0 (0%) 0 (0%) 0 (0%) 0 (0%) 114d
kube-system traefik-9mpzr 500m (3%) 1 (6%) 500Mi (0%) 800Mi (0%) 24d
monitoring node-exporter-gj2qw 112m (0%) 270m (1%) 200Mi (0%) 220Mi (0%) 114d
monitoring prometheus-operator-6f98f66b89-dnjqd 100m (0%) 200m (1%) 100Mi (0%) 200Mi (0%) 24d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 812m (5%) 1570m (9%)
memory 850Mi (0%) 1270Mi (1%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
attachable-volumes-aws-ebs 0 0