As described in this documentation, I want to create a Kubernetes cluster with kOps in an existing VPC. I have created a VPC, an Internet Gateway, a Route Table, a Subnet and an EC2 instance, which I want to use for running the `kops create cluster` command and related tasks. These resources are created with the following CloudFormation template:
```yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for Kops Poc"
Resources:
  KopsPocVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 172.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: tbe-kops-poc-vpc
        - Key: Project
          Value: Kops Poc
  KopsPocVPCCidrBlockIPv6:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref KopsPocVPC
      AmazonProvidedIpv6CidrBlock: true
  KopsPocDHCPOptions:
    Type: AWS::EC2::DHCPOptions
    Properties:
      DomainName: ap-south-1.compute.internal
      DomainNameServers:
        - AmazonProvidedDNS
      Tags:
        - Key: Name
          Value: tbe-kops-poc-dopt
        - Key: Project
          Value: Kops Poc
  KopsPocVPCDHCPOptions:
    Type: AWS::EC2::VPCDHCPOptionsAssociation
    Properties:
      VpcId: !Ref KopsPocVPC
      DhcpOptionsId: !Ref KopsPocDHCPOptions
  KopsPocNetworkAcl:
    Type: AWS::EC2::NetworkAcl
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-acl
        - Key: Project
          Value: Kops Poc
  KopsPocInboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: false
      CidrBlock: 0.0.0.0/0
  KopsPocInboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: false
      Ipv6CidrBlock: ::/0
  KopsPocOutboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: true
      CidrBlock: 0.0.0.0/0
  KopsPocOutboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: true
      Ipv6CidrBlock: ::/0
  KopsPocInternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: tbe-kops-poc-igw
        - Key: Project
          Value: Kops Poc
  KopsPocVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref KopsPocVPC
      InternetGatewayId: !Ref KopsPocInternetGateway
  KopsPocRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-rt
        - Key: Project
          Value: Kops Poc
  KopsPocRouteIPV4:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref KopsPocInternetGateway
  KopsPocRouteIPV6:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationIpv6CidrBlock: ::/0
      GatewayId: !Ref KopsPocInternetGateway
  KopsPocSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref KopsPocVPC
      CidrBlock: 172.0.1.0/24
      AvailabilityZone: ap-south-1a
      Tags:
        - Key: Name
          Value: tbe-kops-poc-subnet
        - Key: Project
          Value: Kops Poc
  KopsPocSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      RouteTableId: !Ref KopsPocRouteTable
  KopsPocSubnetNetworkAclAssociation:
    Type: AWS::EC2::SubnetNetworkAclAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      NetworkAclId: !Ref KopsPocNetworkAcl
  KopsPocManagementInstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref KopsPocVPC
      GroupDescription: Kops Poc Management Instance Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-sg
        - Key: Project
          Value: Kops Poc
  KopsPocManagementInstance:
    Type: AWS::EC2::Instance
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      AvailabilityZone: ap-south-1a
      ImageId: ami-0cca134ec43cf708f
      InstanceType: t3a.large
      KeyName: tbe-kops-poc
      NetworkInterfaces:
        - NetworkInterfaceId: !Ref KopsPocEth0
          DeviceIndex: 0
      Volumes:
        - Device: /dev/sdf
          VolumeId: !Ref KopsPocManagementInstanceVolume
      IamInstanceProfile: TBEKopsPocEC2ServiceRole
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-instance
        - Key: Project
          Value: Kops Poc
  KopsPocIPAddress:
    Type: AWS::EC2::EIP
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      Domain: vpc
      InstanceId: !Ref KopsPocManagementInstance
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eip
        - Key: Project
          Value: Kops Poc
  KopsPocEth0:
    Type: AWS::EC2::NetworkInterface
    Properties:
      GroupSet:
        - !Ref KopsPocManagementInstanceSecurityGroup
      SubnetId: !Ref KopsPocSubnet
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eth0
        - Key: Project
          Value: Kops Poc
  KopsPocManagementInstanceVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: ap-south-1a
      Size: 20
      VolumeType: gp3
      Tags:
        - Key: Name
          Value: tbe-kops-poc-volume
        - Key: Project
          Value: Kops Poc
```
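Because the template hard-codes its address ranges, they can be sanity-checked with a little integer arithmetic. The POSIX shell sketch below is an illustration, not part of the original setup, and `ip_to_int`, `cidr_contains` and `check` are hypothetical helper names. It tests whether the subnet `172.0.1.0/24` lies inside the VPC block `172.0.0.0/16`, and whether that VPC block, or the `172.20.57.190` master address from the default (no `--vpc`) cluster in Observation-2, falls inside the RFC 1918 private range `172.16.0.0/12`.

```sh
# Hypothetical helpers for sanity-checking IPv4 CIDR ranges (illustration only).

# Print a dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS
  IFS=.
  set -- $1            # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed (exit status 0) if CIDR $2 lies entirely inside CIDR $1.
cidr_contains() {
  outer_net=${1%/*}; outer_len=${1#*/}
  inner_net=${2%/*}; inner_len=${2#*/}
  # A shorter prefix (a bigger range) can never fit inside a longer one.
  [ "$inner_len" -ge "$outer_len" ] || return 1
  if [ "$outer_len" -eq 0 ]; then
    mask=0
  else
    mask=$(( (0xFFFFFFFF << (32 - $outer_len)) & 0xFFFFFFFF ))
  fi
  outer=$(ip_to_int "$outer_net")
  inner=$(ip_to_int "$inner_net")
  # Both network parts must agree on the outer prefix bits.
  [ $(( outer & mask )) -eq $(( inner & mask )) ]
}

check() {
  if cidr_contains "$1" "$2"; then
    echo "$2 is inside $1"
  else
    echo "$2 is NOT inside $1"
  fi
}

check 172.0.0.0/16  172.0.1.0/24      # template subnet vs. template VPC
check 172.16.0.0/12 172.0.0.0/16      # template VPC vs. RFC 1918 172.16.0.0/12
check 172.16.0.0/12 172.20.57.190/32  # master address from the cluster created without --vpc
```

With these inputs the first and third checks report "inside" and the second reports "NOT inside": `172.0.0.0/16` lies outside all three RFC 1918 private ranges (`10.0.0.0/8`, `172.16.0.0/12`, `192.168.0.0/16`).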
After that, I am able to SSH into this EC2 instance. On it, I have installed kops and kubectl, and I have set the following environment variables:
```sh
export NAME="tbe-kops-poc.k8s.local"
export KOPS_STATE_STORE="s3://tbe-kops-poc-state-store"
export KOPS_OIDC_STORE="s3://tbe-kops-poc-oidc-store"
export MASTER_SIZE="t3a.large"
export MASTER_COUNT=1
export NODE_SIZE="t3a.large"
export NODE_COUNT=2
export ZONES="ap-south-1a"
export AMI_ID="ami-0cca134ec43cf708f"
export AWS_TAGS="Project=Kops Poc"
```
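Since the `kops create cluster` command in this post expands most of these variables (and kOps reads `KOPS_STATE_STORE` from the environment), a small guard can catch an unset or empty one before anything is created. This is a hypothetical `require_vars` helper written for illustration in POSIX sh; the list of names simply mirrors the variables that command uses.

```sh
# Hypothetical pre-flight check (illustration only): report every named
# variable that is unset or empty, and fail if there was at least one.
require_vars() {
  missing=0
  for var in "$@"; do
    eval "value=\${$var:-}"   # indirect lookup of the variable named by $var
    if [ -z "$value" ]; then
      echo "missing: $var" >&2
      missing=1
    fi
  done
  return "$missing"
}

# The variables used by the kops create cluster command in this post.
require_vars NAME KOPS_STATE_STORE MASTER_SIZE MASTER_COUNT \
  NODE_SIZE NODE_COUNT ZONES AWS_TAGS ||
  echo "set the missing variables before running kops" >&2
```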
If I run the following command without providing the ID of the VPC created earlier:

```sh
kops create cluster --name=${NAME} --cloud=aws --cloud-labels="${AWS_TAGS}" --node-count=${NODE_COUNT} --zones=${ZONES} --node-size=${NODE_SIZE} --master-count=${MASTER_COUNT} --master-zones=${ZONES} --master-size=${MASTER_SIZE}
```

kOps creates the cluster, and `kops validate cluster --name=${NAME} --wait 10m` validates it successfully.

But when I provide the ID of the VPC created earlier with the `--vpc=<VPC_ID>` option, `kops validate cluster --name=${NAME} --wait 10m` times out. I have even tried `--wait 30m`, but the result is the same.
This is the output I get:

```
INSTANCE GROUPS
NAME                ROLE    MACHINETYPE  MIN  MAX  SUBNETS
master-ap-south-1a  Master  t3a.large    1    1    ap-south-1a
nodes-ap-south-1a   Node    t3a.large    2    2    ap-south-1a

NODE STATUS
NAME  ROLE  READY

VALIDATION ERRORS
KIND     NAME                 MESSAGE
Machine  i-0638d7877f8030ab3  machine "i-0638d7877f8030ab3" has not yet joined cluster
Machine  i-071746f1afdb86c4f  machine "i-071746f1afdb86c4f" has not yet joined cluster
Machine  i-07e0de0b4734bc99c  machine "i-07e0de0b4734bc99c" has not yet joined cluster
```
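When several machines fail to join, it can help to pull their instance IDs out of the validation output so each one can be inspected individually. The sketch below is illustrative only; `unjoined_machines` is a hypothetical helper that assumes the whitespace-separated `KIND NAME MESSAGE` layout of the validation errors above.

```sh
# Hypothetical helper (illustration only): read `kops validate cluster` output
# on stdin and print the instance ID of every machine that has not joined.
unjoined_machines() {
  awk '$1 == "Machine" && /has not yet joined/ { print $2 }'
}

# Example run against the validation errors reported above.
unjoined_machines <<'EOF'
KIND     NAME                 MESSAGE
Machine  i-0638d7877f8030ab3  machine "i-0638d7877f8030ab3" has not yet joined cluster
Machine  i-071746f1afdb86c4f  machine "i-071746f1afdb86c4f" has not yet joined cluster
Machine  i-07e0de0b4734bc99c  machine "i-07e0de0b4734bc99c" has not yet joined cluster
EOF
```

Each printed ID could then be passed to, for example, `aws ec2 get-console-output --instance-id <id>` to look at that instance's boot output.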
I can't figure out why this is happening. Any suggestions or pointers would be much appreciated.
Thank you.
Update-1
I ran `kops create cluster` again with the `--ssh-public-key` option and then logged in to the master over SSH. `/var/log/syslog` shows some errors:
```
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: Started Kubernetes Protokube Service.
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: protokube version 0.1
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.387031 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetToken
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.388630 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetDynamicData
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.394627 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.395412 5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.399317 5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.461619 5310 gossip.go:59] gossip dns connection limit is:0
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.462052 5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: W0123 12:54:54.517477 5310 cluster.go:150] couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: E0123 12:54:54.519113 5310 main.go:197] error initializing secondary gossip: %!w(*errors.withStack=&{0xc000091280 0xc0007aa408})
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Main process exited, code=exited, status=1/FAILURE
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Failed with result 'exit-code'.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Scheduled restart job, restart counter is at 61.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: Stopped Kubernetes Protokube Service.
```
This sequence keeps repeating.
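Since the interesting lines are buried among routine info messages that repeat on every restart, a filter on the klog severity prefix (the leading `I`, `W`, `E` or `F` before the `mmdd` date in each protokube message) can shorten the log considerably. The `protokube_problems` function below is a hypothetical helper for illustration; reading `/var/log/syslog` typically needs root.

```sh
# Hypothetical filter (illustration only): keep protokube lines whose klog
# severity letter is W (warning), E (error) or F (fatal) and drop the I (info)
# lines, so the failing step stands out in a long, repeating syslog.
protokube_problems() {
  grep -E 'protokube\[[0-9]+\]: [WEF][0-9]{4} '
}

# Example with three of the lines quoted above; only the W and E lines are printed.
printf '%s\n' \
  'Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.461619 5310 gossip.go:59] gossip dns connection limit is:0' \
  "Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: W0123 12:54:54.517477 5310 cluster.go:150] couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided" \
  'Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: E0123 12:54:54.519113 5310 main.go:197] error initializing secondary gossip: %!w(*errors.withStack=&{0xc000091280 0xc0007aa408})' \
  | protokube_problems

# On the master itself, something like:
#   sudo cat /var/log/syslog | protokube_problems | tail -n 20
```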
Some further observations:
Observation-1:
Without the VPC ID, the route table that kOps creates has the following routes:
With the VPC ID, the route table it creates has:
Observation-2:
The master created without the VPC ID shows the following `ifconfig` output:
```
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.20.57.190 netmask 255.255.224.0 broadcast 172.20.63.255
        inet6 fe80::1c:f5ff:feba:2d4e prefixlen 64 scopeid 0x20<link>
        ether 02:1c:f5:ba:2d:4e txqueuelen 1000 (Ethernet)
        RX packets 896498 bytes 1279076057 (1.2 GB)
        RX errors 0 dropped 2 overruns 0 frame 0
        TX packets 45327 bytes 5946634 (5.9 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 74883 bytes 23190458 (23.1 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 74883 bytes 23190458 (23.1 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

veth57fa98b3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 100.96.0.1 netmask 255.255.255.255 broadcast 0.0.0.0
        inet6 fe80::c4fc:cff:fec5:777f prefixlen 64 scopeid 0x20<link>
        ether c6:fc:0c:c5:77:7f txqueuelen 0 (Ethernet)
        RX packets 158 bytes 15575 (15.5 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 173 bytes 16763 (16.7 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

vethd97efc68: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 100.96.0.1 netmask 255.255.255.255 broadcast 0.0.0.0
        inet6 fe80::b8d3:95ff:fecb:810b prefixlen 64 scopeid 0x20<link>
        ether ba:d3:95:cb:81:0b txqueuelen 0 (Ethernet)
        RX packets 1139 bytes 182549 (182.5 KB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 1298 bytes 817513 (817.5 KB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
And the master created with the VPC ID shows:
```
ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 9001
        inet 172.0.2.197 netmask 255.255.255.0 broadcast 172.0.2.255
        inet6 fe80::cb:acff:febb:a14c prefixlen 64 scopeid 0x20<link>
        ether 02:cb:ac:bb:a1:4c txqueuelen 1000 (Ethernet)
        RX packets 793161 bytes 1076223553 (1.0 GB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 96108 bytes 17026472 (17.0 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
        inet 127.0.0.1 netmask 255.0.0.0
        inet6 ::1 prefixlen 128 scopeid 0x10<host>
        loop txqueuelen 1000 (Local Loopback)
        RX packets 281449 bytes 52537138 (52.5 MB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 281449 bytes 52537138 (52.5 MB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
```
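The two `ens5` addresses differ in a way that may matter for the `no private IP found` warning in the protokube log: `172.20.57.190` lies in the RFC 1918 block `172.16.0.0/12`, whereas `172.0.2.197` does not fall in any RFC 1918 block. Whether protokube's gossip layer applies exactly this RFC 1918 test when it looks for a private IP is an assumption here; the logs only show that it found none. The POSIX sh sketch below classifies both addresses and converts their netmasks to prefix lengths; `ip_to_int`, `is_rfc1918` and `mask_to_prefix` are hypothetical helpers written for illustration.

```sh
# Hypothetical helpers (illustration only).

# Print a dotted-quad IPv4 address as a 32-bit integer.
ip_to_int() {
  old_ifs=$IFS; IFS=.
  set -- $1            # split "a.b.c.d" into $1..$4
  IFS=$old_ifs
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Succeed if the address is in 10.0.0.0/8, 172.16.0.0/12 or 192.168.0.0/16 (RFC 1918).
is_rfc1918() {
  ip=$(ip_to_int "$1")
  [ $(( ip & 0xFF000000 )) -eq $(( 10 << 24 )) ] ||
  [ $(( ip & 0xFFF00000 )) -eq $(( (172 << 24) | (16 << 16) )) ] ||
  [ $(( ip & 0xFFFF0000 )) -eq $(( (192 << 24) | (168 << 16) )) ]
}

# Count the set bits of a contiguous dotted-quad netmask, e.g. 255.255.224.0 -> 19.
mask_to_prefix() {
  m=$(ip_to_int "$1"); bits=0
  while [ "$m" -ne 0 ]; do
    bits=$(( bits + (m & 1) )); m=$(( m >> 1 ))
  done
  echo "$bits"
}

# ens5 address/netmask of the master without --vpc, then with --vpc.
for pair in 172.20.57.190/255.255.224.0 172.0.2.197/255.255.255.0; do
  addr=${pair%/*}; netmask=${pair#*/}
  if is_rfc1918 "$addr"; then kind="RFC 1918 private"; else kind="NOT RFC 1918"; fi
  echo "$addr/$(mask_to_prefix "$netmask"): $kind"
done
```

For these inputs the loop prints `172.20.57.190/19: RFC 1918 private` and `172.0.2.197/24: NOT RFC 1918`.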
Also, for the cluster created with the VPC ID, `kubectl get pods --all-namespaces` returns:
```
E0123 13:28:46.878743 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.879320 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.880829 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.882274 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.883682 3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```
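`localhost:8080` is the address kubectl typically falls back to when it finds no usable kubeconfig, so this error may only mean that the shell where the command ran has no kubeconfig for the cluster. kubectl reads the files listed in `KUBECONFIG` (colon-separated) or, if that is unset, `$HOME/.kube/config`. The sketch below uses a hypothetical `kubeconfig_candidates` helper to list those paths and whether each exists; it does not inspect the file contents.

```sh
# Hypothetical helper (illustration only): print each kubeconfig path kubectl
# would consider, followed by "exists" or "missing".
kubeconfig_candidates() {
  paths=${KUBECONFIG:-$HOME/.kube/config}
  old_ifs=$IFS; IFS=:
  for p in $paths; do      # split the colon-separated list
    if [ -f "$p" ]; then echo "$p exists"; else echo "$p missing"; fi
  done
  IFS=$old_ifs
}

kubeconfig_candidates
```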