How to configure shared VPC for kOps?

As described in this documentation, I want to create a Kubernetes cluster with kOps in an existing VPC. I have created a VPC, an Internet Gateway, a route table, a subnet, and an EC2 instance from which I want to run the kops create cluster command and the other kops commands. These resources are created with the following CloudFormation template:

AWSTemplateFormatVersion: "2010-09-09"
Description: "AWS CloudFormation Template for Kops Poc"

Resources:
  KopsPocVPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 172.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true
      Tags:
        - Key: Name
          Value: tbe-kops-poc-vpc
        - Key: Project
          Value: Kops Poc

  KopsPocVPCCidrBlockIPv6:
    Type: AWS::EC2::VPCCidrBlock
    Properties:
      VpcId: !Ref KopsPocVPC
      AmazonProvidedIpv6CidrBlock: true

  KopsPocDHCPOptions:
    Type: AWS::EC2::DHCPOptions
    Properties:
      DomainName: ap-south-1.compute.internal
      DomainNameServers:
        - AmazonProvidedDNS
      Tags:
        - Key: Name
          Value: tbe-kops-poc-dopt
        - Key: Project
          Value: Kops Poc

  KopsPocVPCDHCPOptions:
    Type: AWS::EC2::VPCDHCPOptionsAssociation
    Properties:
      VpcId: !Ref KopsPocVPC
      DhcpOptionsId: !Ref KopsPocDHCPOptions

  KopsPocNetworkAcl:
    Type: AWS::EC2::NetworkAcl
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-acl
        - Key: Project
          Value: Kops Poc

  KopsPocInboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: false
      CidrBlock: 0.0.0.0/0

  KopsPocInboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: false
      Ipv6CidrBlock: ::/0

  KopsPocOutboundNetworkAclEntryIPv4:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 100
      Protocol: -1
      RuleAction: allow
      Egress: true
      CidrBlock: 0.0.0.0/0

  KopsPocOutboundNetworkAclEntryIPv6:
    Type: AWS::EC2::NetworkAclEntry
    Properties:
      NetworkAclId: !Ref KopsPocNetworkAcl
      RuleNumber: 101
      Protocol: -1
      RuleAction: allow
      Egress: true
      Ipv6CidrBlock: ::/0

  KopsPocInternetGateway:
    Type: AWS::EC2::InternetGateway
    Properties:
      Tags:
        - Key: Name
          Value: tbe-kops-poc-igw
        - Key: Project
          Value: Kops Poc

  KopsPocVPCGatewayAttachment:
    Type: AWS::EC2::VPCGatewayAttachment
    Properties:
      VpcId: !Ref KopsPocVPC
      InternetGatewayId: !Ref KopsPocInternetGateway

  KopsPocRouteTable:
    Type: AWS::EC2::RouteTable
    Properties:
      VpcId: !Ref KopsPocVPC
      Tags:
        - Key: Name
          Value: tbe-kops-poc-rt
        - Key: Project
          Value: Kops Poc

  KopsPocRouteIPV4:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationCidrBlock: 0.0.0.0/0
      GatewayId: !Ref KopsPocInternetGateway

  KopsPocRouteIPV6:
    Type: AWS::EC2::Route
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      RouteTableId: !Ref KopsPocRouteTable
      DestinationIpv6CidrBlock: ::/0
      GatewayId: !Ref KopsPocInternetGateway

  KopsPocSubnet:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref KopsPocVPC
      CidrBlock: 172.0.1.0/24
      AvailabilityZone: ap-south-1a
      Tags:
        - Key: Name
          Value: tbe-kops-poc-subnet
        - Key: Project
          Value: Kops Poc

  KopsPocSubnetRouteTableAssociation:
    Type: AWS::EC2::SubnetRouteTableAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      RouteTableId: !Ref KopsPocRouteTable

  KopsPocSubnetNetworkAclAssociation:
    Type: AWS::EC2::SubnetNetworkAclAssociation
    Properties:
      SubnetId: !Ref KopsPocSubnet
      NetworkAclId: !Ref KopsPocNetworkAcl

  KopsPocManagementInstanceSecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      VpcId: !Ref KopsPocVPC
      GroupDescription: Kops Poc Management Instance Security Group
      SecurityGroupIngress:
        - IpProtocol: tcp
          FromPort: 22
          ToPort: 22
          CidrIp: 0.0.0.0/0
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-sg
        - Key: Project
          Value: Kops Poc

  KopsPocManagementInstance:
    Type: AWS::EC2::Instance
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      AvailabilityZone: ap-south-1a
      ImageId: ami-0cca134ec43cf708f
      InstanceType: t3a.large
      KeyName: tbe-kops-poc
      NetworkInterfaces:
        - NetworkInterfaceId: !Ref KopsPocEth0
          DeviceIndex: 0
      Volumes:
        - Device: /dev/sdf
          VolumeId: !Ref KopsPocManagementInstanceVolume
      IamInstanceProfile: TBEKopsPocEC2ServiceRole
      Tags:
        - Key: Name
          Value: tbe-kops-poc-management-instance
        - Key: Project
          Value: Kops Poc

  KopsPocIPAddress:
    Type: AWS::EC2::EIP
    DependsOn: KopsPocVPCGatewayAttachment
    Properties:
      Domain: vpc
      InstanceId: !Ref KopsPocManagementInstance
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eip
        - Key: Project
          Value: Kops Poc

  KopsPocEth0:
    Type: AWS::EC2::NetworkInterface
    Properties:
      GroupSet:
        - !Ref KopsPocManagementInstanceSecurityGroup
      SubnetId: !Ref KopsPocSubnet
      Tags:
        - Key: Name
          Value: tbe-kops-poc-eth0
        - Key: Project
          Value: Kops Poc

  KopsPocManagementInstanceVolume:
    Type: AWS::EC2::Volume
    Properties:
      AvailabilityZone: ap-south-1a
      Size: 20
      VolumeType: gp3
      Tags:
        - Key: Name
          Value: tbe-kops-poc-volume
        - Key: Project
          Value: Kops Poc

After that, I can SSH into this EC2 instance. On it, I have installed kops and kubectl and exported the following environment variables:

export NAME="tbe-kops-poc.k8s.local"
export KOPS_STATE_STORE="s3://tbe-kops-poc-state-store"
export KOPS_OIDC_STORE="s3://tbe-kops-poc-oidc-store"
export MASTER_SIZE="t3a.large"
export MASTER_COUNT=1
export NODE_SIZE="t3a.large"
export NODE_COUNT=2
export ZONES="ap-south-1a"
export AMI_ID="ami-0cca134ec43cf708f"
export AWS_TAGS="Project=Kops Poc"
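A small preflight check can catch an unset or empty variable before kops runs with a blank flag value. The bash sketch below assumes the variable list shown (the ones the create command uses, plus KOPS_STATE_STORE, which kops reads from the environment); the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report which variables used by the kops commands are unset or empty.
# Assumption: this list mirrors the exports above; extend it if you add flags.
required_kops_vars=(NAME KOPS_STATE_STORE MASTER_SIZE MASTER_COUNT NODE_SIZE NODE_COUNT ZONES AWS_TAGS)

missing_kops_vars() {
  local var
  local -a missing=()
  for var in "${required_kops_vars[@]}"; do
    # ${!var} is bash indirect expansion: the value of the variable named in $var
    if [[ -z "${!var}" ]]; then
      missing+=("$var")
    fi
  done
  # Space-separated names; an empty result means everything is set
  echo "${missing[*]}"
}

check_kops_env() {
  local missing
  missing=$(missing_kops_vars)
  if [[ -n "$missing" ]]; then
    echo "missing or empty: $missing" >&2
    return 1
  fi
}
```

For example, calling check_kops_env || exit 1 at the top of a script stops it before kops create cluster runs with an incomplete environment.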

If I invoke the following command

kops create cluster \
  --name="${NAME}" \
  --cloud=aws \
  --cloud-labels="${AWS_TAGS}" \
  --node-count="${NODE_COUNT}" \
  --zones="${ZONES}" \
  --node-size="${NODE_SIZE}" \
  --master-count="${MASTER_COUNT}" \
  --master-zones="${ZONES}" \
  --master-size="${MASTER_SIZE}"

without passing the ID of the VPC created earlier, kOps creates the cluster, and kops validate cluster --name=${NAME} --wait 10m validates it successfully.

But when I pass the ID of that VPC with the --vpc=<VPC_ID> option, kops validate cluster --name=${NAME} --wait 10m times out. I have also tried --wait 30m, with the same result.
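The two runs differ only in the --vpc flag. One way to keep them otherwise identical is to build the argument list in one place and append --vpc only when an ID is supplied. A bash sketch under that assumption; kops_create_args and VPC_ID are hypothetical names, and the flags are exactly those of the command above.

```shell
#!/usr/bin/env bash
# Sketch: print the kops create cluster arguments, one per line.
# Pass a VPC ID as $1 to reproduce the failing run; omit it for the working one.
# Relies on the NAME, AWS_TAGS, NODE_COUNT, ... variables exported earlier.
kops_create_args() {
  local vpc_id=${1:-}
  local -a args=(
    --name="${NAME}"
    --cloud=aws
    --cloud-labels="${AWS_TAGS}"
    --node-count="${NODE_COUNT}"
    --zones="${ZONES}"
    --node-size="${NODE_SIZE}"
    --master-count="${MASTER_COUNT}"
    --master-zones="${ZONES}"
    --master-size="${MASTER_SIZE}"
  )
  # Only the shared-VPC run gets --vpc
  if [[ -n "$vpc_id" ]]; then
    args+=(--vpc="${vpc_id}")
  fi
  printf '%s\n' "${args[@]}"
}

# Usage (VPC_ID is a hypothetical variable; here it is looked up by the Name tag
# set in the CloudFormation template):
#   VPC_ID=$(aws ec2 describe-vpcs --filters Name=tag:Name,Values=tbe-kops-poc-vpc --query 'Vpcs[0].VpcId' --output text)
#   mapfile -t args < <(kops_create_args "${VPC_ID}")
#   kops create cluster "${args[@]}"
```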

The error I am getting is as follows:

INSTANCE GROUPS
NAME                    ROLE    MACHINETYPE     MIN     MAX     SUBNETS
master-ap-south-1a      Master  t3a.large       1       1       ap-south-1a
nodes-ap-south-1a       Node    t3a.large       2       2       ap-south-1a

NODE STATUS
NAME    ROLE    READY

VALIDATION ERRORS
KIND    NAME                    MESSAGE
Machine i-0638d7877f8030ab3     machine "i-0638d7877f8030ab3" has not yet joined cluster
Machine i-071746f1afdb86c4f     machine "i-071746f1afdb86c4f" has not yet joined cluster
Machine i-07e0de0b4734bc99c     machine "i-07e0de0b4734bc99c" has not yet joined cluster
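Since the next step is to inspect the machines that never joined, it can help to pull their instance IDs out of the validation output. A bash sketch, assuming the VALIDATION ERRORS lines keep the format shown above and that kops validate cluster writes this table to stdout; not_joined_instance_ids is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: read kops validate cluster output on stdin and print the IDs of
# machines reported as "has not yet joined", one per line, deduplicated.
not_joined_instance_ids() {
  # First keep only the 'machine "i-..." has not yet joined' phrases, then the bare IDs
  grep -oE 'machine "i-[0-9a-f]+" has not yet joined' \
    | grep -oE 'i-[0-9a-f]+' \
    | sort -u
}
```

For example, kops validate cluster --name="${NAME}" | not_joined_instance_ids lists the IDs, and each one can then be checked over SSH or with aws ec2 get-console-output --instance-id <id>.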

I can't figure out why this is happening. Any suggestions or pointers would be much appreciated.

Thank you.

Update-1

I re-ran the command with the --ssh-public-key option and then logged in to the master over SSH. /var/log/syslog shows the following errors:

Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: Started Kubernetes Protokube Service.
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: protokube version 0.1
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.387031    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetToken
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.388630    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetDynamicData
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.394627    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.395412    5310 aws_volume.go:65] AWS API Request: ec2metadata/GetMetadata
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.399317    5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.461619    5310 gossip.go:59] gossip dns connection limit is:0
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: I0123 12:54:54.462052    5310 aws_volume.go:65] AWS API Request: ec2/DescribeInstances
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: W0123 12:54:54.517477    5310 cluster.go:150] couldn't deduce an advertise address: no private IP found, explicit advertise addr not provided
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb protokube[5310]: E0123 12:54:54.519113    5310 main.go:197] error initializing secondary gossip: %!w(*errors.withStack=&{0xc000091280 0xc0007aa408})
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Main process exited, code=exited, status=1/FAILURE
Jan 23 12:54:54 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Failed with result 'exit-code'.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: protokube.service: Scheduled restart job, restart counter is at 61.
Jan 23 12:54:57 i-0d65b4c07a7dcdcdb systemd[1]: Stopped Kubernetes Protokube Service.

This sequence repeats continuously, with systemd restarting protokube each time.

Some further observations:

Observation-1:

Without the VPC ID, the route table that kOps creates has the following routes: [screenshot not preserved]

With the VPC ID, the route table it creates has: [screenshot not preserved]

Observation-2:

The master created without the VPC ID shows the following ifconfig output:

ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.20.57.190  netmask 255.255.224.0  broadcast 172.20.63.255
        inet6 fe80::1c:f5ff:feba:2d4e  prefixlen 64  scopeid 0x20<link>
        ether 02:1c:f5:ba:2d:4e  txqueuelen 1000  (Ethernet)
        RX packets 896498  bytes 1279076057 (1.2 GB)
        RX errors 0  dropped 2  overruns 0  frame 0
        TX packets 45327  bytes 5946634 (5.9 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 74883  bytes 23190458 (23.1 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 74883  bytes 23190458 (23.1 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

veth57fa98b3: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.96.0.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::c4fc:cff:fec5:777f  prefixlen 64  scopeid 0x20<link>
        ether c6:fc:0c:c5:77:7f  txqueuelen 0  (Ethernet)
        RX packets 158  bytes 15575 (15.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 173  bytes 16763 (16.7 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

vethd97efc68: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 100.96.0.1  netmask 255.255.255.255  broadcast 0.0.0.0
        inet6 fe80::b8d3:95ff:fecb:810b  prefixlen 64  scopeid 0x20<link>
        ether ba:d3:95:cb:81:0b  txqueuelen 0  (Ethernet)
        RX packets 1139  bytes 182549 (182.5 KB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 1298  bytes 817513 (817.5 KB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

The master created with the VPC ID shows:

ens5: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 9001
        inet 172.0.2.197  netmask 255.255.255.0  broadcast 172.0.2.255
        inet6 fe80::cb:acff:febb:a14c  prefixlen 64  scopeid 0x20<link>
        ether 02:cb:ac:bb:a1:4c  txqueuelen 1000  (Ethernet)
        RX packets 793161  bytes 1076223553 (1.0 GB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 96108  bytes 17026472 (17.0 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1000  (Local Loopback)
        RX packets 281449  bytes 52537138 (52.5 MB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 281449  bytes 52537138 (52.5 MB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Also, for the cluster created with the VPC ID, running kubectl get pods --all-namespaces returns:

E0123 13:28:46.878743    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.879320    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.880829    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.882274    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
E0123 13:28:46.883682    3024 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp 127.0.0.1:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
Hi Tapas Bose, welcome to S.F. There are almost infinite reasons why machines won't join clusters, but unless you already have some log egress set up, you'll have to ssh into them, look at the kubelet logs, and see why those machines, specifically, are upset. Please don't put any updates in the comments; instead, edit your question to include further details. Good luck
Hi @mdaniel, indeed there can be many reasons why the machines won't join the cluster. What I am looking for is the standard practice for creating a VPC and associated resources that kOps can use; I couldn't find documentation on this.
Right, but I'm not saying "well, computers are magic, so there's no hope"; I'm saying that you have to ssh or SSM into those machines, look at the logs, and figure out why that is the case. I doubt anyone can hold that many variables in their head and offer you the answer without logs, because there can be so many underlying causes
Hi @mdaniel, I have updated the question with the logs I gathered after logging in to the master over SSH.