
UPI/Bare Metal - The master node cannot finish the installation ("config/master", "timeout awaiting response headers"/"EOF")

Simple question, but so far very difficult to answer... =-[

I am trying to deploy OpenShift (OKD) 4.5 or 4.7 by following the guide "Installing an OKD 4.5 Cluster"; the relevant part is its "Starting the control plane nodes" section.

I'm trying to create the cluster as a UPI (User Provisioned Infrastructure)/Bare Metal installation on KVM virtual machines.

PROBLEM:

  • Version 4.5

The master node cannot finish installation after reboot. It keeps showing the following error...

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

For version 4.5 we use "Fedora CoreOS 32.20200715.3.0".

  • Version 4.7

The master node cannot finish installation after reboot. It keeps showing the following error...

[  543.933709] ignition[505]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #112
[  543.939340] ignition[505]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": EOF

For version 4.7 we use "Fedora CoreOS 34.20210518.3.0".


I've waited hours and hours and the master nodes are still in the same situation. What can I do to resolve this?

Thanks! =D


MORE INFORMATION:

See if this helps...

This output appears on okd_master_3 (10.3.0.7)...

[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers

Connecting to okd_master_2 (10.3.0.6) from okd_services (10.3.0.14)...

NOTE: okd_master_2 (10.3.0.6) was able to boot (it reached the login screen).

[root@okd_services ~]# ssh core@10.3.0.6
The authenticity of host '10.3.0.6 (10.3.0.6)' can't be established.
ECDSA key fingerprint is SHA256:1xdq65g0ljnZYR6uXHaXW6EsxO3u6X268s4Z9Kfq0ng.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.3.0.6' (ECDSA) to the list of known hosts.
Fedora CoreOS 32.20200629.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/

Pinging the okd_bootstrap (10.3.0.4) from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ ping 10.3.0.4
PING 10.3.0.4 (10.3.0.4) 56(84) bytes of data.
64 bytes from 10.3.0.4: icmp_seq=1 ttl=64 time=0.973 ms
64 bytes from 10.3.0.4: icmp_seq=2 ttl=64 time=0.801 ms
64 bytes from 10.3.0.4: icmp_seq=3 ttl=64 time=0.373 ms
64 bytes from 10.3.0.4: icmp_seq=4 ttl=64 time=0.647 ms
^C
--- 10.3.0.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3032ms
rtt min/avg/max/mdev = 0.373/0.698/0.973/0.220 ms

Calling the problematic URL from okd_master_2 (10.3.0.6)...

[core@localhost ~]$ curl -kv https://api-int.mbr.okd.local:22623/config/master
*   Trying 10.3.0.14:22623...
* Connected to api-int.mbr.okd.local (10.3.0.14) port 22623 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/pki/tls/certs/ca-bundle.crt
  CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=api-int.mbr.okd.local
*  start date: Jun 16 23:52:22 2021 GMT
*  expire date: Jun 14 23:52:23 2031 GMT
*  issuer: OU=openshift; CN=root-ca
*  SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561ed249aa40)
> GET /config/master HTTP/2
> Host: api-int.mbr.okd.local:22623
> user-agent: curl/7.69.1
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 500 
< content-length: 0
< date: Thu, 17 Jun 2021 14:55:43 GMT
< 
* Connection #0 to host api-int.mbr.okd.local left intact
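
The HTTP 500 above comes back through the load balancer on okd_services (10.3.0.14), so it does not show which backend produced it. A minimal sketch, assuming Python 3 is available on a host inside OKD_LAN, that sends the same request straight to each machine config server backend listed in haproxy.cfg. The helper names (`backend_urls`, `probe`, `main`) are hypothetical, and certificate verification is disabled to mirror `curl -k`:

```python
import ssl
import urllib.error
import urllib.request

# Backends copied from the okd_machine_config_server_be section of haproxy.cfg.
MCS_BACKENDS = {
    "okd-boostrap": "10.3.0.4",
    "okd-master-1": "10.3.0.5",
    "okd-master-2": "10.3.0.6",
    "okd-master-3": "10.3.0.7",
}


def backend_urls(backends, port=22623, path="/config/master"):
    """Build one direct URL per backend, bypassing the load balancer."""
    return {name: f"https://{ip}:{port}{path}" for name, ip in backends.items()}


def probe(url, timeout=5.0):
    """Return the HTTP status code, or a short error string if the request fails."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # together with CERT_NONE: same effect as curl -k
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code              # the server answered with a non-2xx status
    except (urllib.error.URLError, OSError) as err:
        return f"error: {err}"       # refused, timed out, TLS failure, ...


def main():
    """Print one line per backend with the result of the direct request."""
    for name, url in backend_urls(MCS_BACKENDS).items():
        print(f"{name:14} {url} -> {probe(url)}")
```

Calling `main()` prints one result per backend. A backend that answers with a status code is at least listening on port 22623, while an `error:` entry means nothing answered there at all.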

INFRASTRUCTURE:

Virtual machines...

NAME           ROLE                   OS              IP          MAC
okd_boostrap   bootstrap              Fedora CoreOS   10.3.0.4    52:54:00:07:80:62
okd_master_1   master                 Fedora CoreOS   10.3.0.5    52:54:00:7d:97:70
okd_master_2   master                 Fedora CoreOS   10.3.0.6    52:54:00:6e:52:85
okd_master_3   master                 Fedora CoreOS   10.3.0.7    52:54:00:a3:65:d9
okd_worker_1   worker                 Fedora CoreOS   10.3.0.8    52:54:00:e3:7c:fb
okd_worker_2   worker                 Fedora CoreOS   10.3.0.9    52:54:00:20:ec:4f
okd_services   DNS/LB/web/NFS         CentOS 8        10.3.0.14   52:54:00:3a:fd:a2
                                                         10.2.0.18   52:54:00:92:ce:78
okd_pfsense    firewall/router/DHCP   FreeBSD         10.3.0.2    52:54:00:d8:27:82
                                                         10.2.0.19   52:54:00:ac:82:7d

 . OKD_LAN: "10.3.0";
 . EXT_LAN: "10.2.0".

Some acronyms...
 _ DNS - Domain Name System;
 _ LB - Load Balancing;
 _ Web - Web Server;
 _ NFS - Network File System.

Network layout...

           ...→.[N]WAN/EXT_LAN([R]dhcp).←... (10.2.0.0/24)
           ↓                               ↓
          [I]WAN/EXT_LAN                  [I]WAN/EXT_LAN
  [V]OKD_PFSENSE([R]dhcp)                 [V]OKD_SERVICES
          [I]OKD_LAN                      [I]OKD_LAN
           ↑                               ↑
           .........→.[N]OKD_LAN.←.......... (10.3.0.0/24)
                       ↑
      ...................................
      ↓                ↓                ↓
     [V]OKD_BOOSTRAP  [V]OKD_MASTER_1  [V]OKD_WORKER_1
                      [V]OKD_MASTER_2  [V]OKD_WORKER_2
                      [V]OKD_MASTER_3

 _ [N] - Network;
 _ [R] - Network Resource;
 _ [I] - Network Interface;
 _ [V] - Virtual Machine.

CONFIGURATION FILES:

BIND 9 (DNS):

. db.10.3.0

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        6       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services.okd.local.

; Name servers - "PTR" records.
14 IN  PTR okd-services.okd.local.

; OpenShift container platform cluster - "PTR" records.
4 IN  PTR okd-boostrap.mbr.okd.local.
5 IN  PTR okd-master-1.mbr.okd.local.
6 IN  PTR okd-master-2.mbr.okd.local.
7 IN  PTR okd-master-3.mbr.okd.local.
8 IN  PTR okd-worker-1.mbr.okd.local.
9 IN  PTR okd-worker-2.mbr.okd.local.
14 IN  PTR api.mbr.okd.local.
14 IN  PTR api-int.mbr.okd.local.

. db.okd.local

$TTL    604800
@   IN  SOA okd-services.okd.local. admin.okd.local. (
        1       ; Serial
        604800  ; Refresh
        86400   ; Retry
        2419200 ; Expire
        604800  ; Negative Cache TTL
)

; Name servers - "NS" records.
    IN  NS  okd-services

; Name servers - "A" records.
okd-services.okd.local. IN A 10.3.0.14

; OpenShift container platform cluster - "A" records.
okd-boostrap.mbr.okd.local. IN  A   10.3.0.4
okd-master-1.mbr.okd.local. IN  A   10.3.0.5
okd-master-2.mbr.okd.local. IN  A   10.3.0.6
okd-master-3.mbr.okd.local. IN  A   10.3.0.7
okd-worker-1.mbr.okd.local. IN  A   10.3.0.8
okd-worker-2.mbr.okd.local. IN  A   10.3.0.9

; OpenShift internal cluster IPs - "A" records.
api.mbr.okd.local.              IN  A   10.3.0.14
api-int.mbr.okd.local.          IN  A   10.3.0.14
*.apps.mbr.okd.local.           IN  A   10.3.0.14
etcd-0.mbr.okd.local.           IN  A   10.3.0.5
etcd-1.mbr.okd.local.           IN  A   10.3.0.6
etcd-2.mbr.okd.local.           IN  A   10.3.0.7
cons-okd.apps.mbr.okd.local.    IN  A   10.3.0.14
oauth-okd.apps.mbr.okd.local.   IN  A   10.3.0.14

; OpenShift internal cluster IPs - "SRV" records.
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-0.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-1.mbr
_etcd-server-ssl._tcp.mbr.okd.local.    86400   IN  SRV 0   10  2380    etcd-2.mbr
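
The "A" records above can be checked from any machine that uses 10.3.0.14 as its resolver. A small sketch, assuming Python 3; `check_forward` is a hypothetical helper, and it relies on the system resolver (`socket.gethostbyname`), so it only tests this zone if the host actually queries the BIND server shown here:

```python
import socket

# Names and addresses copied from db.okd.local.
EXPECTED_A = {
    "api.mbr.okd.local": "10.3.0.14",
    "api-int.mbr.okd.local": "10.3.0.14",
    "okd-boostrap.mbr.okd.local": "10.3.0.4",
    "okd-master-1.mbr.okd.local": "10.3.0.5",
    "okd-master-2.mbr.okd.local": "10.3.0.6",
    "okd-master-3.mbr.okd.local": "10.3.0.7",
    "etcd-0.mbr.okd.local": "10.3.0.5",
    "etcd-1.mbr.okd.local": "10.3.0.6",
    "etcd-2.mbr.okd.local": "10.3.0.7",
}


def check_forward(expected, resolve=socket.gethostbyname):
    """Return {name: (expected_ip, actual)} for every record that does not match."""
    mismatches = {}
    for name, ip in expected.items():
        try:
            actual = resolve(name)
        except OSError as err:  # socket.gaierror is a subclass of OSError
            actual = f"lookup failed: {err}"
        if actual != ip:
            mismatches[name] = (ip, actual)
    return mismatches
```

An empty result from `check_forward(EXPECTED_A)` means every listed name resolves to the address in the zone file. The `resolve` parameter exists only so that a different lookup function can be substituted.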

. named.conf.local

zone "okd.local" {
    type master;
    file "/etc/named/zones/db.okd.local"; // Zone file path.
};

zone "0.3.10.in-addr.arpa" {
    type master;
    file "/etc/named/zones/db.10.3.0"; // 10.3.0.0/24 subnet.
};

. named.conf

//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS server
// as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the configuration
// located in /usr/share/doc/bind-{version}/Bv9ARM.html .

options {
    listen-on port 53 { 127.0.0.1; 10.3.0.14; };
    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/data/named_stats.txt";
    memstatistics-file "/var/named/data/named_mem_stats.txt";
    recursing-file "/var/named/data/named.recursing";
    secroots-file "/var/named/data/named.secroots";
    allow-query { localhost; 10.3.0.0/24; };

    // - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
    // - If you are building a RECURSIVE (caching) DNS server, you need to enable
    // recursion.
    // - If your recursive DNS server has a public IP address, you MUST enable access
    // control to limit queries to your legitimate users. Failing to do so will cause
    // your server to become part of large scale DNS amplification attacks. Implementing
    // BCP38 within your network would greatly reduce such attack surface.
    recursion yes;

    forwarders {
        8.8.8.8;
        8.8.4.4;
    };

    dnssec-enable yes;
    dnssec-validation yes;

    // Path to ISC DLV key.
    bindkeys-file "/etc/named.root.key";

    managed-keys-directory "/var/named/dynamic";

    pid-file "/run/named/named.pid";
    session-keyfile "/run/named/session.key";
};

logging {
    channel default_debug {
        file "data/named.run";
        severity dynamic;
    };
};

zone "." IN {
    type hint;
    file "named.ca";
};

include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/named/named.conf.local";

HAProxy (load balancer):

. haproxy.cfg

#---------------------------------------------
# Global settings.
#---------------------------------------------
global
    maxconn 20000
    log /dev/log local0 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    daemon

    # Turn on stats unix socket.
    stats socket /var/lib/haproxy/stats

#---------------------------------------------
# Common defaults that all the "listen" and "backend" sections will use if not designated
# in their block.
#---------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 300s
    timeout server 300s
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 20000

listen stats
    bind :9000
    mode http
    option forwardfor except 127.0.0.0/8
    stats enable
    stats uri /

frontend okd_k8s_api_fe
    bind :6443
    default_backend okd_k8s_api_be
    mode tcp
    option tcplog

backend okd_k8s_api_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:6443 check
    server okd-master-1 10.3.0.5:6443 check
    server okd-master-2 10.3.0.6:6443 check
    server okd-master-3 10.3.0.7:6443 check

frontend okd_machine_config_server_fe
    bind :22623
    default_backend okd_machine_config_server_be
    mode tcp
    option tcplog

backend okd_machine_config_server_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:22623 check
    server okd-master-1 10.3.0.5:22623 check
    server okd-master-2 10.3.0.6:22623 check
    server okd-master-3 10.3.0.7:22623 check

frontend okd_http_ingress_traffic_fe
    bind :80
    default_backend okd_http_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_http_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:80 check
    server okd-worker-2 10.3.0.9:80 check

frontend okd_https_ingress_traffic_fe
    bind *:443
    default_backend okd_https_ingress_traffic_be
    mode tcp
    option tcplog

backend okd_https_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:443 check
    server okd-worker-2 10.3.0.9:443 check
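
The backend lists above can also be read programmatically, for example to test each server in turn. A rough sketch, assuming Python 3; `backend_servers` is a hypothetical helper that only understands the simple `server NAME HOST:PORT ...` lines used in this file, not the full HAProxy syntax:

```python
SECTION_KEYWORDS = ("global", "defaults", "frontend", "listen", "backend")

# Excerpt of the haproxy.cfg shown above, used as sample input.
SAMPLE_CFG = """\
frontend okd_machine_config_server_fe
    bind :22623
    default_backend okd_machine_config_server_be

backend okd_machine_config_server_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:22623 check
    server okd-master-1 10.3.0.5:22623 check
"""


def backend_servers(cfg_text):
    """Map each backend name to a list of (server_name, host, port) tuples."""
    backends = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and indentation
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            # A new section starts; only "backend" sections are collected.
            current = words[1] if words[0] == "backend" and len(words) > 1 else None
            if current is not None:
                backends[current] = []
        elif words[0] == "server" and current is not None and len(words) >= 3:
            host, sep, port = words[2].rpartition(":")
            if sep and port.isdigit():
                backends[current].append((words[1], host, int(port)))
    return backends
```

For the excerpt, `backend_servers(SAMPLE_CFG)` yields a single backend, `okd_machine_config_server_be`, with the two servers at 10.3.0.4 and 10.3.0.5 on port 22623.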

OpenShift (OKD) "*.yaml" files:

. htpasswd_provider.yaml

apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret

. install-config.yaml

apiVersion: v1
baseDomain: okd.local
metadata:
  name: mbr

compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0

controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3

networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16

platform:
  none: {}

fips: false

pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAA<SKIPPED>QbAKPwwhdCkTpd8= root@okd_services.my_domain.com.br'

. registry_pv.yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 45Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 10.3.0.14

UPDATE:

. netstat -natup output...

[root@okd_services ~]# netstat -natup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      906/sshd            
tcp        0      0 127.0.0.1:953           0.0.0.0:*               LISTEN      929/named           
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:22623           0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:9000            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:6443            0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 0.0.0.0:111             0.0.0.0:*               LISTEN      1/systemd           
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      4572/haproxy        
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      1742/dnsmasq        
tcp        0      0 10.3.0.14:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      929/named           
tcp        0      0 10.2.0.18:22            10.2.0.3:44536          ESTABLISHED 1854/sshd: root [pr 
tcp        0      0 10.3.0.14:52252         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52134         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:42222         10.3.0.8:443            SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51962          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:52130         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.6:51946          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:40530         10.3.0.9:443            SYN_SENT    4572/haproxy        
tcp        0    196 10.2.0.18:22            10.2.0.3:44538          ESTABLISHED 5000/sshd: root [pr 
tcp        0      0 10.2.0.18:45472         10.2.0.5:389            ESTABLISHED 878/sssd_be         
tcp        0      0 10.3.0.14:51970         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:54056         10.3.0.4:6443           ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:33328         147.75.69.225:80        TIME_WAIT   -                   
tcp        0      0 10.3.0.14:6443          10.3.0.5:39976          ESTABLISHED 4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:52462          ESTABLISHED 4572/haproxy        
tcp        0      1 10.3.0.14:41396         10.3.0.7:22623          SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:41964         10.3.0.9:80             SYN_SENT    4572/haproxy        
tcp        0      1 10.3.0.14:60674         10.3.0.7:6443           SYN_SENT    4572/haproxy        
tcp        0      0 10.3.0.14:6443          10.3.0.5:40024          ESTABLISHED 4572/haproxy        
tcp        0      0 10.2.0.18:43394         109.205.222.4:80        TIME_WAIT   -                   
tcp6       0      0 :::22                   :::*                    LISTEN      906/sshd            
tcp6       0      0 ::1:953                 :::*                    LISTEN      929/named           
tcp6       0      0 :::111                  :::*                    LISTEN      1/systemd           
tcp6       0      0 :::8080                 :::*                    LISTEN      1131/httpd          
tcp6       0      0 :::53                   :::*                    LISTEN      929/named           
udp        0      0 192.168.122.1:53        0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:53            0.0.0.0:*                           929/named           
udp        0      0 127.0.0.1:53            0.0.0.0:*                           929/named           
udp        0      0 0.0.0.0:67              0.0.0.0:*                           1742/dnsmasq        
udp        0      0 10.3.0.14:68            10.3.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 10.2.0.18:68            10.2.0.2:67             ESTABLISHED 893/NetworkManager  
udp        0      0 0.0.0.0:111             0.0.0.0:*                           1/systemd           
udp        0      0 127.0.0.1:323           0.0.0.0:*                           857/chronyd         
udp6       0      0 :::53                   :::*                                929/named           
udp6       0      0 :::111                  :::*                                1/systemd           
udp6       0      0 ::1:323                 :::*                                857/chronyd
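
In the output above, the HAProxy connections stuck in SYN_SENT are the backends that have not answered HAProxy's connection attempt. A small sketch, assuming Python 3 and a hypothetical helper name `pending_backends`, that pulls those addresses out of `netstat -natup` output:

```python
from collections import Counter

# A few lines copied from the netstat output above, used as sample input.
SAMPLE = """\
tcp        0      0 10.3.0.14:52252         10.3.0.4:6443           ESTABLISHED 4572/haproxy
tcp        0      1 10.3.0.14:42222         10.3.0.8:443            SYN_SENT    4572/haproxy
tcp        0      1 10.3.0.14:41396         10.3.0.7:22623          SYN_SENT    4572/haproxy
tcp        0      1 10.3.0.14:60674         10.3.0.7:6443           SYN_SENT    4572/haproxy
udp        0      0 10.3.0.14:53            0.0.0.0:*                           929/named
"""


def pending_backends(netstat_output, program="haproxy"):
    """Count SYN_SENT connections per remote address for one program."""
    pending = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows: proto, recv-q, send-q, local, foreign, state, pid/program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        foreign, state, owner = fields[4], fields[5], fields[6]
        if state == "SYN_SENT" and owner.endswith("/" + program):
            pending[foreign] += 1
    return pending
```

Applied to the full output, this picks out 10.3.0.7:6443, 10.3.0.7:22623, 10.3.0.8:443, 10.3.0.9:80 and 10.3.0.9:443, which are okd_master_3 and the two workers.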

Thanks! =D
