Simple question, but so far very difficult to answer... =-[
I am trying to deploy OpenShift (OKD) 4.5 or 4.7 by following the guide "Installing an OKD 4.5 Cluster" (see its "Starting the control plane nodes" section).
I'm trying to create the cluster using a UPI (User Provisioned Infrastructure)/Bare Metal (KVM) setup.
PROBLEM:
With version 4.5, the master node cannot finish installation after reboot. It keeps showing the following error...
[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers
For version 4.5 we use "Fedora CoreOS 32.20200715.3.0".
With version 4.7, the master node also cannot finish installation after reboot, but the error is different...
[ 543.933709] ignition[505]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #112
[ 543.939340] ignition[505]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": EOF
For version 4.7 we use "Fedora CoreOS 34.20210518.3.0".
I've waited hours and hours and the master nodes are still in the same situation. What can I do to resolve this?
Thanks! =D
MORE INFORMATION:
See if this helps...
This output appears on okd_master_3 (10.3.0.7)...
[ 1304.254380] ignition[485]: GET https://api-int.mbr.okd.local:22623/config/master: attempt #92
[ 1314.264629] ignition[485]: GET error: Get "https://api-int.mbr.okd.local:22623/config/master": net/http: timeout awaiting response headers
Connecting to okd_master_2 (10.3.0.6) from okd_services (10.3.0.14)...
NOTE: okd_master_2 (10.3.0.6) was able to boot (it reached the login screen).
[root@okd_services ~]# ssh core@10.3.0.6
The authenticity of host '10.3.0.6 (10.3.0.6)' can't be established.
ECDSA key fingerprint is SHA256:1xdq65g0ljnZYR6uXHaXW6EsxO3u6X268s4Z9Kfq0ng.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '10.3.0.6' (ECDSA) to the list of known hosts.
Fedora CoreOS 32.20200629.3.0
Tracker: https://github.com/coreos/fedora-coreos-tracker
Discuss: https://discussion.fedoraproject.org/c/server/coreos/
Pinging the okd_bootstrap (10.3.0.4) from okd_master_2 (10.3.0.6)...
[core@localhost ~]$ ping 10.3.0.4
PING 10.3.0.4 (10.3.0.4) 56(84) bytes of data.
64 bytes from 10.3.0.4: icmp_seq=1 ttl=64 time=0.973 ms
64 bytes from 10.3.0.4: icmp_seq=2 ttl=64 time=0.801 ms
64 bytes from 10.3.0.4: icmp_seq=3 ttl=64 time=0.373 ms
64 bytes from 10.3.0.4: icmp_seq=4 ttl=64 time=0.647 ms
^C
--- 10.3.0.4 ping statistics ---
4 packets transmitted, 4 received, 0% packet loss, time 3032ms
rtt min/avg/max/mdev = 0.373/0.698/0.973/0.220 ms
Calling the problematic URL from okd_master_2 (10.3.0.6)...
[core@localhost ~]$ curl -kv https://api-int.mbr.okd.local:22623/config/master
* Trying 10.3.0.14:22623...
* Connected to api-int.mbr.okd.local (10.3.0.14) port 22623 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /etc/pki/tls/certs/ca-bundle.crt
CApath: none
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_128_GCM_SHA256
* ALPN, server accepted to use h2
* Server certificate:
* subject: CN=api-int.mbr.okd.local
* start date: Jun 16 23:52:22 2021 GMT
* expire date: Jun 14 23:52:23 2031 GMT
* issuer: OU=openshift; CN=root-ca
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x561ed249aa40)
> GET /config/master HTTP/2
> Host: api-int.mbr.okd.local:22623
> user-agent: curl/7.69.1
> accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 250)!
< HTTP/2 500
< content-length: 0
< date: Thu, 17 Jun 2021 14:55:43 GMT
<
* Connection #0 to host api-int.mbr.okd.local left intact
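Since the HAProxy frontend on port 22623 runs in "mode tcp", the HTTP 500 above is produced by whichever backend answered, not by HAProxy itself. To see which one, each machine config server backend can be queried directly. This is only a minimal sketch, assuming Python 3 is available on okd_services; the backend IPs are copied from the haproxy.cfg in this post, and the names config_url, probe and main are illustrative. Certificate verification is disabled, the same as "curl -k".

```python
import http.client
import ssl
import urllib.error
import urllib.request

# Backends of okd_machine_config_server_be, copied from haproxy.cfg in this post.
BACKENDS = {
    "okd-boostrap": "10.3.0.4",
    "okd-master-1": "10.3.0.5",
    "okd-master-2": "10.3.0.6",
    "okd-master-3": "10.3.0.7",
}


def config_url(ip, role="master", port=22623):
    """Build the URL Ignition requests, but aimed at one backend IP."""
    return f"https://{ip}:{port}/config/{role}"


def probe(url, timeout=5.0):
    """Return a short description of what the backend answered."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # equivalent of curl -k
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:
        # 4xx/5xx answers arrive here; the server is up but unhappy.
        return f"HTTP {exc.code}"
    except (urllib.error.URLError, http.client.HTTPException, OSError) as exc:
        # Refused, timed out, reset, TLS failure, and similar.
        return f"unreachable ({exc})"


def main():
    for name, ip in BACKENDS.items():
        url = config_url(ip)
        print(f"{name:14} {url} -> {probe(url)}")
```

Calling main() on okd_services prints one line per backend, which should show whether the 500 comes from the bootstrap node or from a master.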
INFRASTRUCTURE:
Virtual machines...
NAME          ROLE                  OS             IP         MAC
okd_boostrap  bootstrap             Fedora CoreOS  10.3.0.4   52:54:00:07:80:62
okd_master_1  master                Fedora CoreOS  10.3.0.5   52:54:00:7d:97:70
okd_master_2  master                Fedora CoreOS  10.3.0.6   52:54:00:6e:52:85
okd_master_3  master                Fedora CoreOS  10.3.0.7   52:54:00:a3:65:d9
okd_worker_1  worker                Fedora CoreOS  10.3.0.8   52:54:00:e3:7c:fb
okd_worker_2  worker                Fedora CoreOS  10.3.0.9   52:54:00:20:ec:4f
okd_services  DNS/LB/web/NFS        CentOS 8       10.3.0.14  52:54:00:3a:fd:a2
                                                   10.2.0.18  52:54:00:92:ce:78
okd_pfsense   firewall/router/DHCP  FreeBSD        10.3.0.2   52:54:00:d8:27:82
                                                   10.2.0.19  52:54:00:ac:82:7d
. OKD_LAN: "10.3.0";
. EXT_LAN: "10.2.0".
Some acronyms...
_ DNS - Domain Name System;
_ LB - Load Balancing;
_ Web - Web Server;
_ NFS - Network File System.
Network layout...
      ...→.[N]WAN/EXT_LAN([R]dhcp).←...        (10.2.0.0/24)
      ↓                               ↓
[I]WAN/EXT_LAN                 [I]WAN/EXT_LAN
[V]OKD_PFSENSE([R]dhcp)        [V]OKD_SERVICES
[I]OKD_LAN                     [I]OKD_LAN
      ↑                               ↑
      .........→.[N]OKD_LAN.←..........        (10.3.0.0/24)
                      ↑
      ...................................
      ↓                 ↓                 ↓
[V]OKD_BOOSTRAP   [V]OKD_MASTER_1   [V]OKD_WORKER_1
                  [V]OKD_MASTER_2   [V]OKD_WORKER_2
                  [V]OKD_MASTER_3
_ [N] - Network;
_ [R] - Network Resource;
_ [I] - Network Interface;
_ [V] - Virtual Machine.
CONFIGURATION FILES:
BIND 9 (DNS):
. db.10.3.0
$TTL 604800
@ IN SOA okd-services.okd.local. admin.okd.local. (
6 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800 ; Negative Cache TTL
)
; Name servers - "NS" records.
IN NS okd-services.okd.local.
; Name servers - "PTR" records.
14 IN PTR okd-services.okd.local.
; OpenShift container platform cluster - "PTR" records.
4 IN PTR okd-boostrap.mbr.okd.local.
5 IN PTR okd-master-1.mbr.okd.local.
6 IN PTR okd-master-2.mbr.okd.local.
7 IN PTR okd-master-3.mbr.okd.local.
8 IN PTR okd-worker-1.mbr.okd.local.
9 IN PTR okd-worker-2.mbr.okd.local.
14 IN PTR api.mbr.okd.local.
14 IN PTR api-int.mbr.okd.local.
. db.okd.local
$TTL 604800
@ IN SOA okd-services.okd.local. admin.okd.local. (
1 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800 ; Negative Cache TTL
)
; Name servers - "NS" records.
IN NS okd-services
; Name servers - "A" records.
okd-services.okd.local. IN A 10.3.0.14
; OpenShift container platform cluster - "A" records.
okd-boostrap.mbr.okd.local. IN A 10.3.0.4
okd-master-1.mbr.okd.local. IN A 10.3.0.5
okd-master-2.mbr.okd.local. IN A 10.3.0.6
okd-master-3.mbr.okd.local. IN A 10.3.0.7
okd-worker-1.mbr.okd.local. IN A 10.3.0.8
okd-worker-2.mbr.okd.local. IN A 10.3.0.9
; Openshift internal cluster IPs - "A" records.
api.mbr.okd.local. IN A 10.3.0.14
api-int.mbr.okd.local. IN A 10.3.0.14
*.apps.mbr.okd.local. IN A 10.3.0.14
etcd-0.mbr.okd.local. IN A 10.3.0.5
etcd-1.mbr.okd.local. IN A 10.3.0.6
etcd-2.mbr.okd.local. IN A 10.3.0.7
cons-okd.apps.mbr.okd.local. IN A 10.3.0.14
oauth-okd.apps.mbr.okd.local. IN A 10.3.0.14
; OpenShift internal cluster IPs - "SRV" records.
_etcd-server-ssl._tcp.mbr.okd.local. 86400 IN SRV 0 10 2380 etcd-0.mbr
_etcd-server-ssl._tcp.mbr.okd.local. 86400 IN SRV 0 10 2380 etcd-1.mbr
_etcd-server-ssl._tcp.mbr.okd.local. 86400 IN SRV 0 10 2380 etcd-2.mbr
. named.conf.local
zone "okd.local" {
type master;
file "/etc/named/zones/db.okd.local"; // Zone file path.
};
zone "0.3.10.in-addr.arpa" {
type master;
file "/etc/named/zones/db.10.3.0"; // 10.3.0.0/24 subnet.
};
. named.conf
//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS server
// as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//
// See the BIND Administrator's Reference Manual (ARM) for details about the configuration
// located in /usr/share/doc/bind-{version}/Bv9ARM.html .
options {
listen-on port 53 { 127.0.0.1; 10.3.0.14; };
directory "/var/named";
dump-file "/var/named/data/cache_dump.db";
statistics-file "/var/named/data/named_stats.txt";
memstatistics-file "/var/named/data/named_mem_stats.txt";
recursing-file "/var/named/data/named.recursing";
secroots-file "/var/named/data/named.secroots";
allow-query { localhost; 10.3.0.0/24; };
// - If you are building an AUTHORITATIVE DNS server, do NOT enable recursion.
// - If you are building a RECURSIVE (caching) DNS server, you need to enable
// recursion.
// - If your recursive DNS server has a public IP address, you MUST enable access
// control to limit queries to your legitimate users. Failing to do so will cause
// your server to become part of large scale DNS amplification attacks. Implementing
// BCP38 within your network would greatly reduce such attack surface.
recursion yes;
forwarders {
8.8.8.8;
8.8.4.4;
};
dnssec-enable yes;
dnssec-validation yes;
// Path to ISC DLV key.
bindkeys-file "/etc/named.root.key";
managed-keys-directory "/var/named/dynamic";
pid-file "/run/named/named.pid";
session-keyfile "/run/named/session.key";
};
logging {
channel default_debug {
file "data/named.run";
severity dynamic;
};
};
zone "." IN {
type hint;
file "named.ca";
};
include "/etc/named.rfc1912.zones";
include "/etc/named.root.key";
include "/etc/named/named.conf.local";
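The forward records that the installation depends on can be checked from any machine that resolves through this BIND server. This is a minimal sketch, assuming Python 3 and that the system resolver of the machine running it points at 10.3.0.14; EXPECTED is copied from db.okd.local above, and the names resolve and check are illustrative.

```python
import socket

# Names and addresses copied from db.okd.local in this post.
EXPECTED = {
    "api.mbr.okd.local": "10.3.0.14",
    "api-int.mbr.okd.local": "10.3.0.14",
    "cons-okd.apps.mbr.okd.local": "10.3.0.14",
    "okd-boostrap.mbr.okd.local": "10.3.0.4",
    "okd-master-1.mbr.okd.local": "10.3.0.5",
    "okd-master-2.mbr.okd.local": "10.3.0.6",
    "okd-master-3.mbr.okd.local": "10.3.0.7",
    "etcd-0.mbr.okd.local": "10.3.0.5",
    "etcd-1.mbr.okd.local": "10.3.0.6",
    "etcd-2.mbr.okd.local": "10.3.0.7",
}


def resolve(name):
    """Return the set of IPv4 addresses the system resolver gives for name."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def check(expected, resolver=resolve):
    """Return (name, wanted_ip, got_ips) for every name that resolves wrongly."""
    problems = []
    for name, wanted in expected.items():
        got = resolver(name)
        if wanted not in got:
            problems.append((name, wanted, sorted(got)))
    return problems
```

An empty list from check(EXPECTED) means every name resolved to the address written in the zone file; anything else lists the names to look at.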
HAProxy (load balancer):
. haproxy.cfg
#---------------------------------------------
# Global settings.
#---------------------------------------------
global
    maxconn 20000
    log /dev/log local0 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    user haproxy
    group haproxy
    daemon
    # Turn on stats unix socket.
    stats socket /var/lib/haproxy/stats
#---------------------------------------------
# Common defaults that all the "listen" and "backend" sections will use if not designated
# in their block.
#---------------------------------------------
defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 300s
    timeout server 300s
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 20000
listen stats
    bind :9000
    mode http
    option forwardfor except 127.0.0.0/8
    stats enable
    stats uri /
frontend okd_k8s_api_fe
    bind :6443
    default_backend okd_k8s_api_be
    mode tcp
    option tcplog
backend okd_k8s_api_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:6443 check
    server okd-master-1 10.3.0.5:6443 check
    server okd-master-2 10.3.0.6:6443 check
    server okd-master-3 10.3.0.7:6443 check
frontend okd_machine_config_server_fe
    bind :22623
    default_backend okd_machine_config_server_be
    mode tcp
    option tcplog
backend okd_machine_config_server_be
    balance source
    mode tcp
    server okd-boostrap 10.3.0.4:22623 check
    server okd-master-1 10.3.0.5:22623 check
    server okd-master-2 10.3.0.6:22623 check
    server okd-master-3 10.3.0.7:22623 check
frontend okd_http_ingress_traffic_fe
    bind :80
    default_backend okd_http_ingress_traffic_be
    mode tcp
    option tcplog
backend okd_http_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:80 check
    server okd-worker-2 10.3.0.9:80 check
frontend okd_https_ingress_traffic_fe
    bind *:443
    default_backend okd_https_ingress_traffic_be
    mode tcp
    option tcplog
backend okd_https_ingress_traffic_be
    balance source
    mode tcp
    server okd-worker-1 10.3.0.8:443 check
    server okd-worker-2 10.3.0.9:443 check
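Every "server" line above carries "check", so HAProxy only forwards to backends that accept a TCP connection on the listed port. The same test can be reproduced by hand from okd_services. This is a minimal sketch, assuming Python 3; parse_servers, port_open and report are illustrative names, and the parser only understands the simple "server NAME IP:PORT ..." form used in this file.

```python
import socket


def parse_servers(cfg_text):
    """Return {backend_name: [(server_name, host, port), ...]} from haproxy.cfg text."""
    backends = {}
    current = None
    for raw in cfg_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and indentation
        if not line:
            continue
        words = line.split()
        if words[0] in ("backend", "listen") and len(words) > 1:
            current = words[1]
            backends.setdefault(current, [])
        elif words[0] in ("global", "defaults", "frontend"):
            current = None                    # sections that hold no servers
        elif words[0] == "server" and current is not None and len(words) >= 3:
            host, sep, port = words[2].rpartition(":")
            if sep and port.isdigit():
                backends[current].append((words[1], host, int(port)))
    return backends


def port_open(host, port, timeout=3.0):
    """True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(cfg_text):
    """Print one line per configured server with the result of a TCP connect."""
    for backend, servers in parse_servers(cfg_text).items():
        for name, host, port in servers:
            state = "open" if port_open(host, port) else "closed/filtered"
            print(f"{backend:32} {name:14} {host}:{port} {state}")
```

For example, report(open("/etc/haproxy/haproxy.cfg").read()), where the path is an assumption about where the file lives on okd_services.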
OpenShift (OKD) "*.yaml" files:
. htpasswd_provider.yaml
apiVersion: config.openshift.io/v1
kind: OAuth
metadata:
  name: cluster
spec:
  identityProviders:
  - name: htpasswd_provider
    mappingMethod: claim
    type: HTPasswd
    htpasswd:
      fileData:
        name: htpass-secret
. install-config.yaml
apiVersion: v1
baseDomain: okd.local
metadata:
  name: mbr
compute:
- hyperthreading: Enabled
  name: worker
  replicas: 0
controlPlane:
  hyperthreading: Enabled
  name: master
  replicas: 3
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  none: {}
fips: false
pullSecret: '{"auths":{"fake":{"auth": "bar"}}}'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAA<SKIPPED>QbAKPwwhdCkTpd8= root@okd_services.my_domain.com.br'
. registry_pv.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: registry-pv
spec:
  capacity:
    storage: 45Gi
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /var/nfsshare/registry
    server: 10.3.0.14
UPDATE:
. "netstat -natup" output on okd_services...
[root@okd_services ~]# netstat -natup
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 906/sshd
tcp 0 0 127.0.0.1:953 0.0.0.0:* LISTEN 929/named
tcp 0 0 0.0.0.0:443 0.0.0.0:* LISTEN 4572/haproxy
tcp 0 0 0.0.0.0:22623 0.0.0.0:* LISTEN 4572/haproxy
tcp 0 0 0.0.0.0:9000 0.0.0.0:* LISTEN 4572/haproxy
tcp 0 0 0.0.0.0:6443 0.0.0.0:* LISTEN 4572/haproxy
tcp 0 0 0.0.0.0:111 0.0.0.0:* LISTEN 1/systemd
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN 4572/haproxy
tcp 0 0 192.168.122.1:53 0.0.0.0:* LISTEN 1742/dnsmasq
tcp 0 0 10.3.0.14:53 0.0.0.0:* LISTEN 929/named
tcp 0 0 127.0.0.1:53 0.0.0.0:* LISTEN 929/named
tcp 0 0 10.2.0.18:22 10.2.0.3:44536 ESTABLISHED 1854/sshd: root [pr
tcp 0 0 10.3.0.14:52252 10.3.0.4:6443 ESTABLISHED 4572/haproxy
tcp 0 0 10.3.0.14:52134 10.3.0.4:6443 ESTABLISHED 4572/haproxy
tcp 0 1 10.3.0.14:42222 10.3.0.8:443 SYN_SENT 4572/haproxy
tcp 0 0 10.3.0.14:6443 10.3.0.6:51962 ESTABLISHED 4572/haproxy
tcp 0 0 10.3.0.14:52130 10.3.0.4:6443 ESTABLISHED 4572/haproxy
tcp 0 0 10.3.0.14:6443 10.3.0.6:51946 ESTABLISHED 4572/haproxy
tcp 0 1 10.3.0.14:40530 10.3.0.9:443 SYN_SENT 4572/haproxy
tcp 0 196 10.2.0.18:22 10.2.0.3:44538 ESTABLISHED 5000/sshd: root [pr
tcp 0 0 10.2.0.18:45472 10.2.0.5:389 ESTABLISHED 878/sssd_be
tcp 0 0 10.3.0.14:51970 10.3.0.4:6443 ESTABLISHED 4572/haproxy
tcp 0 0 10.3.0.14:54056 10.3.0.4:6443 ESTABLISHED 4572/haproxy
tcp 0 0 10.2.0.18:33328 147.75.69.225:80 TIME_WAIT -
tcp 0 0 10.3.0.14:6443 10.3.0.5:39976 ESTABLISHED 4572/haproxy
tcp 0 0 10.3.0.14:6443 10.3.0.5:52462 ESTABLISHED 4572/haproxy
tcp 0 1 10.3.0.14:41396 10.3.0.7:22623 SYN_SENT 4572/haproxy
tcp 0 1 10.3.0.14:41964 10.3.0.9:80 SYN_SENT 4572/haproxy
tcp 0 1 10.3.0.14:60674 10.3.0.7:6443 SYN_SENT 4572/haproxy
tcp 0 0 10.3.0.14:6443 10.3.0.5:40024 ESTABLISHED 4572/haproxy
tcp 0 0 10.2.0.18:43394 109.205.222.4:80 TIME_WAIT -
tcp6 0 0 :::22 :::* LISTEN 906/sshd
tcp6 0 0 ::1:953 :::* LISTEN 929/named
tcp6 0 0 :::111 :::* LISTEN 1/systemd
tcp6 0 0 :::8080 :::* LISTEN 1131/httpd
tcp6 0 0 :::53 :::* LISTEN 929/named
udp 0 0 192.168.122.1:53 0.0.0.0:* 1742/dnsmasq
udp 0 0 10.3.0.14:53 0.0.0.0:* 929/named
udp 0 0 127.0.0.1:53 0.0.0.0:* 929/named
udp 0 0 0.0.0.0:67 0.0.0.0:* 1742/dnsmasq
udp 0 0 10.3.0.14:68 10.3.0.2:67 ESTABLISHED 893/NetworkManager
udp 0 0 10.2.0.18:68 10.2.0.2:67 ESTABLISHED 893/NetworkManager
udp 0 0 0.0.0.0:111 0.0.0.0:* 1/systemd
udp 0 0 127.0.0.1:323 0.0.0.0:* 857/chronyd
udp6 0 0 :::53 :::* 929/named
udp6 0 0 :::111 :::* 1/systemd
udp6 0 0 ::1:323 :::* 857/chronyd
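In the output above, haproxy has connections stuck in SYN_SENT towards 10.3.0.7:6443, 10.3.0.7:22623, 10.3.0.8:443, 10.3.0.9:443 and 10.3.0.9:80, which means those hosts had not answered the TCP handshake at that moment. To pull that list out of a long netstat dump automatically, something like the following could be used. This is a minimal sketch, assuming Python 3 and the column layout shown above; syn_sent_targets is an illustrative name.

```python
def syn_sent_targets(netstat_text, program="haproxy"):
    """Return sorted remote ip:port pairs that `program` is still waiting on.

    Expects `netstat -natup` lines with the columns:
    proto, recv-q, send-q, local address, foreign address, state, pid/program.
    """
    targets = set()
    for line in netstat_text.splitlines():
        cols = line.split()
        # Skip headers, prompts, and udp lines (udp has no state column).
        if len(cols) < 7 or not cols[0].startswith("tcp"):
            continue
        foreign, state, owner = cols[4], cols[5], cols[6]
        if state == "SYN_SENT" and owner.endswith("/" + program):
            targets.add(foreign)
    return sorted(targets)
```

For example, print(syn_sent_targets(text)) with the netstat output saved in the string text gives the backends that are not answering HAProxy.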
Thanks! =D