I am configuring etcd to bootstrap using DNS discovery, but etcdctl reports that the cluster is misconfigured: it appears to be querying the wrong port, and the SRV records don't seem right.
Could you please review the details below and see my questions at the bottom of this post?
Specifications
root domain: etcd.ksone
server SRV record:
_etcd-server-ssl._tcp.etcd.ksone SRV Simple -
0 0 2380 etcd2.ksone
0 0 2380 etcd1.ksone
client SRV record:
_etcd-client-ssl._tcp.etcd.ksone SRV Simple -
0 0 2379 etcd2.ksone
0 0 2379 etcd1.ksone
using TLS: True
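For reference, etcd's DNS discovery derives the SRV names from the root domain using the standard _service._proto convention, so these are the lookups that should resolve against the records above:
# Peer endpoints used for server bootstrap (expected port 2380):
dig +short SRV _etcd-server-ssl._tcp.etcd.ksone
# Client endpoints used for etcdctl discovery (expected port 2379):
dig +short SRV _etcd-client-ssl._tcp.etcd.ksone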
OS:
[fedora@ip-10-0-0-245 ~]$ uname
Linux
[fedora@ip-10-0-0-245 ~]$ cat /etc/os-release
NAME=Fedora
VERSION="24 (Twenty Four)"
ID=fedora
VERSION_ID=24
PRETTY_NAME="Fedora 24 (Twenty Four)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:24"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=24
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=24
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy
etcdctl version
[fedora@ip-10-0-0-245 ~]$ etcdctl --version
etcdctl version 2.2.5
List Members
I try to list members using the command below and get the following error:
bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --discovery-srv etcd.ksone --debug member list
start to sync cluster using endpoints(https://etcd1.ksone.:2380,https://etcd2.ksone.:2380)
cURL Command: curl -X GET https://etcd1.ksone.:2380/v2/members
got endpoints() after sync
Cluster-Endpoints:
cURL Command: curl -X GET /v2/members
client: etcd cluster is unavailable or misconfigured
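To dig into why the sync ends with an empty endpoint list, I can replay etcdctl's sync request by hand (a sketch, reusing the cert paths from the command above); as the cluster-health section below shows, the peer port answers /v2/members with a 404, so the sync discovers nothing:
# Replay the sync request etcdctl issued against the discovered (peer) port;
# print only the HTTP status code:
curl -sS -o /dev/null -w '%{http_code}\n' \
  --cacert /etc/etcd/ca.pem \
  --cert /etc/etcd/kubernetes.pem --key /etc/etcd/kubernetes-key.pem \
  https://etcd1.ksone:2380/v2/members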
Cluster Health
Similarly, querying for cluster-health with the following command fails with the output below:
bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --discovery-srv etcd.ksone --debug cluster-health
Cluster-Endpoints: https://etcd1.ksone.:2380, https://etcd2.ksone.:2380
===> NOTE: IT TRIES (INCORRECTLY?) ON PORT 2380 (THE SERVER/PEER PORT)
cURL Command: curl -X GET https://etcd1.ksone.:2380/v2/members
cluster may be unhealthy: failed to list members
Error: unexpected status code 404
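For comparison, the same request against the client port (2379) should return the member list as JSON (a sketch with the same cert paths), which would confirm that the cluster itself is reachable and only the discovered port is wrong:
# Same request, but against the client port - this is what etcdctl should
# have derived from the _etcd-client-ssl SRV record:
curl -sS --cacert /etc/etcd/ca.pem \
  --cert /etc/etcd/kubernetes.pem --key /etc/etcd/kubernetes-key.pem \
  https://etcd1.ksone:2379/v2/members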
SRV records
I have configured the SRV records as follows:
- list SRV records for the root domain, i.e. "etcd.ksone" (I expected this to show the full set of SRV records, but it returns nothing?):
dig +noall +answer SRV etcd.ksone
==> <the console shows no output - empty!>
- list SRV explicitly for server:
# dig +noall +answer SRV _etcd-server-ssl._tcp.etcd.ksone
_etcd-server-ssl._tcp.etcd.ksone. 33 IN SRV 0 0 2380 etcd2.ksone.
_etcd-server-ssl._tcp.etcd.ksone. 33 IN SRV 0 0 2380 etcd1.ksone.
- list SRV explicitly for client:
/ # dig +noall +answer SRV _etcd-client-ssl._tcp.etcd.ksone
_etcd-client-ssl._tcp.etcd.ksone. 300 IN SRV 0 0 2379 etcd1.ksone.
_etcd-client-ssl._tcp.etcd.ksone. 300 IN SRV 0 0 2379 etcd2.ksone.
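As an extra sanity check, the SRV targets themselves must resolve; assuming etcd1.ksone and etcd2.ksone are plain A records, this verifies them:
# Each SRV target must resolve to an address for discovery to be usable:
dig +short A etcd1.ksone
dig +short A etcd2.ksone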
Try the client endpoint explicitly (SUCCESS, but this is not really using DNS discovery!)
bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --debug --endpoint https://etcd1.ksone:2379 cluster-health
Cluster-Endpoints: https://etcd1.ksone:2379
cURL Command: curl -X GET https://etcd1.ksone:2379/v2/members
member 499073e22ac73562 is healthy: got healthy result from https://etcd1.ksone:2379
member b98d4fc780a787fe is healthy: got healthy result from https://etcd2.ksone:2379
cluster is healthy
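As a workaround sketch (untested): the etcdctl --endpoint switch should accept a comma-delimited list, so both client endpoints can be supplied without relying on SRV discovery:
# Hypothetical variant: pass both client endpoints explicitly,
# sidestepping SRV discovery entirely:
etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem \
  --cert-file /etc/etcd/kubernetes.pem \
  --endpoint https://etcd1.ksone:2379,https://etcd2.ksone:2379 cluster-health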
etcd Service Setup
systemctl status etcd
● etcd.service - etcd
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Sun 2021-08-01 07:17:39 UTC; 1h 23min ago
Docs: https://github.com/coreos
Main PID: 2363 (etcd)
Tasks: 7 (limit: 512)
CGroup: /system.slice/etcd.service
└─2363 /usr/bin/etcd --name etcd1.ksone --discovery-srv=etcd.ksone --initial-advertise-peer-urls https://etcd1.ksone:2380 --initial-cluster-token etcd-cluster-0 --initial-cluster-state new --advertise-client-urls https://etcd1.ksone:2379 --listen-client-urls https://etcd1.ksone:2379,http://127.0.0.1:2379 --listen-peer-urls https://etcd1.ksone:2380 --data-dir=/var/lib/etcd/data --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kubernetes-key.pem --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 became candidate at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 received vote from 499073e22ac73562 at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 [logterm: 1, index: 2] sent vote request to b98d4fc780a787fe at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 received vote from b98d4fc780a787fe at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 [q:2] has received 2 votes and 0 vote rejections
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 became leader at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: raft.node: 499073e22ac73562 elected leader 499073e22ac73562 at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: published {Name:etcd1.ksone ClientURLs:[https://etcd1.ksone:2379]} to cluster 1c370848b4697ef2
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: setting up the initial cluster version to 2.2
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: set the initial cluster version to 2.2
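For readability, here is the same ExecStart command from the unit above, re-wrapped with one flag per line (identical flags, nothing changed):
/usr/bin/etcd \
  --name etcd1.ksone \
  --discovery-srv=etcd.ksone \
  --initial-advertise-peer-urls https://etcd1.ksone:2380 \
  --initial-cluster-token etcd-cluster-0 \
  --initial-cluster-state new \
  --advertise-client-urls https://etcd1.ksone:2379 \
  --listen-client-urls https://etcd1.ksone:2379,http://127.0.0.1:2379 \
  --listen-peer-urls https://etcd1.ksone:2380 \
  --data-dir=/var/lib/etcd/data \
  --cert-file=/etc/etcd/kubernetes.pem \
  --key-file=/etc/etcd/kubernetes-key.pem \
  --peer-cert-file=/etc/etcd/kubernetes.pem \
  --peer-key-file=/etc/etcd/kubernetes-key.pem \
  --trusted-ca-file=/etc/etcd/ca.pem \
  --peer-trusted-ca-file=/etc/etcd/ca.pem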
Summary of Observations
- The discovery mechanism on the etcd client does not appear to be working, as evidenced by the errors above, i.e. `etcd cluster is unavailable or misconfigured` and `Error: unexpected status code 404`.
- The debug logs indicate that etcdctl is trying to connect to the peer port (2380) instead of the client port (2379).
- I can only get it to work by explicitly setting the `--endpoint` switch to port 2379.
- The SRV query on the root domain does not appear to be working correctly, i.e. it returns a blank result (no output).
- The output of `systemctl status etcd` seems to indicate that the endpoints have been configured correctly in the etcd startup command.
Questions
- How do I query the records correctly, and what might be the problems (if any) with the DNS SRV configuration?
- Why is the etcdctl `--discovery-srv` switch not working? I expect it to discover the correct port, i.e. 2379, and not to report any errors.
- Is etcd supposed to be load balanced? Is there a single endpoint that I can query? [Why] is it up to the client to choose an endpoint? Should I configure a load balancer on top of my etcd rig?
Many Thanks!