etcd cluster with DNS Discovery - client: etcd cluster is unavailable or misconfigured; Error: unexpected status code 404; dig SRV returns blank

I am configuring etcd to bootstrap using DNS discovery, but etcdctl reports that the cluster is unavailable or misconfigured. It appears to be querying the wrong port, and the SRV records don't seem right.

Could you please review the details below? My questions are at the bottom of this post.


Specifications

root domain: etcd.ksone

server SRV record:

_etcd-server-ssl._tcp.etcd.ksone    SRV Simple  -   
0 0 2380 etcd2.ksone
0 0 2380 etcd1.ksone 

client SRV record:

_etcd-client-ssl._tcp.etcd.ksone    SRV Simple  -   
0 0 2379 etcd2.ksone
0 0 2379 etcd1.ksone
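For comparison, the records above can be written in standard BIND zone-file notation. The sketch below prints those lines; it is only an illustration, assuming a 300-second TTL, and srv_line is a hypothetical helper name. The owner names, ports and targets come from the tables above. Each SRV record lives under a _service._proto prefix of the domain, not at the bare domain "etcd.ksone" itself.

```shell
#!/usr/bin/env bash
# Sketch: print BIND zone-file SRV lines for the etcd discovery records.
# Assumption: TTL of 300 seconds; srv_line is a hypothetical helper name.

srv_line() {
  # $1 = owner name, $2 = port, $3 = target host (without trailing dot)
  local owner="$1" port="$2" target="$3"
  # Fields: owner TTL class type priority weight port target
  printf '%s. 300 IN SRV 0 0 %s %s.\n' "$owner" "$port" "$target"
}

domain="etcd.ksone"
for host in etcd1.ksone etcd2.ksone; do
  # Server (peer) records advertise the peer port 2380
  srv_line "_etcd-server-ssl._tcp.${domain}" 2380 "$host"
  # Client records advertise the client port 2379
  srv_line "_etcd-client-ssl._tcp.${domain}" 2379 "$host"
done
```

A "dig SRV etcd.ksone" query asks for SRV records owned by "etcd.ksone" itself, so with a zone laid out like this it has nothing to return.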

using TLS: True

OS:

[fedora@ip-10-0-0-245 ~]$ uname
Linux
[fedora@ip-10-0-0-245 ~]$ cat /etc/os-release
NAME=Fedora
VERSION="24 (Twenty Four)"
ID=fedora
VERSION_ID=24
PRETTY_NAME="Fedora 24 (Twenty Four)"
ANSI_COLOR="0;34"
CPE_NAME="cpe:/o:fedoraproject:fedora:24"
HOME_URL="https://fedoraproject.org/"
BUG_REPORT_URL="https://bugzilla.redhat.com/"
REDHAT_BUGZILLA_PRODUCT="Fedora"
REDHAT_BUGZILLA_PRODUCT_VERSION=24
REDHAT_SUPPORT_PRODUCT="Fedora"
REDHAT_SUPPORT_PRODUCT_VERSION=24
PRIVACY_POLICY_URL=https://fedoraproject.org/wiki/Legal:PrivacyPolicy

etcdctl version

[fedora@ip-10-0-0-245 ~]$ etcdctl --version
etcdctl version 2.2.5

List Members

When I try to list members with the command below, I get the following error:

bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --discovery-srv etcd.ksone --debug member list

start to sync cluster using endpoints(https://etcd1.ksone.:2380,https://etcd2.ksone.:2380)
cURL Command: curl -X GET https://etcd1.ksone.:2380/v2/members
got endpoints() after sync
Cluster-Endpoints:
cURL Command: curl -X GET /v2/members
client: etcd cluster is unavailable or misconfigured

Cluster Health

Similarly, querying cluster-health fails with the following command and output:

bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --discovery-srv etcd.ksone --debug cluster-health

Cluster-Endpoints: https://etcd1.ksone.:2380, https://etcd2.ksone.:2380
===> NOTE: it tries (incorrectly?) on port 2380 (server/peer) instead of 2379 (client)
cURL Command: curl -X GET https://etcd1.ksone.:2380/v2/members
cluster may be unhealthy: failed to list members
Error:  unexpected status code 404
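To check which port actually answers the v2 members API, one option is to request /v2/members on both ports with curl, reusing the certificate paths from the etcdctl commands above. This is a sketch, assuming curl is installed and the hosts resolve; members_url and probe_members are hypothetical helper names. A 200 on 2379 together with a 404 on 2380 would be consistent with the error shown above.

```shell
#!/usr/bin/env bash
# Sketch: compare the HTTP status of /v2/members on the client and peer ports.
# Assumptions: curl is available; cert paths copied from the etcdctl commands.

CA=/etc/etcd/ca.pem
CERT=/etc/etcd/kubernetes.pem
KEY=/etc/etcd/kubernetes-key.pem

# Hypothetical helper: build the members URL for a host and port.
members_url() {
  printf 'https://%s:%s/v2/members\n' "$1" "$2"
}

# Hypothetical helper: print "<url> -> <http status>" for one host/port.
probe_members() {
  local url code
  url="$(members_url "$1" "$2")"
  # -s hides progress, -o discards the body, -w prints only the status code
  # (curl prints 000 when no HTTP response was received)
  code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 \
    --cacert "$CA" --cert "$CERT" --key "$KEY" "$url")"
  printf '%s -> %s\n' "$url" "$code"
}

# Run the probes only when executed directly, not when sourced.
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  for port in 2379 2380; do
    probe_members etcd1.ksone "$port"
  done
fi
```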

SRV records

I query the SRV records as follows:

  • List SRV records for the root domain, i.e. "etcd.ksone" (I expected this to show all the SRV records, but it returns nothing):
 dig +noall +answer SRV etcd.ksone

==> <the console shows no output - empty!>
  • List the server SRV records explicitly:
# dig +noall +answer SRV _etcd-server-ssl._tcp.etcd.ksone


_etcd-server-ssl._tcp.etcd.ksone. 33 IN SRV     0 0 2380 etcd2.ksone.
_etcd-server-ssl._tcp.etcd.ksone. 33 IN SRV     0 0 2380 etcd1.ksone.
  • List the client SRV records explicitly:
/ # dig +noall +answer SRV _etcd-client-ssl._tcp.etcd.ksone


_etcd-client-ssl._tcp.etcd.ksone. 300 IN SRV    0 0 2379 etcd1.ksone.
_etcd-client-ssl._tcp.etcd.ksone. 300 IN SRV    0 0 2379 etcd2.ksone.
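Because the client records resolve correctly when queried by their full name, a stopgap while --discovery-srv is being debugged is to resolve them manually and pass the result to etcdctl's --endpoint flag, which in etcdctl v2 accepts a comma-delimited list of addresses. The sketch assumes "dig +short" output of the form "priority weight port target."; srv_to_endpoints is a hypothetical helper, and it ignores SRV weights.

```shell
#!/usr/bin/env bash
# Sketch: turn "dig +short SRV" answers into a comma-separated https endpoint list.
# Assumption: input lines look like "priority weight port target."; weights are ignored.

srv_to_endpoints() {
  # Sort numerically by priority (field 1) so preferred targets come first,
  # then emit one https://host:port per record and join them with commas.
  sort -n -k1,1 | while read -r _prio _weight port target; do
    [[ -z "$target" ]] && continue                   # skip blank or short lines
    printf 'https://%s:%s\n' "${target%.}" "$port"   # drop the DNS root dot
  done | paste -sd, -
}

# Usage (requires working DNS):
# endpoints="$(dig +short SRV _etcd-client-ssl._tcp.etcd.ksone | srv_to_endpoints)"
# etcdctl --ca-file /etc/etcd/ca.pem --cert-file /etc/etcd/kubernetes.pem \
#   --key-file /etc/etcd/kubernetes-key.pem --endpoint "$endpoints" cluster-health
```

This only works around the discovery step; it does not explain why etcdctl picked the 2380 records in the first place.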

Specifying the client endpoint explicitly (SUCCESS, but this bypasses DNS discovery!)

bash-4.3# etcdctl --ca-file /etc/etcd/ca.pem --key-file /etc/etcd/kubernetes-key.pem --cert-file /etc/etcd/kubernetes.pem --debug --endpoint https://etcd1.ksone:2379 cluster-health


Cluster-Endpoints: https://etcd1.ksone:2379
cURL Command: curl -X GET https://etcd1.ksone:2379/v2/members
member 499073e22ac73562 is healthy: got healthy result from https://etcd1.ksone:2379
member b98d4fc780a787fe is healthy: got healthy result from https://etcd2.ksone:2379
cluster is healthy

etcd Service Setup

systemctl status etcd



● etcd.service - etcd
   Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2021-08-01 07:17:39 UTC; 1h 23min ago
     Docs: https://github.com/coreos
 Main PID: 2363 (etcd)
    Tasks: 7 (limit: 512)
   CGroup: /system.slice/etcd.service
           └─2363 /usr/bin/etcd --name etcd1.ksone --discovery-srv=etcd.ksone --initial-advertise-peer-urls https://etcd1.ksone:2380 --initial-cluster-token etcd-cluster-0 --initial-cluster-state new --advertise-client-urls https://etcd1.ksone:2379 --listen-client-urls https://etcd1.ksone:2379,http://127.0.0.1:2379 --listen-peer-urls https://etcd1.ksone:2380 --data-dir=/var/lib/etcd/data --cert-file=/etc/etcd/kubernetes.pem --key-file=/etc/etcd/kubernetes-key.pem --peer-cert-file=/etc/etcd/kubernetes.pem --peer-key-file=/etc/etcd/kubernetes-key.pem --trusted-ca-file=/etc/etcd/ca.pem --peer-trusted-ca-file=/etc/etcd/ca.pem

Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 became candidate at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 received vote from 499073e22ac73562 at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 [logterm: 1, index: 2] sent vote request to b98d4fc780a787fe at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 received vote from b98d4fc780a787fe at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 [q:2] has received 2 votes and 0 vote rejections
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: 499073e22ac73562 became leader at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: raft.node: 499073e22ac73562 elected leader 499073e22ac73562 at term 41
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: published {Name:etcd1.ksone ClientURLs:[https://etcd1.ksone:2379]} to cluster 1c370848b4697ef2
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: setting up the initial cluster version to 2.2
Aug 01 07:18:32 ip-10-0-0-245.eu-west-1.compute.internal etcd[2363]: set the initial cluster version to 2.2

Summary of Observations

  • Discovery in the etcd client does not appear to work, as shown by the errors above, i.e. "etcd cluster is unavailable or misconfigured" or "Error: unexpected status code 404".
  • The debug logs indicate that it connects to the peer port (2380) instead of the client port (2379).
  • It only works when I explicitly set the --endpoint flag to port 2379.
  • The SRV query on the root domain returns a blank result (no output).
  • The output of systemctl status etcd suggests that the endpoints are configured correctly in the etcd startup command.

Questions

  • How do I query the records correctly, and what problems (if any) are there with the DNS SRV configuration?
  • Why is the etcdctl --discovery-srv flag not working? I expect it to discover the correct port (2379) and not report any errors.
  • Is etcd supposed to be load balanced? Is there a single endpoint that I can query? Is it up to the client to choose an endpoint, and if so, why? Should I configure a load balancer on top of my etcd rig?

Many thanks!

Comment by Michael Hampton:
Stop what you're doing and start over with a supported OS distribution.