
Migrate ZooKeeper from VMs to Kubernetes pods without downtime

I'm trying to migrate a 3-node ZooKeeper ensemble from VMs to a Kubernetes cluster without downtime.

I know there are plenty of blog posts and articles on migrating ZooKeeper without downtime from VMs to VMs, from bare metal to VMs, and so on, but I couldn't find one that covers migrating to k8s without downtime.

This is the config on all zk nodes (zoo.cfg):

autopurge.purgeInterval=1
initLimit=10
syncLimit=5
autopurge.snapRetainCount=5
snapCount=5000
4lw.commands.whitelist=*
tickTime=2000
dataDir=/var/opt/zookeeper/data/data
admin.serverPort=8080
reconfigEnabled=true
admin.enableServer=True
standaloneEnabled=false
dynamicConfigFile=/opt/zookeeper/apache-zookeeper-3.7.1-bin/conf/zoo.cfg.dynamic

and this is /opt/zookeeper/current/conf/zoo.cfg.dynamic:

server.1=inzzk01:2888:3888;2181
server.2=inzzk02:2888:3888;2181
server.3=inzzk03:2888:3888;2181

Up to this point everything works and the ensemble forms.

I run ZooKeeper in k8s as a StatefulSet taken from this answer (on its own, a 3-pod cluster created this way works as expected). To start from a clean cluster, I scrapped everything on the k8s side, then added the lines below to the config on the VMs and restarted each node:

server.4=10.100.102.106:30888:31888;30181
server.5=10.100.102.232:30889:31889;30182

The two IP addresses above are the correct k8s node IP addresses, and the ports are correct as well. In the logs on the VMs everything looks as expected:

2023-02-17 13:45:30,107 [myid:1] - WARN  [QuorumConnectionThread-[myid=1]-3:QuorumCnxManager@401] - Cannot open channel to 4 at election address /10.100.102.106:31888
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
2023-02-17 13:45:30,113 [myid:1] - WARN  [QuorumConnectionThread-[myid=1]-4:QuorumCnxManager@401] - Cannot open channel to 5 at election address /10.100.102.232:31889
java.net.ConnectException: Connection refused (Connection refused)
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:607)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:384)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:458)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
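
The "Connection refused" warnings above only show that nothing accepted a TCP connection on the election ports at that moment. A minimal sketch for checking this from one of the VMs is given below. The helper names `parse_server_line` and `port_open` are made up for illustration, and the parser assumes host names or IPv4 addresses (no bracketed IPv6).

```python
import socket
from typing import NamedTuple


class Server(NamedTuple):
    sid: int
    host: str
    quorum_port: int
    election_port: int


def parse_server_line(line: str) -> Server:
    """Parse 'server.N=host:quorumPort:electionPort[:role];[addr:]clientPort'."""
    key, value = line.strip().split("=", 1)
    sid = int(key.split(".", 1)[1])
    peer = value.split(";", 1)[0]  # drop the client address/port part
    host, quorum_port, election_port = peer.split(":")[:3]
    return Server(sid, host, int(quorum_port), int(election_port))


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(lines):
    """Print whether each server's election port accepts connections."""
    for line in lines:
        s = parse_server_line(line)
        state = "open" if port_open(s.host, s.election_port) else "refused/unreachable"
        print(f"server.{s.sid} election {s.host}:{s.election_port} -> {state}")
```

Calling `report(["server.4=10.100.102.106:30888:31888;30181", "server.5=10.100.102.232:30889:31889;30182"])` from a VM prints one line per new member. Only the election port is probed here because that is the port named in the warnings.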

I tried every type of Service (ClusterIP, headless and not, LoadBalancer, and NodePort). In the end I figured the simplest way to go is to use no Service at all and add hostNetwork: true to the StatefulSet. That way the ports are bound directly on the k8s nodes, with no proxy/SNAT/DNAT in between, so I can target them directly. Again, this is not recommended, but it will do for the sake of this example.

kubectl -n infraservices get all
NAME                              READY   STATUS    RESTARTS      AGE
pod/zk-0                          0/1     Running   1 (65s ago)   2m45s

In the logs of the pod:

2023-02-17 14:00:48,308 [myid:4] - INFO  [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@390] - Notification: my state:LOOKING; n.sid:4, n.state:LOOKING, n.leader:4, n.round:0x1, n.peerEpoch:0x0, n.zxid:0x0, message format version:0x2, n.config version:0x0
2023-02-17 14:00:48,315 [myid:4] - INFO  [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@308] - 4 Received version: 1600000000 my version: 0
2023-02-17 14:00:48,315 [myid:4] - INFO  [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@316] - restarting leader election
2023-02-17 14:00:48,315 [myid:4] - WARN  [RecvWorker:2:QuorumCnxManager$RecvWorker@1408] - Interrupting SendWorker thread from RecvWorker. sid: 2. myId: 4
2023-02-17 14:00:48,393 [myid:4] - WARN  [RecvWorker:3:QuorumCnxManager$RecvWorker@1408] - Interrupting SendWorker thread from RecvWorker. sid: 3. myId: 4
2023-02-17 14:00:48,394 [myid:4] - INFO  [QuorumPeerListener:QuorumCnxManager$Listener@985] - Leaving listener
2023-02-17 14:00:48,395 [myid:4] - WARN  [SendWorker:2:QuorumCnxManager$SendWorker@1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:00:48,395 [myid:4] - WARN  [RecvWorker:1:QuorumCnxManager$RecvWorker@1402] - Connection broken for id 1, my id = 4
java.net.SocketException: Socket closed
        at java.base/java.net.SocketInputStream.socketRead0(Native Method)
        at java.base/java.net.SocketInputStream.socketRead(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.net.SocketInputStream.read(Unknown Source)
        at java.base/java.io.BufferedInputStream.fill(Unknown Source)
        at java.base/java.io.BufferedInputStream.read(Unknown Source)
        at java.base/java.io.DataInputStream.readInt(Unknown Source)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
2023-02-17 14:00:48,395 [myid:4] - WARN  [SendWorker:3:QuorumCnxManager$SendWorker@1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(Unknown Source)
        at java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(Unknown Source)
        at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:00:48,395 [myid:4] - INFO  [WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@472] - WorkerReceiver is down
2023-02-17 14:00:48,395 [myid:4] - WARN  [SendWorker:1:QuorumCnxManager$SendWorker@1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException

On the VM the logs look like this:

2023-02-17 14:03:42,165 [myid:1] - INFO  [ListenerHandler-inzzk01/10.100.100.128:3888:QuorumCnxManager$Listener$ListenerHandler@1076] - Received connection request from /10.100.102.106:51674
2023-02-17 14:03:42,167 [myid:1] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@1402] - Connection broken for id 4, my id = 1
java.io.EOFException
        at java.io.DataInputStream.readInt(DataInputStream.java:392)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:1390)
2023-02-17 14:03:42,167 [myid:1] - WARN  [RecvWorker:4:QuorumCnxManager$RecvWorker@1408] - Interrupting SendWorker thread from RecvWorker. sid: 4. myId: 1
2023-02-17 14:03:42,167 [myid:1] - WARN  [SendWorker:4:QuorumCnxManager$SendWorker@1288] - Interrupted while waiting for message on queue
java.lang.InterruptedException
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:2014)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2088)
        at org.apache.zookeeper.util.CircularBlockingQueue.poll(CircularBlockingQueue.java:105)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.pollSendQueue(QuorumCnxManager.java:1453)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.access$900(QuorumCnxManager.java:99)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:1277)
2023-02-17 14:03:42,167 [myid:1] - WARN  [SendWorker:4:QuorumCnxManager$SendWorker@1300] - Send worker leaving thread id 4 my id = 1

I am able to connect from the pod to the cluster with zkCli.sh

root@inccl02az12-23rpb-mvnnw:/apache-zookeeper-3.7.1-bin# bin/zkCli.sh -timeout 3000 -server inzzk01:2181

[zk: inzzk01:2181(CONNECTED) 6] get /zookeeper/config
server.1=inzzk01:2888:3888:participant;0.0.0.0:2181
server.2=inzzk02:2888:3888:participant;0.0.0.0:2181
server.3=inzzk03:2888:3888:participant;0.0.0.0:2181
version=1600000000
[zk: inzzk01:2181(CONNECTED) 7]
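
The output above can be set against the pod's log line "Received version: 1600000000 my version: 0". The sketch below parses the text returned by `get /zookeeper/config` and checks whether a given server id is part of the stored configuration. It is only a diagnostic aid: `parse_quorum_config` is a hypothetical helper, and reading the version as hexadecimal is an assumption about how ZooKeeper prints it.

```python
def parse_quorum_config(text: str):
    """Return (member_ids, version) from the text of /zookeeper/config."""
    members = set()
    version = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("server."):
            # 'server.3=...' -> 3
            members.add(int(line.split("=", 1)[0].split(".", 1)[1]))
        elif line.startswith("version="):
            # Assumption: the version is printed as a hexadecimal number.
            version = int(line.split("=", 1)[1], 16)
    return members, version


# Text copied from the zkCli.sh session in the question.
ENSEMBLE_CONFIG = """\
server.1=inzzk01:2888:3888:participant;0.0.0.0:2181
server.2=inzzk02:2888:3888:participant;0.0.0.0:2181
server.3=inzzk03:2888:3888:participant;0.0.0.0:2181
version=1600000000
"""

members, version = parse_quorum_config(ENSEMBLE_CONFIG)
pod_id, pod_version = 4, 0x0  # values taken from the pod's log lines

print("members:", sorted(members))
print("pod in stored config:", pod_id in members)
print("ensemble config newer than pod's:", version > pod_version)
```

With the data shown in the question, this reports that server 4 is not listed in the configuration stored in the ensemble and that the ensemble's config version is ahead of the pod's. Whether that is what triggers the "restarting leader election" loop is not established here.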

So how can I connect at least one ZooKeeper node running as a pod in k8s to an existing ensemble outside k8s?
