RabbitMQ quorum queues - have node automatically rejoin

matpen

7/13/23, 4:45 PM

I am exploring RabbitMQ quorum queues to improve HA for some services in a Kubernetes cluster. From what I have read, they are designed with data safety in mind.

However, the chapter "Managing Replicas" states:

Replicas of a quorum queue are explicitly managed by the operator. When a new node is added to the cluster, it will host no quorum queue replicas unless the operator explicitly adds it to a member (replica) list of a quorum queue or a set of quorum queues.

It therefore seems that, after a disruption (especially an involuntary one), the following situation could arise in a 3-node cluster:

a node goes down after a disruption: the other two nodes still form a majority and will "keep the queue alive", possibly electing a new leader;
Kubernetes provides a new node (pod) to replace the failed node; the new node automatically rejoins the RabbitMQ cluster, but
unless the operator manually intervenes, the new node will not host replicas of the existing quorum queues;
for a 3-node cluster, this means that there is no HA anymore: if one of the remaining nodes fails at some point in the future, the queue is effectively lost.
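
To make the majority arithmetic in this scenario concrete, here is a minimal Python sketch. It assumes the usual Raft-style rule that a quorum queue needs more than half of its configured members online; the function names are made up for illustration and are not part of any RabbitMQ API.

```python
def majority(members: int) -> int:
    """Smallest number of members that is more than half of the total."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be offline while a majority is still online."""
    return members - majority(members)


def has_quorum(configured: int, online: int) -> bool:
    """True if the online members still form a majority of the configured ones."""
    return online >= majority(configured)


# Scenario from the question: 3 configured replicas.
print(majority(3))            # majority size for 3 members
print(tolerated_failures(3))  # failures a 3-member queue can absorb
print(has_quorum(3, 2))       # one replica gone: still a majority
print(has_quorum(3, 1))       # a second replica gone: no majority
```

With 3 configured replicas and only 2 of them online, a second failure leaves 1 of 3, which is below the majority of 2, so the queue can no longer make progress until a member comes back.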

Is there any way to mitigate this scenario? Is it, for example, possible to have nodes automatically rejoin all existing quorum queue clusters? Maybe by maintaining a list of "startup commands" (which run after RabbitMQ starts) to which we could add the rejoin commands?

high-availability

rabbitmq

kubernetes

Mikołaj Głodziak

7/14/23, 3:09 PM

Which version of Kubernetes did you use and how did you set up the cluster? Did you use a bare metal installation or a cloud provider? This information is needed to reproduce your problem.

Luke Bakken

7/16/23, 3:32 PM

The RabbitMQ team highly recommends the use of the official Kubernetes operator - https://www.rabbitmq.com/kubernetes/operator/operator-overview.html

Aside from that, here's what the local k8s expert has to say:

Kubernetes will not just randomly delete a persistent volume. If the node went down for some reason, it will start with the same name and the same data.

As long as the same name and data are used, the "new" node will join just as if it were the old one.

There are probably scenarios that require manual intervention but they aren't as frequent as you'd think.
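
For the cases that do need manual intervention, one possible sketch (not an official recipe) is to wrap the `rabbitmq-queues grow` CLI command, which adds a quorum queue replica on a given node for matching queues. The node name `rabbit@rabbitmq-server-2` and the helper names below are hypothetical; the strategy `all` targets every matching queue, while `even` targets only queues that currently have an even number of replicas.

```python
import subprocess
from typing import List


def grow_command(node: str, strategy: str = "all",
                 vhost_pattern: str = ".*",
                 queue_pattern: str = ".*") -> List[str]:
    """Build the argv for `rabbitmq-queues grow` (does not run anything)."""
    if strategy not in ("all", "even"):
        raise ValueError("strategy must be 'all' or 'even'")
    return [
        "rabbitmq-queues", "grow", node, strategy,
        "--vhost-pattern", vhost_pattern,
        "--queue-pattern", queue_pattern,
    ]


def grow_replicas(node: str, strategy: str = "all") -> str:
    """Run the command where the RabbitMQ CLI tools are available.

    Assumes `rabbitmq-queues` is on PATH and can reach the cluster.
    """
    result = subprocess.run(grow_command(node, strategy),
                            check=True, capture_output=True, text=True)
    return result.stdout


# Hypothetical node name, for illustration only.
print(" ".join(grow_command("rabbit@rabbitmq-server-2")))
```

Whether to run something like this automatically (for example from a post-start hook) or by hand is a design choice; the sketch only shows how the command line could be assembled.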

NOTE: the RabbitMQ team monitors the rabbitmq-users mailing list and only sometimes answers questions on StackOverflow.
