Score:0

Node unable to leave cluster for ejabberd upgrade

Environment

  • ejabberd version: 20.04
  • Erlang version: Erlang (SMP,ASYNC_THREADS)(BEAM) emulator version 9.2
  • OS: Linux (Debian)
  • Installed from: source

Errors from crash.log

2022-02-08 22:42:45 =CRASH REPORT====
  crasher:
    initial call: pgsql_proto:init/1
    pid: <0.27318.6018>
    registered_name: []
    exception exit: {{init,{error,timeout}},[{gen_server,init_it,6,[{file,"gen_server.erl"},{line,349}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]}
    ancestors: ['ejabberd_sql_vhost1.xmpp_12','ejabberd_sql_sup_vhost1.xmpp',ejabberd_db_sup,ejabberd_sup,<0.87.0>]
    message_queue_len: 0
    messages: []
    links: []
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 376
    stack_size: 27
    reductions: 997
  neighbours:

Bug description

I am trying to upgrade from ejabberd 20.04 to 20.07. My cluster has three nodes. The rolling upgrade of two nodes was successful. When node1 tries to leave the cluster for its upgrade, it gives the following error:

Failed RPC connection to the node '[email protected]': timeout

When I run ejabberdctl status, the following is returned:

The node '[email protected]' is started with status: started
Failed RPC connection to the node '[email protected]': {'EXIT', {timeout, {gen_server,call, [application_controller, which_applications]}}}

In the Erlang shell, the node is still shown as part of the cluster:

nodes().
['[email protected]','[email protected]']
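When scripting a rolling upgrade, it can help to check each node's status output automatically before moving on. The following is a minimal Python sketch, not part of ejabberd: the function name classify_status and the state labels are hypothetical, and it only pattern-matches the message shapes quoted above ("Failed RPC connection to the node ..." and "is started with status: started"). The node names in the usage lines are made up for illustration.

```python
import re

# Matches the failure line shown above. The closing quote after the node
# name is optional because the first error in this report lacks it.
FAILED_RPC = re.compile(
    r"Failed RPC connection to the node '?([^':\s]+)'?: *(.*)", re.S
)


def classify_status(output: str) -> dict:
    """Classify the text printed by `ejabberdctl status`.

    Returns a dict with a hypothetical 'state' label and the node name
    mentioned in a failure line, if any.
    """
    match = FAILED_RPC.search(output)
    if match is None:
        # No failure line: treat the usual "started" banner as healthy.
        healthy = "is started with status: started" in output
        return {"state": "ok" if healthy else "unknown", "node": None}
    node, reason = match.group(1), match.group(2)
    # Both the bare "timeout" and the {'EXIT', {timeout, ...}} tuple count.
    state = "rpc_timeout" if "timeout" in reason else "rpc_failed"
    return {"state": state, "node": node}


# Hypothetical node name, for illustration only.
sample = (
    "The node 'ejabberd@node1.example.org' is started with status: started\n"
    "Failed RPC connection to the node 'ejabberd@node1.example.org': "
    "{'EXIT', {timeout, {gen_server,call, "
    "[application_controller, which_applications]}}}"
)
result = classify_status(sample)
```

A node that reports "started" at the VM level but times out on application_controller calls, as in this report, would be classified as rpc_timeout rather than ok, so an upgrade script could stop before calling leave_cluster on it.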

Could you please help me resolve this issue?

Badlop
This same question was silently cross-posted in https://github.com/processone/ejabberd/issues/3764
Score:0

Thanks for your reply, and sorry for the late response. The issue happened on the first node after the upgrade of the other two nodes had completed successfully. The first node became unresponsive after the rolling upgrade finished on the last two nodes. We found that the reason for the failure of node 1 was too many failed SQL queries due to connection issues.

The node names are [email protected] [email protected] [email protected]

To resolve the issue we had to kill the unresponsive ejabberd processes and restart ejabberd on the first node. We are continuing with further upgrades.

Score:0

This may be a dumb comment, but just in case it gives you some idea:

You are running the leave_cluster command in one of the nodes, and it doesn't connect correctly to the other one.

You could try to run the command in the other node.
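As a rough illustration of that suggestion, here is a small Python sketch that builds the leave_cluster call so it can be issued from a healthy node instead of the stuck one. It assumes ejabberdctl is on the PATH and that leave_cluster takes the name of the node to remove; the helper names, the use of the --node option to address a specific node, the 60-second timeout, and the node names are assumptions for illustration, so check them against your ejabberd version first.

```python
import subprocess
from typing import List, Optional, Tuple


def leave_cluster_cmd(target: str, via: Optional[str] = None) -> List[str]:
    """Build the ejabberdctl argument list for removing `target`.

    `via` is the (hypothetical) healthy node the command is addressed to;
    when omitted, ejabberdctl talks to its default local node.
    """
    cmd = ["ejabberdctl"]
    if via:
        cmd += ["--node", via]
    return cmd + ["leave_cluster", target]


def run(cmd: List[str], timeout_s: float = 60.0) -> Tuple[int, str]:
    """Run a command with a hard timeout so a hung node cannot block the script."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # 124 mirrors the exit code used by the coreutils `timeout` tool.
        return 124, "timed out"
    return proc.returncode, proc.stdout + proc.stderr


# Hypothetical node names: remove node1 by asking node2 to do it.
cmd = leave_cluster_cmd("ejabberd@node1.example.org", via="ejabberd@node2.example.org")
```

Calling run(cmd) on a host that can reach the healthy node would then attempt the removal; the list is built separately so it can be logged or reviewed before anything is executed.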

If that doesn't help, maybe there's some internal way to attempt to remove a node from the cluster...

But you should update your question and clarify what the node names are, where you attempt to perform the admin task, and exactly which method you are attempting.
