I have a three-node cluster: machine001, machine002, and machine003.
machine001 runs 2 resources and machine002 runs 1. Normally they don't run on the same host, unless machine002 goes into standby.
Recently, I saw machine002 appear twice in the cluster status: once online and once offline.
Checking with sudo crm_mon -R showed that the two entries have different node IDs.
I tried deleting the node by its ID, but that was refused. I tried deleting it by name, but was told there's an active node with that name.
I went in with sudo crm configure edit and it showed the configuration to be:
(111) machine001 \
    standby=off
(222) machine002 \
    standby=off
(333) machine003 \
    standby=off
(12345) machine002
    other_settings... \
So I removed the line (12345) machine002, saved and committed the CIB... and machine002 completely disappeared from the output of crm_mon, which seemed to be constantly trying to find it again.
The only way to get it back was to restart corosync and pacemaker on that node.
I'm at a loss for what is going on here. Can anyone point me in the right direction?
EDIT: The requested corosync.conf file:
totem {
    version: 2
    cluster_name: debian
    token: 3000
    transport: udp
    token_retransmits_before_loss_const: 10
    join: 60
    consensus: 3600
    vsftype: none
    max_messages: 20
    clear_node_high_bit: yes
    threads: 0
    rrp_mode: none
    crypto_cipher: none
    crypto_hash: none
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.0
        mcastaddr: 239.255.64.1
        mcastport: 5405
        ttl: 1
    }
}
logging {
    fileline: off
    to_stderr: no
    to_logfile: yes
    logfile: /var/log/corosync/corosync.log
    to_syslog: no
    syslog_facility: daemon
    debug: off
    timestamp: on
    logger_subsys {
        subsys: QUORUM
        debug: off
    }
}
quorum {
    expected_votes: 3
}
nodelist {
    node {
        ring0_addr: 192.168.0.25
        name: machine001
        id: 1
    }
    node {
        ring0_addr: 192.168.0.26
        name: machine002
        id: 2
    }
    node {
        ring0_addr: 192.168.0.27
        name: machine003
        id: 3
    }
}
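
One thing that may be worth checking in the file above: the corosync.conf man page documents the per-node ID key in nodelist as nodeid, while the nodes here use id. Whether that is related to the duplicate entry is only a guess. The following is a minimal Python sketch, not a diagnosis, that lists the keys each node block actually sets. The function name nodelist_keys and the sample text are made up for illustration, and the parser only handles the simple "key: value" / "section {" layout shown above.

```python
def nodelist_keys(conf_text):
    """Return one dict of key -> value per node { } block inside nodelist { }."""
    nodes = []
    stack = []       # names of the sections currently open
    current = None   # dict for the node block being read, if any
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.endswith("{"):
            stack.append(line[:-1].strip())
            if stack == ["nodelist", "node"]:
                current = {}
        elif line == "}":
            if stack == ["nodelist", "node"]:
                nodes.append(current)
                current = None
            if stack:
                stack.pop()
        elif ":" in line and current is not None:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return nodes


# Hypothetical sample mirroring one node entry from the file above.
sample = """
nodelist {
    node {
        ring0_addr: 192.168.0.26
        name: machine002
        id: 2
    }
}
"""

for node in nodelist_keys(sample):
    status = "has nodeid" if "nodeid" in node else "no nodeid key"
    print(node.get("name"), sorted(node), status)
```

Running it against the real file (for example by reading /etc/corosync/corosync.conf into a string, assuming that is where the file lives) would show whether any node block sets nodeid at all.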