
Corosync allowing resources on two systems


We are using Pacemaker/Corosync for HA. This includes both virtual IPs and software services. The other day we had a failure, and the cluster showed the IP address resource started on both nodes, which IMHO should never happen. Every time I took a node out of service, it first stopped the IP on nodeA before bringing it up on nodeB. My question: is this a bug or a bad configuration? I understand we may want some resources running on more than one server (e.g. httpd), but in what situation would you want the same IP running on more than one machine on the same LAN? Below is my current running configuration.

node 1: s1.site.example.org \
        attributes standby=off
node 2: s2.site.example.org
primitive vendor_blfd systemd:vendor_blfd \
        op monitor interval=10s \
        meta target-role=Started
primitive vendor_sipd systemd:vendor_sipd \
        op monitor interval=10s \
        meta target-role=Started
primitive opensips systemd:opensips \
        op monitor interval=10s \
        meta target-role=Started
primitive public_222 IPaddr2 \
        params ip=XX.XX.XX.222 cidr_netmask=27 \
        op monitor interval=30s
primitive public_NYC_10 IPaddr2 \
        params ip=XX.XX.XX.10 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_19 IPaddr2 \
        params ip=XX.XX.XX.19 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_23 IPaddr2 \
        params ip=XX.XX.XX.23 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_40 IPaddr2 \
        params ip=XX.XX.XX.40 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_41 IPaddr2 \
        params ip=XX.XX.XX.41 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_42 IPaddr2 \
        params ip=XX.XX.XX.42 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_43 IPaddr2 \
        params ip=XX.XX.XX.43 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_44 IPaddr2 \
        params ip=XX.XX.XX.44 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_45 IPaddr2 \
        params ip=XX.XX.XX.45 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_46 IPaddr2 \
        params ip=XX.XX.XX.46 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_47 IPaddr2 \
        params ip=XX.XX.XX.47 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_48 IPaddr2 \
        params ip=XX.XX.XX.48 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_49 IPaddr2 \
        params ip=XX.XX.XX.49 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_50 IPaddr2 \
        params ip=XX.XX.XX.50 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_51 IPaddr2 \
        params ip=XX.XX.XX.51 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_52 IPaddr2 \
        params ip=XX.XX.XX.52 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_53 IPaddr2 \
        params ip=XX.XX.XX.53 cidr_netmask=25 \
        op monitor interval=10s
primitive public_NYC_54 IPaddr2 \
        params ip=XX.XX.XX.54 cidr_netmask=25 \
        op monitor interval=10s \
        meta target-role=Started
primitive public_NYC_55 IPaddr2 \
        params ip=XX.XX.XX.55 cidr_netmask=25 \
        op monitor interval=10s
group vendor public_NYC_10 public_NYC_19 public_NYC_23 public_NYC_40 public_NYC_41 public_NYC_42 public_NYC_43 public_NYC_44 public_NYC_45 public_NYC_46 public_NYC_47 public_NYC_48 public_NYC_49 public_NYC_50 public_NYC_51 public_NYC_52 public_NYC_53 public_NYC_54 public_NYC_55 public_222 opensips vendor_sipd vendor_blfd \
        meta target-role=Started
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.23-1.el7_9.1-9acf116022 \
        cluster-infrastructure=corosync \
        cluster-name=vendor \
        stonith-enabled=false \
        no-quorum-policy=ignore \
        last-lrm-refresh=1650666825

Without proper STONITH configured and enabled (your configuration has stonith-enabled=false), there is nothing preventing a network split between the nodes from causing services to start on both nodes.
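
A minimal sketch of what enabling fencing could look like, in the same crmsh syntax as the configuration above. The fence_ipmilan agent, the BMC addresses, and the credentials here are hypothetical placeholders; substitute the fence agent that matches your hardware:

# one fence device per node, each forbidden from running on the node it fences
primitive fence_s1 stonith:fence_ipmilan \
        params pcmk_host_list=s1.site.example.org ip=10.0.0.1 \
               username=admin password=secret \
        op monitor interval=60s
primitive fence_s2 stonith:fence_ipmilan \
        params pcmk_host_list=s2.site.example.org ip=10.0.0.2 \
               username=admin password=secret \
        op monitor interval=60s
location l_fence_s1 fence_s1 -inf: s1.site.example.org
location l_fence_s2 fence_s2 -inf: s2.site.example.org
property cib-bootstrap-options: stonith-enabled=true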

Once the network split resolves, Pacemaker should begin a recovery process by stopping the group on both nodes and then starting it again on one node. If a stop operation fails during that recovery, the recovery will hang. STONITH would save you there as well.
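
Relatedly, no-quorum-policy=ignore in the posted configuration means each side of a split is allowed to keep running resources. On a two-node cluster, a common alternative (a sketch, assuming Corosync 2.x or newer with votequorum) is to let Corosync handle the two-node special case in corosync.conf and leave Pacemaker's quorum policy at its default:

quorum {
    provider: corosync_votequorum
    two_node: 1
    # two_node implies wait_for_all, so a freshly booted node waits
    # until it has seen its peer before the cluster gains quorum
}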

