I have keepalived running on two CentOS 7 VMs, and in the simplest test failover works: I run ping -t 192.168.1.11, pull vm1 from the network, and typically only one ping times out before vm2 picks up the VIP and starts responding.
After a successful failover, the VIP fails back once vm1 is brought back online, and I typically see no pings time out at all, though I imagine that's down to chance since things aren't synchronized...
The problem I'm seeing is with HTTP in this same setup. I have a web app running on vm1 and vm2, and I can watch each GET as it comes in (via SSH on either VM). I also wrote a test app on my dev box that loops simple HTTP GETs of the main page (with a 1 s timeout) while I remove vm1 from the network, and I am seeing failover/failback times anywhere from under 1 second to 27 seconds.
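For reference, the dev-box test loop looks roughly like this (a minimal sketch, not the exact app; the URL and 1 s timeout come from the setup above, the function names and 60-attempt count are illustrative):

```python
import time
import urllib.request

VIP_URL = "http://192.168.1.11/"  # the virtual IP from the keepalived config


def probe(url, timeout=1.0):
    """Return True if a GET to `url` succeeds within `timeout` seconds."""
    try:
        urllib.request.urlopen(url, timeout=timeout)
        return True
    except Exception:
        return False


def poll(url=VIP_URL, interval=1.0, count=60):
    """Print one timestamped ok/FAIL line per attempt; the run of
    consecutive FAILs during a failover gives a rough measure of the
    outage window."""
    for _ in range(count):
        print(time.strftime("%H:%M:%S"), "ok" if probe(url) else "FAIL")
        time.sleep(interval)
```

Counting the consecutive FAIL lines (times the 1 s interval) is what gives the <1 s to 27 s spread described above.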
Looking at the documentation here, I don't see any parameters I could change that might influence this, but I'd like more insight into why it varies so much and whether I can reduce the failover time. The top answer here also suggests that advert_int is significant, but I have it set to 1 and am still seeing these varied results...
Here is vm1's keepalived config file:
global_defs {
    script_user root
}

vrrp_instance VIP01 {
    state MASTER
    interface eth0
    virtual_router_id 101
    priority 150
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [snip]
    }
    virtual_ipaddress {
        192.168.1.11
    }
}
And vm2's:
global_defs {
    script_user root
}

vrrp_instance VIP01 {
    state BACKUP
    interface eth0
    virtual_router_id 101
    priority 100
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass [snip]
    }
    virtual_ipaddress {
        192.168.1.11
    }
}