Score:0

Generic high availability abstraction for application possible?

Rob

I am currently examining ways to make our existing server application highly available. The software handles messages that arrive via UDP or via higher-level railway-specific protocols.

Is there a way (a framework or similar) to create an abstraction layer that takes care of redundancy and failover? I am thinking of something equivalent to a RAID controller with mirroring, not just for storage but for the whole system. From the "outside", you only ever see one system with its network interfaces. Internally, there are two or more systems that run in parallel, and the abstraction layer takes care of everything, e.g. synchronizing the application state between the redundant machines. You install your software on the public end of the black box and internally everything is mirrored automatically. When one machine dies, the other one instantly takes over without dropping connections, as it is already in the same state.

Is there any kind of generic solution for this? Or do we have to implement this manually inside our application?

Zac67
There's no single solution for this. Duplicating a content delivery web server is not very hard; HA clustering a database plus application server is much harder, especially when the application wasn't designed for it. Your question would need to go into much more detail to get really helpful answers: operating system, storage system, storage methods (db, flat file, transaction-based, ...), network protocols in use, maximum failover delay, maximum tolerated data loss, whether virtualization is already used and which, infrastructure that could be used, ...
Score:1

Pacemaker, Corosync, and DRBD are Linux projects that provide an "Open Cluster Framework" (OCF) for making generic Linux services highly available.

Typically, DRBD synchronously replicates the storage (at the block level), while Corosync and Pacemaker manage which nodes are running which services in the cluster. Services can be controlled via OCF resource agents (shell scripts with standard exit codes and functions), or via the service's systemd/upstart/sysv-init scripts. There are also generic "anything" agents that can spawn and monitor processes in the cluster, but I'd use those only if you're feeling really lazy, as they're not robust.
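To give an idea of what "standard exit codes and functions" means for a resource agent, here is a minimal sketch of that contract. It is written in Python rather than shell purely for illustration. The service name `myapp` and the marker file standing in for "the service is running" are hypothetical. A real agent must also implement the `meta-data` and `validate-all` actions and would start and stop an actual process.

```python
#!/usr/bin/env python3
"""Sketch of an OCF-style resource agent for a hypothetical service 'myapp'."""
import os
import tempfile

# Exit codes defined by the OCF resource agent API.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical marker file that stands in for "the service is running".
STATE_FILE = os.path.join(tempfile.gettempdir(), "myapp.state")


def monitor() -> int:
    """Report whether the (simulated) service is currently running."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start() -> int:
    """Start the service; starting an already running service is a success."""
    if monitor() == OCF_SUCCESS:
        return OCF_SUCCESS
    try:
        with open(STATE_FILE, "w") as handle:
            handle.write(str(os.getpid()))
    except OSError:
        return OCF_ERR_GENERIC
    return monitor()


def stop() -> int:
    """Stop the service; stopping an already stopped service is a success."""
    try:
        os.remove(STATE_FILE)
    except FileNotFoundError:
        pass
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def main(argv) -> int:
    """Dispatch on the action name that the cluster manager passes as argv[1]."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        return OCF_ERR_UNIMPLEMENTED
    return ACTIONS[argv[1]]()

# A real agent script would end with: sys.exit(main(sys.argv))
```

The cluster manager calls the agent with an action name and decides what to do based only on the exit code, which is why `start` and `stop` have to be idempotent and `monitor` has to distinguish "not running" from a real failure.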

One of the easiest, most transparent ways to achieve generic HA is to use Pacemaker and DRBD to create a KVM cluster, where the VM has your application configured and started at boot, and Pacemaker/DRBD handle the clustering underneath (on the hypervisor).

There are plenty of resources online explaining the detailed steps if you Google around, but LINBIT has a tech-guide (behind a softwall) that steps through setting this up: https://linbit.com/tech-guide/drbd9-kvm-rhel8/

Further references:
DRBD: https://linbit.com/
Pacemaker/Corosync: https://clusterlabs.org/

Rob
Thanks, Matt! That sounds promising, I will have a closer look at that.