Score:5

Server

Building a low-cost, high-availability cluster using Windows Failover Cluster or Proxmox

sokar

8/11/24, 11:47 AM

I need to build a high-availability cluster (key functionality, SLA almost 24/7) for virtual machines: AD DC, file server (FS), WSUS, print server, one Oracle database, and a few Linux VMs (not important to the business). Performance is not that important, but of course everything needs to work well.

Things I have:

for now, one physical site (data center)
a license for Windows Server 2019 Standard
the ability to install Proxmox and buy support for Proxmox
two non-identical Lenovo servers with support contracts (one has 16 cores, the other 20; one has 9 x 279 GB drives, the other 3 x 279 GB; both can use RAID 5)
two stacked 1 Gbps switches
a Synology with 2 power supplies and a 4-port 1 Gbps Ethernet card

Things I can buy:

a professional storage array
an HBA card for the Lenovo servers

The initial idea is to build a Windows high-availability cluster connected over iSCSI to the Synology (or a new storage array) for virtual servers that don't need fast storage read/write bandwidth (2 nodes and one storage device, i.e. a single point of failure).

I've read some articles about Storage Replica. Can I build a cluster with 2 nodes and one storage array for the VMs that don't need performance, and use Storage Replica (on the same nodes) on volumes on the nodes' local disks for the VMs that need more performance?

EDIT (more info):

Can I have one of the two DC VMs on the Windows Failover Cluster? The second DC VM (holding all the AD operations master roles) would be on another server that is not part of the cluster.

I have two DC VMs for AD, but I need a failover solution for the other services/servers.

There is an option called Cluster-Aware Updating in Windows Failover Clustering. Does it not work the way I assume (since the name seems self-explanatory)?

Recovery Time Objective and Recovery Point Objective are not that strict. The business will survive a 1-hour gap for less critical services and a 15-minute gap for mission-critical services.


virtualization

windows

windows-cluster

failovercluster

proxmox

Greg Askew

8/11/24, 12:03 PM

Active Directory does not support Windows Failover Clusters.

vidarlo

8/11/24, 1:35 PM

Have you considered making the services redundant on service level, by e.g. hosting two DC's?

Zac67

8/12/24, 6:18 AM

Windows requires you to install updates, so at the machine level you won't reach 99.999% ("five nines"); you need to create redundancy at the service level. The network, storage (the Synology sounds like a single controller), power, and UPS also need to be redundant.
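To put "five nines" in perspective, the yearly downtime budget can be computed directly. A minimal sketch, assuming a 365.25-day year; the one 10-minute patch reboot per month is a hypothetical figure, not something stated in the thread.

```python
# Minutes in an average year (365.25 days, to account for leap years).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def allowed_downtime_minutes(availability_pct: float) -> float:
    """Yearly downtime budget (in minutes) for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


# Hypothetical patching cost: one 10-minute reboot per month for Windows updates.
PATCH_REBOOT_MINUTES_PER_YEAR = 12 * 10

for pct in (99.0, 99.9, 99.99, 99.999):
    budget = allowed_downtime_minutes(pct)
    verdict = "fits" if PATCH_REBOOT_MINUTES_PER_YEAR <= budget else "exceeds budget"
    print(f"{pct}%: {budget:8.1f} min/year, monthly reboots {verdict}")
```

Five nines leaves roughly 5 minutes per year, so even routine patch reboots of a single VM blow the budget unless the service itself is redundant.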

Greg Askew

8/12/24, 3:34 PM

Two factors missing are Recovery Time Objective and Recovery Point Objective. Storage Replica will not have the same failover time as normal shared storage. Also, judging by the question about domain controllers: those should not be anywhere near this setup. Active Directory DCs absolutely should be able to survive on their own, independent of unproven and problematic technologies that will delay recovery. When Storage Replica collapses and needs to be restored, it helps to be able to authenticate and get basic dial-tone recovery while the collapsed storage is restored.

Score:7


BaronSamedi1958

8/11/24, 12:20 PM

You can absolutely use SR (Storage Replica) to build a “poor man’s” Windows Server Failover Cluster (WSFC). See the example below, where the authors used SR to cluster an SMB3 file service.

https://www.starwindsoftware.com/blog/part-1-storage-replica-with-failover-cluster-and-file-server-role-windows-server-technical-preview

This is how the failover process can be guided, as far as it can be, of course.

https://www.virtualizationhowto.com/2019/11/windows-server-2019-storage-replica-failover-process/

The problem is that SR was intended as a DR (Disaster Recovery) solution, not an HA (High Availability) one. SR is not very flexible, needs some babysitting, is quite seldom used, and requires Datacenter edition (apart from the limited version included in Windows Server Standard: a single replicated volume of up to 2 TB). Not recommended. This is what Microsoft has to say.

https://learn.microsoft.com/en-us/windows-server/storage/storage-replica/storage-replica-overview

Bottom line: if you've already paid for Datacenter, you can try S2D (Storage Spaces Direct), which is for brave people only as it's still not reliable, or you can use StarWind Virtual SAN (VSAN) Free, which can be used not only with Standard but also with the free Hyper-V Server.

https://www.starwindsoftware.com/starwind-virtual-san-free


eKKiM

8/11/24, 1:03 PM

Can you elaborate on why S2D is "only for the brave"?

BaronSamedi1958

8/11/24, 2:23 PM

https://storagespaceswarstories.com/category/stories/

BaronSamedi1958

8/11/24, 2:24 PM

https://www.reddit.com/r/sysadmin/comments/ah07ri/my_review_after_a_year_of_storage_spaces_direct/

BaronSamedi1958

8/11/24, 2:24 PM

https://www.reddit.com/r/sysadmin/comments/609e98/another_catastrophic_failure_on_our_windows/

Greg Askew

8/11/24, 4:38 PM

@eKKiM: S2D fundamentally makes zero sense for most organizations because storage is simpler and less expensive than it ever has been. S2D exists because it is what Microsoft uses internally and at Azure. That doesn't mean it's a good fit for anyone else.

Score:4


RiGiD5

8/11/24, 9:25 PM

Domain controllers use their own replication mechanism and don’t need any shared storage.

https://learn.microsoft.com/en-us/windows-server/identity/ad-ds/get-started/replication/active-directory-replication-concepts

https://www.manageengine.com/products/active-directory-audit/kb/how-to/how-to-check-if-domain-controllers-are-in-sync-with-each-other.html

File servers can be made HA with the help of DFS-R. It’s not perfect, but it works, with quite a few limitations (no truly transparent failover, a read penalty, and no split-brain protection).

https://learn.microsoft.com/en-us/windows-server/storage/dfs-replication/dfsr-overview

Oracle has its own database (DB) replication, similar to MS SQL Server Always On Availability Groups (AGs).

https://www.arcion.io/learn/oracle-replication#:~:text=Oracle%20replication%20allows%20users%20to,reporting%2C%20testing%2C%20and%20backups.

In a nutshell: rethink what you’re doing; you may be overthinking and over-engineering the whole thing.


Score:3


El Marinero

8/14/24, 10:04 AM

You should avoid Synology in production. It has a single controller, so during firmware updates, reboots, or any other issue your whole cluster will be down. Classic SPOF.

https://en.m.wikipedia.org/wiki/Single_point_of_failure
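The SPOF argument can be made concrete with the usual series/parallel availability arithmetic. A minimal sketch, assuming independent failures; the 0.99 (node) and 0.995 (storage box) figures are hypothetical, not measurements of any Synology or Lenovo hardware.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Every component must be up, so availabilities multiply."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant component must be up."""
    return 1 - prod(1 - a for a in availabilities)


# Hypothetical per-component availabilities, for illustration only.
node = 0.99
single_controller_nas = 0.995

# Two clustered nodes in front of one single-controller storage box.
cluster = series(parallel(node, node), single_controller_nas)
print(f"cluster availability: {cluster:.5f}")
```

However redundant the compute layer is, the result can never exceed the availability of the single storage controller it depends on.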


sokar

8/16/24, 11:03 AM

Can you explain the difference (when a 24/7 SLA is important) between a Synology device with 2 PSUs and LACP and a professional storage array? I have never used a professional storage array.

El Marinero

8/16/24, 4:55 PM

A Synology won’t deliver high uptime because of its single-controller design. Also, Synology support isn’t enterprise-level: Taiwanese business hours.

Score:0


symcbean

8/11/24, 3:07 PM

If you want to use Proxmox for high availability, you need a cluster with three nodes (or, in general, an odd number, 2n + 1), but the additional node can be VERY basic: it's just there as an observer to arbitrate split-brain decisions. Note that Proxmox HA provides a means to spin up missing VMs when a cluster node goes offline. You don't need that for MS AD; just make sure the existing DCs are distributed across separate hardware.
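The "three nodes, or 2n + 1" rule follows from majority quorum: a partition keeps running only if it holds more than half of the votes. A minimal sketch of that arithmetic, assuming one vote per node (the usual corosync default); a lightweight third voter, such as an external QDevice, counts the same way.

```python
def quorum(votes: int) -> int:
    """Votes needed for a strict majority."""
    return votes // 2 + 1


def tolerated_failures(votes: int) -> int:
    """How many voters can be lost while the survivors still hold quorum."""
    return votes - quorum(votes)


for n in (2, 3, 4, 5):
    print(f"{n} votes: quorum {quorum(n)}, survives {tolerated_failures(n)} failure(s)")
```

Two votes tolerate zero failures, which is why a two-node cluster needs a tie-breaker, and a fourth vote adds nothing over three.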

Do make sure you have enough resources (storage, CPU, memory) to run the critical VMs and any remaining VMs on a surviving physical node; there's currently no mechanism in Proxmox to shut down non-critical VMs to free up capacity for the critical ones.
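A quick way to sanity-check this is to ask, for each host, whether it could carry every VM alone. A minimal sketch: the core counts and 9 x / 3 x 279 GB RAID 5 arrays come from the question, but the pairing of cores to drives, the RAM, the 2:1 vCPU overcommit, and all VM sizes are hypothetical. The disk check only matters when VMs live on local (replicated) storage rather than shared storage.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Vm:
    name: str
    vcpus: int
    ram_gb: int
    disk_gb: int


@dataclass
class Host:
    name: str
    cores: int
    ram_gb: int
    disk_gb: int


def raid5_usable_gb(drives: int, drive_gb: int) -> int:
    """RAID 5 spends one drive's worth of capacity on parity."""
    return (drives - 1) * drive_gb


def fits(host: Host, vms: list[Vm], cpu_overcommit: float = 2.0) -> bool:
    """True if this single host can run all of the given VMs at once."""
    return (
        sum(vm.vcpus for vm in vms) <= host.cores * cpu_overcommit
        and sum(vm.ram_gb for vm in vms) <= host.ram_gb  # RAM is not overcommitted
        and sum(vm.disk_gb for vm in vms) <= host.disk_gb
    )


# Hypothetical pairing of cores to arrays; RAM is hypothetical.
hosts = [
    Host("node-a", cores=16, ram_gb=128, disk_gb=raid5_usable_gb(3, 279)),
    Host("node-b", cores=20, ram_gb=128, disk_gb=raid5_usable_gb(9, 279)),
]

# Hypothetical VM sizes, for illustration only.
vms = [
    Vm("dc", 2, 4, 60),
    Vm("fs", 2, 8, 300),
    Vm("wsus", 2, 8, 200),
    Vm("print", 1, 4, 60),
    Vm("oracle", 4, 32, 300),
]

for host in hosts:
    print(f"{host.name} can carry every VM alone: {fits(host, vms)}")
```

With these made-up sizes, the node with the 3-drive array runs out of local disk, so losing the other node would leave some VMs with nowhere to run.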

Your Synology box is a SPOF, so it's not suitable as primary storage for your HA VMs. It is a good place to keep backups, but you really want to use PBS (Proxmox Backup Server) for backups, and that does NOT like latency in storage access. I seem to recall there are Docker images available for PBS if your Synology supports Docker.

The setup is a bit small for Ceph-based storage, but is there a reason you don't want to use the built-in ZFS replication or GlusterFS on the hypervisor to keep the disk images in sync?

While replication and HA are generally smoother when handled nearer the top of the application stack, that means having a different implementation for each component; trying to use MS Storage Replica for your Oracle database is not likely to be a pleasant experience. I would definitely recommend using Proxmox/ZFS replication as the default, then adding exceptions where that is not a good fit.
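Whatever replication is chosen, the RPO figures from the question (15 minutes for mission-critical services, 1 hour for the rest) can be checked against the replication schedule. A minimal sketch, assuming snapshot-based asynchronous replication (as ZFS replication is) where each run takes its snapshot when it starts; the 2-minute run time is hypothetical. If I recall correctly, Proxmox VE's default replication schedule is every 15 minutes (`*/15`); check your version. This covers data loss (RPO) only, not how long it takes to bring a VM back (RTO).

```python
# RPO targets from the question, in minutes.
RPO_MINUTES = {"mission-critical": 15, "less critical": 60}


def worst_case_loss_minutes(interval_min: float, run_min: float) -> float:
    """Data at risk with scheduled, snapshot-based asynchronous replication.

    If a node dies just before a run completes, everything written since the
    snapshot of the last completed run is lost: one full interval plus the
    time the in-flight run needed to finish.
    """
    return interval_min + run_min


def meets_rpo(interval_min: float, run_min: float, rpo_min: float) -> bool:
    return worst_case_loss_minutes(interval_min, run_min) <= rpo_min


# Hypothetical: each run needs about 2 minutes to ship its changes.
RUN_MINUTES = 2

for interval in (5, 10, 15):
    for tier, rpo in RPO_MINUTES.items():
        ok = meets_rpo(interval, RUN_MINUTES, rpo)
        print(f"every {interval:>2} min, {tier}: {'meets' if ok else 'misses'} {rpo}-min RPO")
```

A 15-minute schedule only just misses a strict 15-minute RPO once transfer time is counted, so mission-critical VMs would want a tighter schedule.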


RiGiD5

8/11/24, 9:26 PM

ZFS replication is a DR solution, not an HA solution. It won’t help the OP at all.

symcbean

8/12/24, 2:53 PM

The mechanisms you listed give no better guarantees of consistency. And, with the exception of Active Directory, they don't solve the service availability issue without additional components.

RiGiD5

8/15/24, 8:51 AM

What are you talking about?! AD replication is built in; it’s 100% a Microsoft thing. Dude, there are hundreds of millions of replicated ADs deployed worldwide.
