Score:0

Stretch Cluster File Server with Storage Replica - Partnership not reversing on failure simulation

First off, sorry if this is the wrong place to post this.

I have Storage Replica running in an MSCS stretch cluster that acts as a file server: two sets of two VMs, with each set running on a separate ESXi host. Each set has two VHDX disks attached through a SCSI controller with virtual bus sharing. Each replica source disk and its destination partner have identical drive letters across all nodes.

I configured the setup to the letter, following Microsoft's guide here: https://docs.microsoft.com/en-us/windows-server/storage/storage-replica/stretch-cluster-replication-using-shared-storage

For the most part, everything is running as it should, except for two things. When I drain roles from or cut power to the first set of cluster nodes, the disk attached to them transfers ownership to one of the nodes in the other set, and the disk that is being replicated to stays offline.

From my understanding, when the first two nodes are taken out, or more importantly when the original source disk is taken offline to simulate failure, the Storage Replica partnership should reverse automatically: the original destination disk should come online and take over in the File Server role, so clients keep access to all file shares as if nothing happened.

Instead, to get that behavior, I have to take the original source disk offline to simulate failure, right-click and remove the replica partnership, remove the disk from the File Server role, manually add the destination disk to the role, and lastly bring the role back online. After that I also have to set up a new Storage Replica partnership, with the now-active disk replicating to the one where failure was simulated, as the new status quo.

All of that takes less than a minute to go through, but it still requires manual work instead of the automatic failover I expected.

My question is: did I misinterpret how the system is supposed to work, and is this just how failover behaves in the scenario described above? Or do you think there's a configuration error somewhere along the line?

As a side note, when I try to manually reverse the partnership (with all nodes and disks up) using the Set-SRPartnership PowerShell cmdlet, the source and destination disks stay the same after the cmdlet runs its course.

Here's the schematic I made while drafting the system, to help clarify the setup: https://i.imgur.com/5STByzZ.png

Any and all input is greatly appreciated, even pointers to a better place to post this question to :)

Please be gentle since I'm just a student and this is my first real project assignment at the company I work at, though I can't ask any other colleagues in IT since none of them have any experience with clustering.

Score:0

If you used Windows Server 2016 to configure the stretch cluster, you have to use PowerShell to reverse replication. The offline disk will then become the online disk and vice versa.
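
As a rough sketch of what that reversal could look like: the node and replication group names below are hypothetical placeholders, not values from the question, so substitute the ones that Get-SRPartnership reports on your cluster. It assumes the Storage Replica module is available and that you run it elevated on a cluster node.

```powershell
# Hypothetical names: replace with your own node and replication group names.
$newSourceNode = 'FS-SITE2-N1'   # node in the site that should become the source
$newSourceRG   = 'RG-Site2'      # replication group that is currently the destination
$newDestNode   = 'FS-SITE1-N1'   # node in the site that should become the destination
$newDestRG     = 'RG-Site1'      # replication group that is currently the source

# Show the current replication direction before changing anything.
Get-SRPartnership |
    Format-List SourceComputerName, SourceRGName, DestinationComputerName, DestinationRGName

# Reverse the direction: the current destination group becomes the new source.
Set-SRPartnership -NewSourceComputerName $newSourceNode -SourceRGName $newSourceRG `
                  -DestinationComputerName $newDestNode -DestinationRGName $newDestRG

# Check the direction again to confirm that source and destination have swapped.
Get-SRPartnership |
    Format-List SourceComputerName, SourceRGName, DestinationComputerName, DestinationRGName
```

If the second Get-SRPartnership output still shows the old direction, compare the group names you passed with the ones listed in the first output, since -SourceRGName has to name the group that is becoming the source rather than the one that currently is.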
