Score:0

Bring Windows Failover Cluster On-Line with only One Node


I would like to set a policy such that my Failover Cluster will always come into service, even if only one of the two nodes is available.

Background: I have only two nodes in the cluster, plus a file share witness hosted on the DC. For this question, assume that the DC stays in service. (Windows Server 2019.)

If I shut down node1, node2 becomes active. If I then shut down node2, the cluster stops (obviously). However, if I then start only node1, the cluster never recovers. Not only does it not recover without node2, but I don't see an easy way to bring the cluster into service with Failover Cluster Manager. The only way I can recover the cluster in this scenario is to start node2, which does not seem (to me) to be real high availability. IMO I should be able to set a policy, or have a reasonably easy way, to bring the cluster back online (perhaps after a waiting period), even if node2 never recovers.

Am I just thinking about this the wrong way or missing something obvious?

UPDATE: I do see an error:

Node 'SOM2' failed to form a cluster. This was because the 
witness was not accessible. Please ensure that the witness 
resource is online and available.

However, the witness was available at that time, which makes me suspect that this is a permission issue, that is, the witness share is available to the cluster but not the cluster service accounts on each node. Is that possible?

Is there some special permission setting on the witness share to ensure it can be accessed by the local service accounts on each node?

Update:

To fix the permission error (not the central problem), I needed to use a PowerShell command from:

https://docs.microsoft.com/en-us/powershell/module/failoverclusters/set-clusterquorum

Check that the permissions on the witness share grant full control to the correct domain account, such as a service account whose password never expires and cannot be changed. Then, on a cluster host, first remove the current witness configuration:

Set-ClusterQuorum -NoWitness
Get-ClusterResource

If needed:

Remove-ClusterResource -Name "File Share Witness"

or remove it using Failover Cluster Manager.

Then re-add the file share witness with the domain credentials needed to access it:

Set-ClusterQuorum -NodeAndFileShareMajority \\server\path-to-witness -Credential $(Get-Credential)
Nikita Kipriyanov
Are you talking about a Windows Failover Cluster? It would be better to say so clearly in the title.
Score:2

As @stuka noted, this is by design. The witness file was locked by a live node before the whole cluster went down. Node1 has no way to tell whether Node2 is really down or is still online but unreachable over the cluster network, so it has to treat the locked file as correct. It would be far worse for Node1 to come online in that scenario: if the cluster network had gone down, neither node would be able to break the quorum voting tie.

If you actually encounter this scenario, you have to edit the quorum settings and force a node back online manually.
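For reference, forcing quorum on the surviving node might look like the sketch below. It is a minimal sketch under stated assumptions, not a definitive procedure: the node name `node1` comes from the question, and the commands assume an elevated PowerShell session on that node with the FailoverClusters module installed. Forcing quorum makes this node's copy of the cluster configuration authoritative, so it is only appropriate when the other node is known to be down.

```powershell
# Run in an elevated session on the surviving node (assumed here to be "node1").
Import-Module FailoverClusters

# Stop the cluster service if it is still running and waiting for quorum.
Stop-Service -Name ClusSvc -Force

# Start the cluster service on this node with forced quorum.
# (-FixQuorum is the long-standing switch; newer releases also accept -ForceQuorum.)
Start-ClusterNode -Name "node1" -FixQuorum

# Confirm that the node is up and see which roles came online.
Get-ClusterNode
Get-ClusterGroup
```

The older command-line equivalent is `net start clussvc /forcequorum`.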

In practice this shouldn't be of concern because it would be rare for the cluster to ever go entirely offline.

Two-node clusters will always involve a compromise in terms of HA. The witness file share establishes quorum, but it cannot cover all scenarios. A 3-node (or other odd-numbered) cluster would provide better fault tolerance.

Score:0

If the quorum witness share is accessible to the online node, it should definitely be able to bring the cluster online. This is standard WSFC behavior. If your cluster is not starting and the witness share is online, something else must be preventing it from starting. Look for any errors.
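One way to look for those errors is to query the event log on the node that fails to start. The sketch below assumes an elevated PowerShell session on that node; `node1` is the node name from the question and `C:\Temp` is a hypothetical output folder that must already exist. Whether `Get-ClusterLog` succeeds while the cluster service is down may depend on the state of the node.

```powershell
# Recent errors (level 2) and warnings (level 3) logged by the failover clustering provider.
Get-WinEvent -FilterHashtable @{
    LogName      = 'System'
    ProviderName = 'Microsoft-Windows-FailoverClustering'
    Level        = 2, 3
} -MaxEvents 50 | Format-List TimeCreated, Id, Message

# Dump the last 30 minutes of the detailed cluster log for one node.
# "C:\Temp" is a hypothetical destination folder.
Get-ClusterLog -Node "node1" -TimeSpan 30 -Destination "C:\Temp"
```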

Also, how are the cluster quorum settings configured?
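The current quorum configuration can be inspected with something like the following sketch. It assumes the cluster service is running on at least one node and that the FailoverClusters module is available in the session.

```powershell
Import-Module FailoverClusters

# Quorum type and the resource currently acting as witness.
Get-ClusterQuorum | Format-List *

# Vote assignment per node (configured weight and current dynamic weight).
Get-ClusterNode | Format-Table Name, State, NodeWeight, DynamicWeight

# Whether dynamic quorum is enabled, and whether the witness currently has a vote.
Get-Cluster | Format-List DynamicQuorum, WitnessDynamicWeight

# Parameters of the file share witness resource, including the share path.
Get-ClusterResource |
    Where-Object { $_.ResourceType -eq "File Share Witness" } |
    Get-ClusterParameter
```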

See here for reference: https://docs.microsoft.com/en-us/windows-server/failover-clustering/manage-cluster-quorum.

Updated to add information on an error.
Massimo
The cluster is represented in Active Directory by a computer account with the same name as the cluster itself; that computer account, also known as the "Cluster Name Object" (CNO), needs full control permissions on the witness share.
Stuka
I think the behavior the OP is facing is expected. Node 2 is the owner of the witness (it locks the file share), and Node 1 can't lock it. So until Node 2 is back online, the cluster is not available, because there is no quorum. https://techcommunity.microsoft.com/t5/failover-clustering/understanding-quorum-in-a-failover-cluster/ba-p/371678
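Granting that access on the server hosting the share could be sketched as below. All names are hypothetical placeholders, not values from the question: `CONTOSO` for the domain, `SOMCLUSTER` for the cluster name, `ClusterWitness` for the share, and `D:\ClusterWitness` for its folder. Note the trailing `$`, which marks a computer account.

```powershell
# Run in an elevated session on the server that hosts the witness share.
# Hypothetical names: domain CONTOSO, cluster SOMCLUSTER, share ClusterWitness.
$cno = 'CONTOSO\SOMCLUSTER$'

# Share-level permission: full access for the cluster's computer account.
Grant-SmbShareAccess -Name 'ClusterWitness' -AccountName $cno -AccessRight Full -Force

# NTFS permission on the underlying folder, inherited by files and subfolders.
icacls 'D:\ClusterWitness' /grant "${cno}:(OI)(CI)F"

# Verify the resulting share permissions.
Get-SmbShareAccess -Name 'ClusterWitness'
```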