Score:2

Storage Spaces Direct CSV_REFS latency degraded when diskspd not run on Owner Node, CSV_NTFS good on all nodes

br flag

CSV_REFS performs properly when the diskspd test is run on the disk’s Owner Node. Latency increases 35x for 64k blocks when the test is run on any other node in the 4-node cluster. I can switch the owner node around and run the test on the new owner, and I continue to get good performance. When I run the test from a node that does not own the disk, the results are poor. CSV_NTFS performs well regardless of the node on which the test runs. I’m considering giving up on CSV_REFS in favor of CSV_NTFS because of this observation.

I’m running Windows Server 2019.

I have considered that RDMA may be the problem, but I can’t find any evidence that I’m having RDMA issues. The logs are clean, and test-rdma.ps1 runs fine.

Does anyone have any thoughts as to why this would occur?

cn flag
CSV with ReFS uses *File System* Redirected Mode for all I/O. That means all the I/O traverses the owner/coordinator node. To confirm this, you can open the same vendor support call that the following person did. The response they got from the vendor support engineer was that RDMA is implicitly required. https://community.spiceworks.com/topic/2286588-please-do-not-use-refs-for-cluster-shared-volumes-provided-by-a-san
cn flag
"The disk can be provisioned as Resilient File System (ReFS); however, the CSV drive will be in redirected mode meaning write access will be sent to the coordinator node." https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs
Strepsils avatar
cn flag
Agree with what @GregAskew said above: File System Redirected mode. If you run the test on any node other than the owner, I/O will travel over the owner node, hence the performance degradation: https://techcommunity.microsoft.com/t5/failover-clustering/understanding-the-state-of-your-cluster-shared-volumes/ba-p/371889
W Lucking avatar
br flag
Thanks, Greg. Pointing me to this aspect of CSV was extremely helpful. My main conclusion is that, because of File System Redirected Mode, CSV in an S2D deployment will not deliver the performance that might be possible without CSV, even with the potential availability of Direct I/O and regardless of whether I pair it with ReFS or NTFS. Still, RDMA will be leveraged for whatever benefits it might offer. From my testing today, I agree with the claim that NTFS should be used with CSV. CSV with ReFS is painfully slow in S2D.
RiGiD5 avatar
cn flag
Came here just to confirm both statements: a) ReFS/CSV is always in redirected mode, unless you’re on the owner node, and b) there’s no single place in Microsoft’s official docs confirming this and calling it a “feature” rather than a “bug”. We stick with NTFS unless we can’t migrate the customer off Hyper-V.
W Lucking avatar
br flag
This applies to NTFS CSV as well: any CSV will use file system redirection. I think CSV requires file system redirection in order to provide the availability and shared accessibility we want from it. From my experience and reading, ReFS CSV has poor performance, while NTFS CSV has excellent performance on the owner node. ReFS is probably good when not using CSV.
Score:1
kz flag

It's by design. ReFS is in redirected mode always. See:

https://learn.microsoft.com/en-us/windows-server/failover-clustering/failover-cluster-csvs

"Cluster Shared Volumes (CSV) enable multiple nodes in a Windows Server failover cluster or Azure Stack HCI to simultaneously have read-write access to the same LUN (disk) that is provisioned as an NTFS volume. The disk can be provisioned as Resilient File System (ReFS); however, the CSV drive will be in redirected mode meaning write access will be sent to the coordinator node."
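
To see which mode each node actually reports for a volume, the state can be dumped with the `Get-ClusterSharedVolumeState` cmdlet and summarized. The following is only a rough sketch: the `Select-Object` projection in `PS_COMMAND`, the helper name `redirected_nodes`, and the sample node and volume names are all assumptions made for illustration, not output from the cluster discussed here.

```python
import json

# PowerShell to run on a cluster node; its JSON output is what this script reads.
# Get-ClusterSharedVolumeState comes with the FailoverClusters module. The
# Select-Object projection (stringifying Node and StateInfo so the JSON holds
# plain strings) is an assumption of this sketch.
PS_COMMAND = (
    "Get-ClusterSharedVolumeState | "
    "Select-Object Name, "
    "@{n='Node';e={\"$($_.Node)\"}}, "
    "@{n='StateInfo';e={\"$($_.StateInfo)\"}} | "
    "ConvertTo-Json"
)


def redirected_nodes(state_json):
    """Map each volume name to the nodes whose StateInfo is not 'Direct'."""
    records = json.loads(state_json)
    if isinstance(records, dict):
        # ConvertTo-Json emits a bare object when there is only one record.
        records = [records]
    result = {}
    for rec in records:
        if rec["StateInfo"] != "Direct":
            result.setdefault(rec["Name"], []).append(rec["Node"])
    return result


# Hypothetical sample in the shape PS_COMMAND would produce (names are made up).
SAMPLE = json.dumps([
    {"Name": "Cluster Virtual Disk (Volume1)", "Node": "NODE1", "StateInfo": "Direct"},
    {"Name": "Cluster Virtual Disk (Volume1)", "Node": "NODE2", "StateInfo": "FileSystemRedirected"},
    {"Name": "Cluster Virtual Disk (Volume1)", "Node": "NODE3", "StateInfo": "FileSystemRedirected"},
])

if __name__ == "__main__":
    print(redirected_nodes(SAMPLE))
```

Any node listed for a volume is sending its I/O through the coordinator, which would be consistent with the latency gap seen when diskspd runs away from the owner node.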
