Score:3

Raid 10 Performance Issues

dk flag

I am in the process of setting up a mirrored storage system for our business.

We don't have the budget for prebuilds so I am trying to do what I can to get the best bang for our buck. Here is our hardware breakdown:

San1 and San2 Windows Server 2019

SUPERMICRO MBD-H11SSL-I, AMD EPYC 7251 8-core CPU

64GB RAM 8GB x8

SSD for OS 500GB

LSI 9380-8i8e

Intel 10G NIC, 4 port - iSCSI network

Intel 25G NIC, 2 port - sync between servers - Jumbo Frames 9014

1 internal 1G NIC (data), 1 IPMI in use on the motherboard

IW-RJ224-03 24-bay SSD enclosure, populated with 24 2TB Samsung 860 Pros in a RAID 10 configuration, connected via 2 SAS cables to the 9380 card.

We will be using StarWind to sync the 2 servers.

While setting up StarWind, I have been testing our sync performance using image sizes from 500GB to 5TB.

When a sync starts, the system receiving the sync data is barely usable. The system stutters, Performance Monitor hangs, and everything runs horribly unless I turn off all caching options. If I enable write-back or enable the disk cache, I see core 0 on NUMA 0 peg at 100% and everything goes south. The other cores show little or no usage, apart from a couple.

I have tried every combination of drive setup to get through this, but I am getting nowhere. I must be missing something. I have configured the array in 2x8, 6x4, and 4x6 layouts (standard 64k strip), thinking some drive limitation was holding me back. There was one instance where nothing went wrong: the array wrote a 5TB sync in an hour with no issues and perfect system response. It was going over 1.6GB/s at that time, with both caches enabled on a 4x6 array. I did notice that core 0 on NUMA 0 was near idle that time, and core 2 on NUMA 0 was doing the heavy lifting. I took everything down to replicate and rebuild, and have been stuck since. Now every transfer maxes out at about 600MB/s writes with cache off, and with cache on it hits about 1GB/s before the system is noticeably struggling.

Any ideas to point me in the right direction are appreciated! Firmware is up to date on the 9380, and drivers for the RAID card, NICs, and motherboard components are all up to date.

Score:5
vn flag

Here are some thoughts that may help solve the issue:

  1. If you are using any kind of NIC teaming, it may affect the performance of iSCSI and replication in unpredictable ways. Most SAN/vSAN vendors don't support teaming and recommend MPIO instead. Disable NIC teaming.
  2. You mentioned an Intel 25G NIC. The XXV710 model may have issues with Jumbo Frames enabled. Disable Jumbo Frames and run additional tests.
  3. A Jumbo Frame value of 9126 is not typical for Windows and is used mostly on switches. The usual Windows value is 9014.
  4. The LSI 9380 doesn't have the Samsung 980 Pro in its list of supported drives. Moreover, the 980 Pro is an NVMe drive (not SATA). Are you sure you have 980 Pros?

I’d also recommend contacting StarWind’s support, as BaronSamedi1958 mentioned.

dk flag
Yikes, I was all over the place there, huh? Yeah, they are 860 SSDs, and yeah, it was 9014. I was in a rush after 10 hours of pulling my hair out :). I did track it down to the 710 25Gb NIC not having NUMA scaling enabled. Enabling it cleared up the issues I was having instantly.
Score:3
kz flag

You need to fine-tune the synchronization priority for the whole thing to function properly.

https://www.starwindsoftware.com/help/ChangingSynchronizationPriority.html

Since you are dealing with a paid solution, I’d suggest contacting their support.

dk flag
Priority should not affect server performance. It's on a 2x25Gb server-to-server sync, so there is plenty of bandwidth. The sync is choking the server when it's only using about 5Gb per connection.
BaronSamedi1958
kz flag
This isn’t about the network; it’s about synchronization traffic saturating DISK bandwidth.
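For reference, the units in this exchange can be lined up with a quick calculation. This is only a rough sketch: the figures (about 5Gb per connection, the 600MB/s and 1GB/s write ceilings) come from the posts above, while the function name, the assumption of two active sync connections, and the decimal unit conversion are illustrative assumptions. Protocol overhead is ignored, and the result does not by itself identify the bottleneck.

```python
def gbit_to_mbyte_per_s(gbit_per_s: float) -> float:
    """Convert Gb/s to MB/s using decimal units (1 Gb = 1000 Mb, 8 bits per byte)."""
    return gbit_per_s * 1000 / 8


# Figures quoted in the thread; two active sync connections is an assumption.
per_connection_gbit = 5.0
connections = 2
sync_mbyte_per_s = gbit_to_mbyte_per_s(per_connection_gbit * connections)

# Write ceilings reported in the question (MB/s).
disk_write_ceilings = {"cache off": 600.0, "cache on": 1000.0}

for label, disk_mbyte_per_s in disk_write_ceilings.items():
    ratio = sync_mbyte_per_s / disk_mbyte_per_s
    print(f"{label}: sync {sync_mbyte_per_s:.0f} MB/s, "
          f"disk {disk_mbyte_per_s:.0f} MB/s, ratio {ratio:.2f}")
```

Under these assumptions the incoming sync stream is roughly the same order of magnitude as the write rates reported for the array, even though it uses only a fraction of the 2x25Gb link.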
dk flag
Thanks for the help. It turned out that NUMA scaling was not enabled on the 25G NIC, so it was pegging one core and holding everything up, bringing the system to an unresponsive state. Thank you.
BaronSamedi1958
kz flag
Great to hear the issue is gone! :)