Score:4

Windows file server heavy CPU usage - render farm with many clients reading data at once


I have a Windows file server with dual Xeon E5-2650 v4 CPUs. They are 14 cores each, so 28 cores in total.

The network interface is a Mellanox ConnectX-3 40 Gbps card.

The disks in the server are 8 x 7.68 TB SATA 6 Gbps SSDs in a RAID 0 (Windows software stripe).

There are 200 render nodes reading data from this server in order to render 3D frames.

One NUMA node/CPU on the file server is running at 100% while the second CPU is barely utilized.

The problem is that users working in the 3D applications experience slowdowns while rendering is going on. When there is no rendering, the artists working on their projects see no slowdowns. The network is not saturated: there is only around 5 Gbps of traffic to and from the server, against a capacity of 40 Gbps.

So what could be the reason for the slowdown? One thing I suspect is that the Mellanox network card sits in a PCI Express slot wired to CPU 2, and maybe that is why CPU 2 is at 100%. Each render node also reads thousands of small files in order to render, so the sheer number of files could be causing the high CPU utilization.

Any ideas?

Massimo
Please add details about the actual processes using up CPU.
tom greene
Actually it's the System process in Task Manager that is using all the CPU. Also, the disk queue length is 11 and the number of open files is 151,000 with 110 render nodes.
Massimo
Which Windows version?
tom greene
Windows Server 2022 21H2
mfinni
Is this NIC running Infiniband or Ethernet?
tom greene
The NIC is running Ethernet.
Score:8

A few things to note. That 40 Gbps NIC is only connected to one socket/NUMA node, so if it's associated with the second socket, any file-sharing workload handled by the first socket has to traverse the QPI bus to reach the NIC, effectively turning your second socket into a part-time I/O controller for the first. If the second socket isn't really being used, consider removing it completely, moving its memory into the first socket's slots and moving the NIC into a slot wired to the first socket. Sometimes less is more :)
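
If it's any help, this is roughly how you can check which NUMA node the card actually hangs off and keep its receive-side work local to that node. PowerShell sketch only: "MyMellanox40G" is a placeholder adapter name (pull the real one from Get-NetAdapter) and NUMA node 1 is just an example.

    # Which PCIe slot / NUMA node the 40G adapter is actually attached to
    Get-NetAdapterHardwareInfo -Name "MyMellanox40G"

    # Current RSS profile and which processors service its receive queues
    Get-NetAdapterRss -Name "MyMellanox40G"

    # If the card has to stay on the second socket, at least keep its RSS work
    # and buffer allocations local to that NUMA node (node 1 is an example)
    Set-NetAdapterRss -Name "MyMellanox40G" -NumaNode 1 -Profile NUMAStatic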

Secondly, 40 Gbps interfaces are usually just a pre-bundled connection of 4 x 10 Gbps in an LACP/EtherChannel fashion. That is fine if your server is talking to lots of varied MAC addresses, but if you're always talking to the same MAC (say one client, or another switch) it can limit you to roughly 10 Gbps of bandwidth. This is one of the reasons we've moved from 40 Gbps to 25 Gbps NICs.

On top of that you've got a software-driven RAID 0 setup, which takes away not just general CPU resource but, importantly, resources used by the kernel. A lot of people don't realise this, but it doesn't matter how many cores you have: most kernels will only use a certain number of (typically low-numbered) cores for kernel work such as process scheduling, memory allocation and I/O. Software RAID has its benefits, but so does hardware RAID, and one of them is the comparatively low CPU overhead it usually carries.
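
A quick way to see whether that's what is biting you here is to sample kernel-mode time per core while a render batch is running; if a handful of low-numbered cores are pegged on privileged/DPC time while the rest sit mostly idle, that's the effect I'm describing. Rough sketch, assuming the standard English counter names:

    # Sample kernel-mode CPU per core for ~30 seconds during a render batch
    Get-Counter -Counter @(
        '\Processor(*)\% Privileged Time',
        '\Processor(*)\% DPC Time',
        '\Processor(*)\% Interrupt Time'
    ) -SampleInterval 5 -MaxSamples 6 |
        ForEach-Object { $_.CounterSamples } |
        Sort-Object CookedValue -Descending |
        Select-Object -First 20 Path, CookedValue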

The small-files thing is a pain, yes. Almost all file systems are slower when dealing with lots of small files than with fewer large ones; in fact I'm struggling to think of any that aren't, and NTFS certainly isn't great at it. Ironically, removing jumbo frame support everywhere can actually help a bit with lots of small files.
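
Checking and turning off jumbo frames is a quick per-adapter change, roughly like the below. The "*JumboPacket" keyword and the adapter name are assumptions here (list what the Mellanox driver actually exposes with Get-NetAdapterAdvancedProperty), and 1514 is the standard non-jumbo frame size.

    # Is jumbo-frame support currently enabled on the 40G interface?
    Get-NetAdapterAdvancedProperty -Name "MyMellanox40G" -RegistryKeyword "*JumboPacket"

    # Fall back to standard 1514-byte frames (match this on switches and clients too)
    Set-NetAdapterAdvancedProperty -Name "MyMellanox40G" -RegistryKeyword "*JumboPacket" -RegistryValue 1514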

Finally, one thing that does worry me about this kind of system is the single point of failure in the single server. Consider moving all of this file service to a central multi-'head' storage array if you have the budget; it should be faster, more reliable/resilient and easier to support too. Another option might be some kind of centralised or decentralised distributed file system such as Ceph, or perhaps even Windows' own DFS-R. I have a friend who runs a very large render farm for the film/movie industry and they use that kind of thing, though to be fair it isn't cheap.

tom greene
Thanks for the detailed answer. Also, the disk queue length is 11 and the number of open files is 151,000 with 110 render nodes running. Are there any settings on the Mellanox ConnectX-3 card I can change to lighten things up?
mfinni
You're bottlenecking on disk IO, not network, so there's probably not a lot of optimization to do at the network layer (other than Chopper's suggestions above). Get a RAID card or a storage array.
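
To double-check that, something along these lines while a render batch is running should do it (sketch only, standard counter names assumed):

    # Queue depth and per-IO latency on the SSD stripe
    Get-Counter -Counter @(
        '\PhysicalDisk(*)\Avg. Disk Queue Length',
        '\PhysicalDisk(*)\Avg. Disk sec/Read',
        '\PhysicalDisk(*)\Avg. Disk sec/Write'
    ) -SampleInterval 5 -MaxSamples 12

    # How many files the render nodes currently hold open over SMB
    (Get-SmbOpenFile | Measure-Object).Count
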
Score:3

The System process being that busy means the OS is having to deal with heavy overhead, either from its internal workings or from device drivers. Some likely culprits:

  • NIC firmware/drivers (are they up to date?)
  • Software RAID (a hardware RAID controller would be a much better choice here)
  • Filesystem issues (lots of small files are notoriously harder to deal with than fewer big ones)
  • SMB (see "filesystem", but much worse)

I'm of course assuming your NIC supports RDMA; if it doesn't, you should immediately switch to one which does.
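
A quick way to verify that, assuming the in-box networking and SMB cmdlets on Server 2022:

    # Does the driver expose RDMA at all (RoCE, for a ConnectX-3 running Ethernet)?
    Get-NetAdapterRdma

    # What SMB itself will actually use: per-interface RSS and RDMA capability
    Get-SmbServerNetworkInterface
    Get-SmbClientNetworkInterface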

My suggestion is to examine performance counters to find out where the system is actually spending so much CPU time.
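
For example, a short kernel trace with the in-box Windows Performance Recorder will attribute the System process's CPU time to specific drivers and kernel paths, which plain counters can't do. Sketch only; the output path is an example and its folder must already exist:

    # Record ~60 seconds of kernel CPU activity while a render batch hits the server,
    # then open the .etl in Windows Performance Analyzer
    wpr -start CPU -filemode
    Start-Sleep -Seconds 60
    wpr -stop "C:\temp\fileserver-cpu.etl"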
