
Why are server CPUs so weak and expensive?

So, looking at CPUs with good single-core performance: one example would be the AMD EPYC 73F3, which is (according to PassMark) on par with the Ryzen 5950X, but 10 times more expensive and rated at over 2x the TDP.

In the Intel realm there's really nothing comparable in single-core performance, as most of the offerings are only marginally faster than the Xeon E5-2660 v3, which is 8 years old. The upside for Intel is that their server CPUs are not NUMA, so memory handling is better, and support seems more polished. So I'm not even sure how these synthetic benchmarks relate to real-life situations where memory needs to be moved between cores in non-NUMA-aware applications (is there really anything NUMA-aware out there besides Oracle server?)

Xeon workstation CPUs seem to be on par with AMD's consumer-grade CPUs, but still not as powerful as the Intel CPUs available to ordinary consumers. The only upside of Xeon workstation parts is that they support ECC, which seems to be the only reason this segment still exists: you could probably get a much better consumer CPU for half the price, but there's no ECC option and RAM capacity is limited.

Also, I noticed there are some issues with EPYC, but Ryzen is working perfectly fine. This may be a loaded question, but what are people using nowadays? Maybe I'm missing something, but getting a server that is on par with some super-cheap Ryzen setup in terms of power usage, reliability and performance seems impossible without investing an enormous amount of money, and the best you can end up with is the single-core performance of a two-year-old consumer Ryzen anyway.

Has innovation in server space stalled?

Romeo Ninov
This question will get mostly opinion-based answers and probably will not survive.
Yes, I'm looking for opinion-based answers. As for constantly running at high loads, cooling is usually the issue. ECC RAM is common on AMD consumer hardware and not that expensive; on Intel you have to pay a premium for ECC. The thing is that on my laptop with 64 GB I'm constantly getting some small data corruption, while the sticks seem fine. Actually, running consistently at high load should be easier on the hardware, since the expansion and contraction of metal from thermal cycling puts additional stress on components.
U. Windl
"10 times more expensive" *than* what? Also, I'm not sure that recent Intel CPUs are non-NUMA (as claimed). I'm missing a concrete question.

So let's go through your opinions line by line, then:

So, looking at CPUs with good single-core performance: one example would be the AMD EPYC 73F3, which is (according to PassMark) on par with the Ryzen 5950X, but 10 times more expensive and rated at over 2x the TDP.

This is a terrible comparison: try putting two CPUs in a 5950X-based 'server', or more than 128 GB of memory, or getting more than 64 MB of L3 cache or more than 16+4 PCIe lanes.
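To put numbers on those limits, here is a minimal sketch comparing per-socket platform figures. The 5950X memory, cache and lane figures are the ones quoted above; its 105 W TDP and all of the EPYC 73F3 values (2-socket support, up to 4 TB of memory per socket, 256 MB of L3, 128 PCIe 4.0 lanes, 240 W TDP) are recalled from AMD's published specifications and should be re-checked before being relied on.

```python
# Per-socket platform limits, recalled from AMD's spec pages (approximate;
# re-check against current vendor documentation before relying on them).
PLATFORMS = {
    "Ryzen 9 5950X": {"max_sockets": 1, "max_mem_gb": 128, "l3_mb": 64,
                      "usable_pcie_lanes": 20, "tdp_w": 105},
    "EPYC 73F3": {"max_sockets": 2, "max_mem_gb": 4096, "l3_mb": 256,
                  "usable_pcie_lanes": 128, "tdp_w": 240},
}


def ratios(baseline: str, other: str) -> dict[str, float]:
    """Return other/baseline for every limit listed for the baseline platform."""
    a, b = PLATFORMS[baseline], PLATFORMS[other]
    return {key: b[key] / a[key] for key in a}


if __name__ == "__main__":
    for key, value in ratios("Ryzen 9 5950X", "EPYC 73F3").items():
        print(f"{key:>18}: {value:5.1f}x")
```

The TDP ratio comes out at roughly 2.3x, which matches the "over 2x TDP" in the question, but it buys 32x the memory ceiling, 4x the L3 and 6.4x the PCIe lanes per socket.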

I'm afraid it's not clear that you know what a server is, at least not in a production/professional environment. Airliners, cruise ships, oil tankers, buses, trains etc. are designed to be reliable and resilient, and to cope with a wide variety of usage requirements over multiple years at a predictable cost; jet fighters, speedboats, your car etc. are faster, yes, but designed for single, focused use cases where reliability and cost are less of an issue.

In the Intel realm there's really nothing comparable in single-core performance, as most of the offerings are only marginally faster than the Xeon E5-2660 v3, which is 8 years old. The upside for Intel is that their server CPUs are not NUMA, so memory handling is better, and support seems more polished. So I'm not even sure how these synthetic benchmarks relate to real-life situations where memory needs to be moved between cores in non-NUMA-aware applications (is there really anything NUMA-aware out there besides Oracle server?)

Firstly, nobody in a server environment cares about single-core performance; maybe a tiny handful do, but >99% of people do not. It appears that you think clock speed is the only measure of single-core performance anyway, forgetting the impact of memory/QPI/UPI/IPC improvements. Not only would something as low-end as a 4210T (10c/20t, 2.3 GHz base/3.4 GHz turbo) absolutely stomp a 2660 v3 into the ground (and they list at only 555 USD, by the way), but there are SKUs like the 8732C (28c/56t, 3.2 GHz base/3.5 GHz turbo) that would make the older chip seem silly, plus all their 40c/80t SKUs too.
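The point that clock speed alone does not determine single-core performance can be made concrete with the usual first-order approximation, throughput ≈ IPC × frequency. This is a minimal sketch; the IPC values in it are hypothetical placeholders, not measurements of the 4210T, the 2660 v3 or any other real part.

```python
def relative_single_thread(ipc: float, ghz: float) -> float:
    """First-order single-thread throughput estimate: instructions per clock x GHz.

    Deliberately ignores memory latency, cache size, interconnect speed and how
    long turbo clocks can be sustained, all of which matter on real hardware.
    """
    return ipc * ghz


def break_even_ipc(ipc_old: float, ghz_old: float, ghz_new: float) -> float:
    """IPC that a core clocked at ghz_new needs to match one with ipc_old at ghz_old."""
    return ipc_old * ghz_old / ghz_new


if __name__ == "__main__":
    # Hypothetical cores: an older one at 3.3 GHz with IPC normalised to 1.0,
    # and a newer one clocked at only 3.0 GHz.
    needed = break_even_ipc(1.0, 3.3, 3.0)
    print(f"the 3.0 GHz core breaks even at {needed:.2f}x the older core's IPC")
```

Under this model, any generational IPC gain above roughly 10% outweighs a 3.3 to 3.0 GHz clock deficit; real benchmarks, as the thread shows, can still disagree because the ignored factors matter.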

And of course Intel does NUMA too. You probably mean on-socket NUMA, which is a different thing, and the 92xx Xeons even did that: look at the 9282 (56c/112t), for instance, which has the same memory concerns as Zen CPUs.

Your assertion that "Oracle Server" is a rare case of being NUMA-aware is wrong, and beside the point anyway, as what matters is that the base OS or hypervisor is NUMA-aware, and anything even vaguely recent has been for years. Any modern Linux/Windows will happily keep processes and their memory 'near' each other within a NUMA domain unless very highly contended indeed, and the same has been true for ESXi/KVM/Xen for even longer; most server applications just don't need to consider NUMA at all, as it's all taken care of for them.
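On Linux, the NUMA layout that the scheduler and memory allocator work with is exposed under /sys/devices/system/node/, where each nodeN directory has a cpulist file such as "0-7,16-23". A minimal sketch that reads it, assuming that sysfs layout (numactl --hardware and lscpu report the same information):

```python
from pathlib import Path

NODE_ROOT = Path("/sys/devices/system/node")


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-3,8-11' into CPU numbers."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_topology(root: Path = NODE_ROOT) -> dict[int, list[int]]:
    """Map each NUMA node id to its CPUs; empty if the sysfs tree is absent."""
    topology: dict[int, list[int]] = {}
    node_dirs = sorted(root.glob("node[0-9]*"), key=lambda p: int(p.name[4:]))
    for node_dir in node_dirs:
        cpulist = node_dir / "cpulist"
        if cpulist.exists():
            topology[int(node_dir.name[4:])] = parse_cpulist(cpulist.read_text())
    return topology


if __name__ == "__main__":
    for node, cpus in numa_topology().items():
        print(f"node {node}: {len(cpus)} CPUs -> {cpus}")
```

A single-socket Intel box typically shows one node here, while a multi-socket system, or an EPYC configured with more than one node per socket in the BIOS, shows several.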

Xeon workstation CPUs seem to be on par with AMD's consumer-grade CPUs, but still not as powerful as the Intel CPUs available to ordinary consumers. The only upside of Xeon workstation parts is that they support ECC, which seems to be the only reason this segment still exists: you could probably get a much better consumer CPU for half the price, but there's no ECC option and RAM capacity is limited.

Newer Threadripper and Ryzen CPUs and chipsets support ECC, so your point here is moot.

Also, I noticed there are some issues with EPYC, but Ryzen is working perfectly fine. This may be a loaded question, but what are people using nowadays? Maybe I'm missing something, but getting a server that is on par with some super-cheap Ryzen setup in terms of power usage, reliability and performance seems impossible without investing an enormous amount of money, and the best you can end up with is the single-core performance of a two-year-old consumer Ryzen anyway.

Which problems? Can you be specific?

Again, this is your lack of production experience showing. This site is very specifically for professional sysadmins/system designers; we make that very clear when you join. Our number one priority when it comes to servers is to preserve the data we hold and to maintain service for the dozens/hundreds/thousands of applications and users served by our infrastructure. Yes, we care about power usage, heat management and overall server performance, but these are distant secondary concerns to reliability, resilience, monitoring capability, pre-failure warning and capacity in terms of cores/threads/memory/PCIe lanes, and frankly anything BUT production-level CPUs fails on multiple, if not all, of these criteria. If you get bored, google 'RAS' (reliability, availability, serviceability) and see if that helps you understand.
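The pre-failure warning point is easy to see on Linux: with ECC memory and an EDAC driver loaded, the kernel counts corrected and uncorrected memory errors per memory controller. A minimal sketch, assuming the sysfs layout /sys/devices/system/edac/mc/mcN/ with ce_count and ue_count files; in production you would more likely let rasdaemon or your monitoring agent collect these.

```python
from pathlib import Path

EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def edac_error_counts(root: Path = EDAC_ROOT) -> dict[str, tuple[int, int]]:
    """Return {controller: (corrected, uncorrected)} from the Linux EDAC sysfs tree.

    An empty result means no EDAC memory controller is registered (common on
    consumer boards or without ECC memory), not that the memory is error-free.
    """
    counts: dict[str, tuple[int, int]] = {}
    for mc in sorted(root.glob("mc[0-9]*")):
        ce, ue = mc / "ce_count", mc / "ue_count"
        if ce.exists() and ue.exists():
            counts[mc.name] = (int(ce.read_text()), int(ue.read_text()))
    return counts


if __name__ == "__main__":
    counts = edac_error_counts()
    if not counts:
        print("no EDAC memory controllers found")
    for name, (ce, ue) in counts.items():
        # A rising corrected count is an early warning; any uncorrected error needs action.
        print(f"{name}: corrected={ce} uncorrected={ue}")
```

Alerting on the rate of change of the corrected count, rather than on its absolute value, is what turns this into a warning that arrives before a DIMM actually fails.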

Has innovation in server space stalled?

No, not at all, but it's inherently never going to be at the same bleeding edge as consumer parts, simply because we need reliability. Why would we risk running a server on a CPU with a few slightly faster cores but lose all of the features I listed above that we need?

My analogy above is key: the vast majority of people fly with others on airliners because it's cheaper and more reliable than flying everywhere in a fighter jet. The same goes for shipping containers: you could put one on a faster speedboat, but the numbers don't add up. The same is true for lots of other ways of getting things done: you handle more load, more reliably and more cheaply, with larger well-engineered solutions rather than unique custom ones.

Metaphorically, you've walked into the pilots' lounge at an airport and ripped into all the Boeings and Airbuses because they can't do a barrel roll as easily as a Cessna. We're not idiots; just about everyone who comes here regularly has a decade or more (32 years in my case) of successfully doing this job on very large infrastructures. We're professionals who know how to research our work and learn from others in the same field (literally the point of this site).

Do you honestly think this post of yours is groundbreaking, genius-level work that hundreds of thousands of people, including every server and CPU manufacturer, have overlooked, or might it be that you just need to learn more?

Actually, I disagree with some of your points. Even modern OSes are horrible at managing NUMA: just manually setting affinity will get you a 20-40% improvement in average response times on EPYC, even for applications with a small memory footprint (< 50% of a single node's size). So it requires application support anyway, and OS support seems to be there mainly to keep things working. The same goes for single-core performance being irrelevant: probably 90% of the internet is PHP/Node, and multithreaded code is too complicated and too expensive to write and maintain for most users.
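On Linux, the manual affinity setting described in this comment can be done from Python with os.sched_setaffinity, which wraps the same sched_setaffinity system call that the taskset utility uses. A minimal sketch; the CPU numbers in the usage note are a hypothetical choice. Because Linux's default memory policy allocates pages on the node of the CPU that first touches them, pinning before the workload allocates usually keeps its memory local as well; from a shell, `numactl --cpunodebind=N --membind=N <command>` binds both explicitly.

```python
import os
from typing import Iterable, Optional


def pin_to_cpus(cpus: Iterable[int], pid: int = 0) -> Optional[set[int]]:
    """Restrict process `pid` (0 = the caller) to the given CPUs.

    Linux-only: returns None where os.sched_setaffinity does not exist
    (e.g. macOS, Windows). Otherwise returns the affinity now in effect.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    wanted = set(cpus)
    allowed = os.sched_getaffinity(pid)
    # Drop CPUs this process may not use (cgroup cpusets, an outer taskset).
    target = wanted & allowed
    if not target:
        raise ValueError(f"none of {sorted(wanted)} is usable; allowed: {sorted(allowed)}")
    os.sched_setaffinity(pid, target)
    return os.sched_getaffinity(pid)
```

For example, a worker could call `pin_to_cpus(range(8))` at startup, where 0-7 stands in for whatever /sys/devices/system/node/node0/cpulist reports for the node you want, before it allocates its working set.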
A very old 2660 v3 will eat an EPYC 7451 on response times (around 40-60% faster for PHP), and the same is true for MySQL. Probably this is because of the NUMA layout and inadequate support in software (I'm not sure such a fragmented layout can be optimally supported anyway; I think not, because e.g. the OS disk cache cannot be optimized). Agreed that EPYC can process more concurrent requests, but you get slower to much slower response times, and these response times add up in a microservices architecture.
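The point that response times add up can be put in numbers. A minimal sketch with hypothetical per-hop latencies (not measurements): for a chain of sequential service calls the end-to-end latency is the sum of the hops, so a uniform per-hop slowdown keeps the same relative penalty but grows the absolute one with every hop, whereas calls fanned out in parallel are bounded by the slowest branch.

```python
def end_to_end_ms(hop_latencies_ms: list[float], slowdown: float = 1.0) -> float:
    """Latency of a request that calls each service strictly in sequence.

    `slowdown` scales every hop uniformly, e.g. 1.5 for "each hop is 50% slower".
    """
    return sum(ms * slowdown for ms in hop_latencies_ms)


def parallel_fanout_ms(branch_latencies_ms: list[float], slowdown: float = 1.0) -> float:
    """Latency of a request that calls all services concurrently and waits for all."""
    return max(ms * slowdown for ms in branch_latencies_ms)


if __name__ == "__main__":
    hops = [5.0, 12.0, 8.0, 20.0]  # hypothetical per-service latencies in ms
    for factor in (1.0, 1.5):
        print(f"slowdown {factor}: sequential {end_to_end_ms(hops, factor)} ms, "
              f"parallel {parallel_fanout_ms(hops, factor)} ms")
```

With these example hops, a 1.5x slowdown turns 45 ms into 67.5 ms when the calls are sequential, but only moves the parallel case from 20 ms to 30 ms.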
Not sure how a 4210T could be more performant than a 2660 v3, considering the CPU is around 2x slower according to benchmarks, but that's actually a good comparison, as I'm looking for a "replacement" for the 2660 v3 that would be around 40-50% more performant at close to the same price, which according to benchmarks seems impossible on recent Intel hardware. Unless the benchmarks are incorrect.

Server CPUs generally have instructions and features that are more suitable for typical server workloads and enterprise deployment:

  • optimised for concurrency (server workloads often scale by using more cores, not by running a single core at the highest possible clock frequency)

  • they can be fitted with low-rise CPU coolers to fit in 1U server chassis

  • virtualisation support

  • more CPU cache

  • additional CPU instruction sets

  • no built in graphics

  • a server CPU works in concert with related chipsets on the motherboard, and you’re likely to have:

    • multi processor support
    • more memory banks and more supported memory
    • ECC memory
    • more PCIe lanes and expansion slots
    • wired network at 10 GbE (or faster)
    • no desktop/laptop features:
      • no Wi-Fi
      • no Bluetooth
      • no HDMI/display-port/multiple video ports
    • out-of-band management
    • SAS rather than SATA ports
    • support for many NVMe drives
  • and others
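The list above is generic, and several items (virtualisation support in particular) are also present on many consumer chips, so it is worth checking a specific CPU rather than assuming. On x86 Linux the kernel reports feature flags in /proc/cpuinfo. This minimal sketch looks for a few of them; the selection of flags is illustrative, not a definition of a "server" CPU, and platform features such as ECC or out-of-band management do not show up here at all.

```python
# Illustrative selection of x86 feature flags as named in /proc/cpuinfo.
FEATURES_OF_INTEREST = {
    "vmx": "Intel VT-x hardware virtualisation",
    "svm": "AMD-V hardware virtualisation",
    "avx512f": "AVX-512 foundation instructions",
    "aes": "AES-NI",
}


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags: set[str]) -> dict[str, bool]:
    """Map each feature description to whether its flag is present."""
    return {desc: name in flags for name, desc in FEATURES_OF_INTEREST.items()}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            flags = cpu_flags(f.read())
    except FileNotFoundError:  # not Linux, or /proc not mounted
        flags = set()
    for desc, present in report(flags).items():
        print(f"{desc}: {'yes' if present else 'no'}")
```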

Using server CPUs in desktops usually only makes sense when you're looking for those features and have specific workloads that are better met by a workstation with server specs. In general, server CPUs don't make "better" desktops.

