Uneven CPU utilization on a Linux dual-socket server

Recently I noticed a strange distribution of tasks on two of our servers. Both are physically identical dual-socket AMD EPYC 7402 platforms running the same tasks; they differ only in NUMA configuration, kernel version and Ubuntu release.

Server 1 configuration and load:

Linux sv-marmoset222 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          96
On-line CPU(s) list:             0-95
Thread(s) per core:              2
Core(s) per socket:              24
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           49
Model name:                      AMD EPYC 7402 24-Core Processor
Stepping:                        0
CPU MHz:                         2794.626
BogoMIPS:                        5589.25
Virtualization:                  AMD-V
L1d cache:                       1.5 MiB
L1i cache:                       1.5 MiB
L2 cache:                        24 MiB
L3 cache:                        256 MiB
NUMA node0 CPU(s):               0-23,48-71
NUMA node1 CPU(s):               24-47,72-95
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 128511 MB
node 0 free: 113713 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 129005 MB
node 1 free: 121583 MB
node distances:
node   0   1
  0:  10  32
  1:  32  10

[Image: server 1 load]

Server 2 configuration and load:

Linux sv-marmoset318 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              96
On-line CPU(s) list: 0-95
Thread(s) per core:  2
Core(s) per socket:  24
Socket(s):           2
NUMA node(s):        8
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               49
Model name:          AMD EPYC 7402 24-Core Processor
Stepping:            0
CPU MHz:             3340.149
BogoMIPS:            5589.69
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            16384K
NUMA node0 CPU(s):   0-5,48-53
NUMA node1 CPU(s):   6-11,54-59
NUMA node2 CPU(s):   12-17,60-65
NUMA node3 CPU(s):   18-23,66-71
NUMA node4 CPU(s):   24-29,72-77
NUMA node5 CPU(s):   30-35,78-83
NUMA node6 CPU(s):   36-41,84-89
NUMA node7 CPU(s):   42-47,90-95
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 48 49 50 51 52 53
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 6 7 8 9 10 11 54 55 56 57 58 59
node 1 size: 64085 MB
node 1 free: 52924 MB
node 2 cpus: 12 13 14 15 16 17 60 61 62 63 64 65
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 18 19 20 21 22 23 66 67 68 69 70 71
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 24 25 26 27 28 29 72 73 74 75 76 77
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus: 30 31 32 33 34 35 78 79 80 81 82 83
node 5 size: 64489 MB
node 5 free: 43644 MB
node 6 cpus: 36 37 38 39 40 41 84 85 86 87 88 89
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 42 43 44 45 46 47 90 91 92 93 94 95
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node   0   1   2   3   4   5   6   7
  0:  10  12  12  12  32  32  32  32
  1:  12  10  12  12  32  32  32  32
  2:  12  12  10  12  32  32  32  32
  3:  12  12  12  10  32  32  32  32
  4:  32  32  32  32  10  12  12  12
  5:  32  32  32  32  12  10  12  12
  6:  32  32  32  32  12  12  10  12
  7:  32  32  32  32  12  12  12  10

[Image: server 2 load]

I believe this is why they have different response times as backends: about 8 ms on server 1 and 4-5 ms on server 2.

Is this caused by a NUMA misconfiguration? How can I get utilization on server 1 as even as on server 2?

EDIT: Since these tasks are uWSGI processes, I can set CPU binding for them in the uWSGI config and get the result I want. But the behavior I described still seems strange to me.
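As an alternative to the uWSGI setting, similar CPU binding can be sketched in Python with os.sched_setaffinity. This is a minimal sketch under assumptions, not the uWSGI option itself: it assumes Linux, reads each node's CPU list from /sys/devices/system/node/node<N>/cpulist, and the helper names (parse_cpulist, node_cpus, node_for_worker, pin_to_node) and the round-robin policy are hypothetical.

```python
import os


def parse_cpulist(spec):
    """Expand a kernel CPU list such as "0-5,48-53" into a set of CPU ids."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue  # an empty list (e.g. a CPU-less node) yields no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node):
    """Read the CPUs of one NUMA node from sysfs (Linux only)."""
    with open(f"/sys/devices/system/node/node{node}/cpulist") as f:
        return parse_cpulist(f.read())


def node_for_worker(worker_index, nodes):
    """Hypothetical policy: spread workers round-robin over the given nodes."""
    return nodes[worker_index % len(nodes)]


def pin_to_node(pid, node):
    """Restrict pid (0 = calling process) to the CPUs of one NUMA node."""
    cpus = node_cpus(node)
    os.sched_setaffinity(pid, cpus)
    return cpus
```

For example, on server 1 `pin_to_node(0, node_for_worker(i, [0, 1]))` would bind worker `i` to one socket, alternating between nodes 0 and 1. This only restricts where the process runs; it does not control which node its memory is allocated from.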

Answer:

After some research I think I found the solution, in case it helps anyone else: the so-called Receive Packet Steering (RPS) helped. Related documentation and further information can be found here: kernel doc on network scaling

helpful article

Red Hat docs on the subject
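RPS is configured per receive queue by writing a hexadecimal CPU bitmap to /sys/class/net/<dev>/queues/rx-<n>/rps_cpus, as described in the kernel's network scaling documentation (a zero mask, the default, leaves RPS disabled). The sketch below builds such a mask from a CPU list and writes it to every receive queue of an interface. It is an illustration under assumptions, not the configuration the answer actually used: the interface name, the choice of CPUs and the helper names are hypothetical, and writing the files requires root.

```python
import glob


def parse_cpulist(spec):
    """Expand a kernel CPU list such as "0-23,48-71" into a sorted list of CPU ids."""
    cpus = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def cpus_to_mask(cpus):
    """Format CPU ids as a hex bitmap split into comma-separated 32-bit groups,
    most significant group first, as sysfs cpumask files expect."""
    bits = 0
    for cpu in cpus:
        bits |= 1 << cpu
    groups = []
    while True:
        groups.append(bits & 0xFFFFFFFF)
        bits >>= 32
        if bits == 0:
            break
    groups.reverse()
    # Only the leading group is unpadded; inner groups keep all 8 hex digits.
    return ",".join([f"{groups[0]:x}"] + [f"{g:08x}" for g in groups[1:]])


def set_rps_cpus(iface, cpus, dry_run=True):
    """Write the RPS mask to every rx queue of iface (hypothetical helper; needs root)."""
    mask = cpus_to_mask(cpus)
    paths = sorted(glob.glob(f"/sys/class/net/{iface}/queues/rx-*/rps_cpus"))
    for path in paths:
        if dry_run:
            print(f"would write {mask} to {path}")
        else:
            with open(path, "w") as f:
                f.write(mask)
    return mask, paths
```

For instance, with server 1's node 0 CPUs, `cpus_to_mask(parse_cpulist("0-23,48-71"))` gives `ff,ffff0000,00ffffff`; `set_rps_cpus("eth0", ..., dry_run=False)` would then apply it, where `eth0` stands in for the real interface name.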