Recently I noticed a strange distribution of tasks on two of our servers.
Both are dual-CPU EPYC 7402 machines on physically identical platforms, running the same tasks; they differ in NUMA configuration, kernel version, and Ubuntu release.
Server 1 configuration and load:
Linux sv-marmoset222 5.15.0-46-generic #49~20.04.1-Ubuntu SMP Thu Aug 4 19:15:44 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 2
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7402 24-Core Processor
Stepping: 0
CPU MHz: 2794.626
BogoMIPS: 5589.25
Virtualization: AMD-V
L1d cache: 1.5 MiB
L1i cache: 1.5 MiB
L2 cache: 24 MiB
L3 cache: 256 MiB
NUMA node0 CPU(s): 0-23,48-71
NUMA node1 CPU(s): 24-47,72-95
available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
node 0 size: 128511 MB
node 0 free: 113713 MB
node 1 cpus: 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95
node 1 size: 129005 MB
node 1 free: 121583 MB
node distances:
node 0 1
0: 10 32
1: 32 10
Server 1 load:
Server 2 configuration and load:
Linux sv-marmoset318 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 96
On-line CPU(s) list: 0-95
Thread(s) per core: 2
Core(s) per socket: 24
Socket(s): 2
NUMA node(s): 8
Vendor ID: AuthenticAMD
CPU family: 23
Model: 49
Model name: AMD EPYC 7402 24-Core Processor
Stepping: 0
CPU MHz: 3340.149
BogoMIPS: 5589.69
Virtualization: AMD-V
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 16384K
NUMA node0 CPU(s): 0-5,48-53
NUMA node1 CPU(s): 6-11,54-59
NUMA node2 CPU(s): 12-17,60-65
NUMA node3 CPU(s): 18-23,66-71
NUMA node4 CPU(s): 24-29,72-77
NUMA node5 CPU(s): 30-35,78-83
NUMA node6 CPU(s): 36-41,84-89
NUMA node7 CPU(s): 42-47,90-95
available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3 4 5 48 49 50 51 52 53
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 6 7 8 9 10 11 54 55 56 57 58 59
node 1 size: 64085 MB
node 1 free: 52924 MB
node 2 cpus: 12 13 14 15 16 17 60 61 62 63 64 65
node 2 size: 0 MB
node 2 free: 0 MB
node 3 cpus: 18 19 20 21 22 23 66 67 68 69 70 71
node 3 size: 0 MB
node 3 free: 0 MB
node 4 cpus: 24 25 26 27 28 29 72 73 74 75 76 77
node 4 size: 0 MB
node 4 free: 0 MB
node 5 cpus: 30 31 32 33 34 35 78 79 80 81 82 83
node 5 size: 64489 MB
node 5 free: 43644 MB
node 6 cpus: 36 37 38 39 40 41 84 85 86 87 88 89
node 6 size: 0 MB
node 6 free: 0 MB
node 7 cpus: 42 43 44 45 46 47 90 91 92 93 94 95
node 7 size: 0 MB
node 7 free: 0 MB
node distances:
node 0 1 2 3 4 5 6 7
0: 10 12 12 12 32 32 32 32
1: 12 10 12 12 32 32 32 32
2: 12 12 10 12 32 32 32 32
3: 12 12 12 10 32 32 32 32
4: 32 32 32 32 10 12 12 12
5: 32 32 32 32 12 10 12 12
6: 32 32 32 32 12 12 10 12
7: 32 32 32 32 12 12 12 10
Server 2 load:
Because of that, I believe they also have different response times when working as backends: about 8 ms on server 1 versus 4-5 ms on server 2.
Is this caused by a NUMA misconfiguration?
How can I achieve even utilization on server 1, like on server 2?
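For reference, this is roughly how I'm looking at the distribution (mpstat comes from the sysstat package, and the uwsgi process name is specific to my setup):

# per-CPU utilization over 5 seconds; compare the busy CPUs against the
# "NUMA nodeX CPU(s)" lists above to see which nodes the work lands on
mpstat -P ALL 5 1

# per-node memory usage of one of the uwsgi workers
numastat -p $(pgrep -o uwsgi)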
EDIT: Since those tasks are uwsgi processes, I can set CPU binding for them in the uwsgi config and get the result I want. But the behavior I described still seems strange to me.
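The binding I have in mind is roughly the following (the module name and worker count are placeholders for my setup, not the real config):

[uwsgi]
; hypothetical app entry point
module = myapp:application
; one worker per hardware thread I want to use
processes = 48
; bind each worker to exactly 1 CPU; uwsgi assigns workers to CPUs
; sequentially, so they end up spread evenly instead of being left
; to the scheduler's placement
cpu-affinity = 1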