We have an Intel Xeon Gold 6230 based server running Ubuntu 20.04.5 LTS with an unusual memory configuration. It has 2 sockets, each with 6 memory channels and 8 memory slots, all populated with 32G DIMMs, so on each socket 2 of the 6 channels carry 2 modules and the other 4 carry only one, as shown here https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems#Dual_CPU_systems_with_16_DIMM_slots in the last column: 16 DIMMs (8 per CPU).
This splits the physical address space of each NUMA node into 2 different regions: the lower 3/4 of the addresses is interleaved across all 6 channels, while the upper 1/4 is interleaved across only 2.
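The 3/4 and 1/4 split can be reproduced with a little arithmetic. The sketch below uses a simplified model, which is an assumption and not the documented behaviour of the memory controller: each pass interleaves an equal slice from every channel that still has unmapped capacity. The function name `interleave_regions` is made up for illustration, and the 32G modules are treated as 32 GiB.

```python
GIB = 1 << 30

def interleave_regions(dimms_per_channel, dimm_size):
    """Return (size_in_bytes, channels_used) regions for one socket, lowest first.

    Simplified model (an assumption): each pass interleaves an equal slice
    from every channel that still has unmapped capacity.
    """
    remaining = [n * dimm_size for n in dimms_per_channel]
    regions = []
    while any(remaining):
        active = [r for r in remaining if r > 0]
        step = min(active)  # what every active channel can still contribute
        regions.append((step * len(active), len(active)))
        remaining = [r - step if r > 0 else 0 for r in remaining]
    return regions

# Per socket: 2 channels carry 2 DIMMs, the other 4 carry 1 (32 GiB modules).
regions = interleave_regions([2, 1, 1, 2, 1, 1], 32 * GIB)
total = sum(size for size, _ in regions)
for size, channels in regions:
    print(f"{size // GIB} GiB over {channels} channels ({size / total:.0%} of the socket)")
```

Under this model each socket gets 192 GiB interleaved over 6 channels and a remaining 64 GiB interleaved over 2, which matches the sizes of the upper ranges in the dmidecode output below.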
We noticed this when we tried to use large pages for our calculations and got a 2x slowdown instead of the expected speedup with 12 or more threads: for some reason, large pages tend to be allocated in that deficient upper 1/4 of the physical addresses.
I tried to exclude these regions by setting
GRUB_CMDLINE_LINUX_DEFAULT="memmap=0x1000000000\$0x3040000000 memmap=0x1000000000\$0x7040000000"
in /etc/default/grub, but the server simply failed to boot with these arguments.
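As a sanity check on the sizes and offsets in those arguments, here is a minimal sketch that derives them from the inclusive address ranges reported by dmidecode --type 20 further down. The helper name `memmap_reserve_arg` is hypothetical; the only thing assumed about the kernel is the `memmap=<size>$<start>` form already used above.

```python
def memmap_reserve_arg(start, end_inclusive):
    """Format a memmap=<size>$<start> kernel parameter for an inclusive range."""
    size = end_inclusive - start + 1
    return f"memmap={size:#x}${start:#x}"

# Upper (2-channel) ranges as reported by dmidecode --type 20.
deficient = [
    (0x03040000000, 0x0403FFFFFFF),  # DIMMs on P0
    (0x07040000000, 0x0803FFFFFFF),  # DIMMs on P1
]
args = [memmap_reserve_arg(start, end) for start, end in deficient]
print(" ".join(args))
```

Both ranges come out as 0x1000000000 bytes (64 GiB), so the values in the arguments agree with the DMI table. The string printed here is the raw kernel parameter; the `$` still needs whatever escaping /etc/default/grub requires.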
So the question is: is there a way to prevent the OS from using those deficient ranges of physical addresses, for example by marking them as reserved or by creating a custom NUMA node for them? Other than removing the extra 4 DIMMs, which would be a kinda trivial solution :)
Below is the output of dmidecode --type 17 | grep '^Handle\|Bank Locator' and dmidecode --type 20 | grep 'Handle\|ing Address':
Handle 0x0010, DMI type 17, 84 bytes
Bank Locator: P0_Node0_Channel0_Dimm0
Handle 0x0011, DMI type 17, 84 bytes
Bank Locator: P0_Node0_Channel0_Dimm1
Handle 0x0012, DMI type 17, 84 bytes
Bank Locator: P0_Node0_Channel1_Dimm0
Handle 0x0013, DMI type 17, 84 bytes
Bank Locator: P0_Node0_Channel2_Dimm0
Handle 0x0014, DMI type 17, 84 bytes
Bank Locator: P0_Node1_Channel0_Dimm0
Handle 0x0015, DMI type 17, 84 bytes
Bank Locator: P0_Node1_Channel0_Dimm1
Handle 0x0016, DMI type 17, 84 bytes
Bank Locator: P0_Node1_Channel1_Dimm0
Handle 0x0017, DMI type 17, 84 bytes
Bank Locator: P0_Node1_Channel2_Dimm0
Handle 0x0018, DMI type 17, 84 bytes
Bank Locator: P1_Node0_Channel0_Dimm0
Handle 0x0019, DMI type 17, 84 bytes
Bank Locator: P1_Node0_Channel0_Dimm1
Handle 0x001A, DMI type 17, 84 bytes
Bank Locator: P1_Node0_Channel1_Dimm0
Handle 0x001B, DMI type 17, 84 bytes
Bank Locator: P1_Node0_Channel2_Dimm0
Handle 0x001C, DMI type 17, 84 bytes
Bank Locator: P1_Node1_Channel0_Dimm0
Handle 0x001D, DMI type 17, 84 bytes
Bank Locator: P1_Node1_Channel0_Dimm1
Handle 0x001E, DMI type 17, 84 bytes
Bank Locator: P1_Node1_Channel1_Dimm0
Handle 0x001F, DMI type 17, 84 bytes
Bank Locator: P1_Node1_Channel2_Dimm0
Handle 0x0021, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0010
Memory Array Mapped Address Handle: 0x0020
Handle 0x0022, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0011
Memory Array Mapped Address Handle: 0x0020
Handle 0x0023, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0012
Memory Array Mapped Address Handle: 0x0020
Handle 0x0024, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0013
Memory Array Mapped Address Handle: 0x0020
Handle 0x0025, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0014
Memory Array Mapped Address Handle: 0x0020
Handle 0x0026, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0015
Memory Array Mapped Address Handle: 0x0020
Handle 0x0027, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0016
Memory Array Mapped Address Handle: 0x0020
Handle 0x0028, DMI type 20, 35 bytes
Starting Address: 0x00000000000
Ending Address: 0x0007FFFFFFF
Physical Device Handle: 0x0017
Memory Array Mapped Address Handle: 0x0020
Handle 0x002A, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0010
Memory Array Mapped Address Handle: 0x0029
Handle 0x002B, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0011
Memory Array Mapped Address Handle: 0x0029
Handle 0x002C, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0012
Memory Array Mapped Address Handle: 0x0029
Handle 0x002D, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0013
Memory Array Mapped Address Handle: 0x0029
Handle 0x002E, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0014
Memory Array Mapped Address Handle: 0x0029
Handle 0x002F, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0015
Memory Array Mapped Address Handle: 0x0029
Handle 0x0030, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0016
Memory Array Mapped Address Handle: 0x0029
Handle 0x0031, DMI type 20, 35 bytes
Starting Address: 0x00100000000
Ending Address: 0x0303FFFFFFF
Physical Device Handle: 0x0017
Memory Array Mapped Address Handle: 0x0029
Handle 0x0033, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0010
Memory Array Mapped Address Handle: 0x0032
Handle 0x0034, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0011
Memory Array Mapped Address Handle: 0x0032
Handle 0x0035, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0014
Memory Array Mapped Address Handle: 0x0032
Handle 0x0036, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0015
Memory Array Mapped Address Handle: 0x0032
Handle 0x0038, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x0018
Memory Array Mapped Address Handle: 0x0037
Handle 0x0039, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x0019
Memory Array Mapped Address Handle: 0x0037
Handle 0x003A, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001A
Memory Array Mapped Address Handle: 0x0037
Handle 0x003B, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001B
Memory Array Mapped Address Handle: 0x0037
Handle 0x003C, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x0037
Handle 0x003D, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001D
Memory Array Mapped Address Handle: 0x0037
Handle 0x003E, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001E
Memory Array Mapped Address Handle: 0x0037
Handle 0x003F, DMI type 20, 35 bytes
Starting Address: 0x04040000000
Ending Address: 0x0703FFFFFFF
Physical Device Handle: 0x001F
Memory Array Mapped Address Handle: 0x0037
Handle 0x0041, DMI type 20, 35 bytes
Starting Address: 0x07040000000
Ending Address: 0x0803FFFFFFF
Physical Device Handle: 0x0018
Memory Array Mapped Address Handle: 0x0040
Handle 0x0042, DMI type 20, 35 bytes
Starting Address: 0x07040000000
Ending Address: 0x0803FFFFFFF
Physical Device Handle: 0x0019
Memory Array Mapped Address Handle: 0x0040
Handle 0x0043, DMI type 20, 35 bytes
Starting Address: 0x07040000000
Ending Address: 0x0803FFFFFFF
Physical Device Handle: 0x001C
Memory Array Mapped Address Handle: 0x0040
Handle 0x0044, DMI type 20, 35 bytes
Starting Address: 0x07040000000
Ending Address: 0x0803FFFFFFF
Physical Device Handle: 0x001D
Memory Array Mapped Address Handle: 0x0040
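The listing is easier to read when the type 20 entries are grouped by address range. The sketch below is one possible way to do that, assuming the filtered output has the line layout shown above; `group_ranges` and `SAMPLE` are names invented for this example, and the sample holds only the four entries of the upper range on P0.

```python
from collections import defaultdict

GIB = 1 << 30

# Excerpt of the filtered dmidecode --type 20 output (upper range on P0 only).
SAMPLE = """\
Handle 0x0033, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0010
Memory Array Mapped Address Handle: 0x0032
Handle 0x0034, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0011
Memory Array Mapped Address Handle: 0x0032
Handle 0x0035, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0014
Memory Array Mapped Address Handle: 0x0032
Handle 0x0036, DMI type 20, 35 bytes
Starting Address: 0x03040000000
Ending Address: 0x0403FFFFFFF
Physical Device Handle: 0x0015
Memory Array Mapped Address Handle: 0x0032
"""

def group_ranges(text):
    """Map (start, end_inclusive) to the Physical Device Handles sharing it."""
    groups = defaultdict(list)
    start = end = None
    for line in text.splitlines():
        key, _, value = line.strip().partition(": ")
        if key == "Starting Address":
            start = int(value, 16)
        elif key == "Ending Address":
            end = int(value, 16)
        elif key == "Physical Device Handle":
            groups[(start, end)].append(int(value, 16))
    return dict(groups)

groups = group_ranges(SAMPLE)
for (start, end), handles in sorted(groups.items()):
    size_gib = (end - start + 1) // GIB
    print(f"{start:#x}-{end:#x}: {size_gib} GiB on {len(handles)} DIMMs")
```

Replacing `SAMPLE` with the full output above should give one line per range. Looking up the handles in the type 17 listing then shows that the upper range on P0 maps only to the four DIMMs on Channel0 of Node0 and Node1, and the P1 range follows the same pattern.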