How to fix a partially balanced memory configuration?

We have a server based on the Intel Xeon Gold 6230, running Ubuntu 20.04.5 LTS, with an unusual memory configuration. It has 2 sockets with 6 memory channels each, and each socket has 8 memory slots, all filled with 32 GB DIMMs. As a result, 2 of the 6 channels carry 2 memory modules and the rest carry only one, as shown at https://www.thomas-krenn.com/en/wiki/Optimize_memory_performance_of_Intel_Xeon_Scalable_systems#Dual_CPU_systems_with_16_DIMM_slots in the last column: 16 DIMMs (8 per CPU).

This splits the physical address space of each NUMA node into 2 regions: the lower 3/4 of the addresses are interleaved across 6 channels, while the upper 1/4 is interleaved across only 2.
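As a rough check of the 3/4 and 1/4 split, the sketch below adds up the socket 0 address ranges listed in the dmidecode --type 20 output further down in this post. The range labels and the helper name `size_gib` are made up for illustration; only the addresses come from the post.

```python
GIB = 1 << 30

# Inclusive physical address ranges for socket 0, copied from the
# dmidecode --type 20 output in this post. The labels are illustrative.
SOCKET0_RANGES = {
    "below the 4 GiB hole (6 channels)": (0x00000000000, 0x0007FFFFFFF),
    "above the 4 GiB hole (6 channels)": (0x00100000000, 0x0303FFFFFFF),
    "upper region (2 channels)": (0x03040000000, 0x0403FFFFFFF),
}


def size_gib(start, end):
    """Size in GiB of an inclusive address range."""
    return (end - start + 1) / GIB


sizes = {name: size_gib(*bounds) for name, bounds in SOCKET0_RANGES.items()}
total = sum(sizes.values())

for name, gib in sizes.items():
    print(f"{name}: {gib:.0f} GiB, {gib / total:.1%} of the mapped total")
```

The same arithmetic applies to socket 1 with its own ranges (0x04040000000 to 0x0703FFFFFFF and 0x07040000000 to 0x0803FFFFFFF).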

We became aware of this when we tried to use large pages for our calculations and got a 2x slowdown instead of the expected speedup once the number of threads reached 12 or more, because for some reason large pages tend to be allocated in that deficient upper 1/4 of the physical addresses.

I tried to exclude the regions by

GRUB_CMDLINE_LINUX_DEFAULT="memmap=0x1000000000\$0x3040000000 memmap=0x1000000000\$0x7040000000"

in /etc/default/grub, but the server simply failed to boot with these arguments.
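To rule out an arithmetic mistake in those arguments, here is a minimal sketch that rebuilds the raw `memmap=size$start` kernel arguments from the two upper ranges in the dmidecode output below. The function name `memmap_reserve_arg` is hypothetical. The sketch only checks the numbers: it does not explain the boot failure, and it leaves out escaping, although the kernel parameter documentation notes that some bootloaders, GRUB 2 among them, may need an escape character before `$`.

```python
def memmap_reserve_arg(start, end):
    """Build a raw kernel 'memmap=size$start' argument for an inclusive range."""
    size = end - start + 1
    return f"memmap={size:#x}${start:#x}"


# Inclusive upper (2-channel) ranges of the two sockets, taken from the
# dmidecode --type 20 output in this post.
UPPER_RANGES = [
    (0x03040000000, 0x0403FFFFFFF),
    (0x07040000000, 0x0803FFFFFFF),
]

args = [memmap_reserve_arg(start, end) for start, end in UPPER_RANGES]

# Raw arguments as the kernel should see them, before any shell or GRUB escaping.
print(" ".join(args))
```

The sizes and start addresses come out the same as in the GRUB line above, so the values themselves match the dmidecode ranges.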

So the question is: is there a way to prevent the OS from using that deficient range of physical addresses, for example by marking it reserved or by creating a custom NUMA node for it? Anything other than removing the extra 4 DIMM modules, which would be a rather trivial solution :)

Below is the output of dmidecode --type 17 | grep '^Handle\|Bank Locator' and dmidecode --type 20 | grep 'Handle\|ing Address'

Handle 0x0010, DMI type 17, 84 bytes
        Bank Locator: P0_Node0_Channel0_Dimm0
Handle 0x0011, DMI type 17, 84 bytes
        Bank Locator: P0_Node0_Channel0_Dimm1
Handle 0x0012, DMI type 17, 84 bytes
        Bank Locator: P0_Node0_Channel1_Dimm0
Handle 0x0013, DMI type 17, 84 bytes
        Bank Locator: P0_Node0_Channel2_Dimm0
Handle 0x0014, DMI type 17, 84 bytes
        Bank Locator: P0_Node1_Channel0_Dimm0
Handle 0x0015, DMI type 17, 84 bytes
        Bank Locator: P0_Node1_Channel0_Dimm1
Handle 0x0016, DMI type 17, 84 bytes
        Bank Locator: P0_Node1_Channel1_Dimm0
Handle 0x0017, DMI type 17, 84 bytes
        Bank Locator: P0_Node1_Channel2_Dimm0
Handle 0x0018, DMI type 17, 84 bytes
        Bank Locator: P1_Node0_Channel0_Dimm0
Handle 0x0019, DMI type 17, 84 bytes
        Bank Locator: P1_Node0_Channel0_Dimm1
Handle 0x001A, DMI type 17, 84 bytes
        Bank Locator: P1_Node0_Channel1_Dimm0
Handle 0x001B, DMI type 17, 84 bytes
        Bank Locator: P1_Node0_Channel2_Dimm0
Handle 0x001C, DMI type 17, 84 bytes
        Bank Locator: P1_Node1_Channel0_Dimm0
Handle 0x001D, DMI type 17, 84 bytes
        Bank Locator: P1_Node1_Channel0_Dimm1
Handle 0x001E, DMI type 17, 84 bytes
        Bank Locator: P1_Node1_Channel1_Dimm0
Handle 0x001F, DMI type 17, 84 bytes
        Bank Locator: P1_Node1_Channel2_Dimm0

Handle 0x0021, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0010
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0022, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0011
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0023, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0012
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0024, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0013
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0025, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0014
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0026, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0015
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0027, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0016
    Memory Array Mapped Address Handle: 0x0020
Handle 0x0028, DMI type 20, 35 bytes
    Starting Address: 0x00000000000
    Ending Address: 0x0007FFFFFFF
    Physical Device Handle: 0x0017
    Memory Array Mapped Address Handle: 0x0020
Handle 0x002A, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0010
    Memory Array Mapped Address Handle: 0x0029
Handle 0x002B, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0011
    Memory Array Mapped Address Handle: 0x0029
Handle 0x002C, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0012
    Memory Array Mapped Address Handle: 0x0029
Handle 0x002D, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0013
    Memory Array Mapped Address Handle: 0x0029
Handle 0x002E, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0014
    Memory Array Mapped Address Handle: 0x0029
Handle 0x002F, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0015
    Memory Array Mapped Address Handle: 0x0029
Handle 0x0030, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0016
    Memory Array Mapped Address Handle: 0x0029
Handle 0x0031, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0017
    Memory Array Mapped Address Handle: 0x0029
Handle 0x0033, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0010
    Memory Array Mapped Address Handle: 0x0032
Handle 0x0034, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0011
    Memory Array Mapped Address Handle: 0x0032
Handle 0x0035, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0014
    Memory Array Mapped Address Handle: 0x0032
Handle 0x0036, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0015
    Memory Array Mapped Address Handle: 0x0032
Handle 0x0038, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x0018
    Memory Array Mapped Address Handle: 0x0037
Handle 0x0039, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x0019
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003A, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001A
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003B, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001B
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003C, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001C
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003D, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001D
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003E, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001E
    Memory Array Mapped Address Handle: 0x0037
Handle 0x003F, DMI type 20, 35 bytes
    Starting Address: 0x04040000000
    Ending Address: 0x0703FFFFFFF
    Physical Device Handle: 0x001F
    Memory Array Mapped Address Handle: 0x0037
Handle 0x0041, DMI type 20, 35 bytes
    Starting Address: 0x07040000000
    Ending Address: 0x0803FFFFFFF
    Physical Device Handle: 0x0018
    Memory Array Mapped Address Handle: 0x0040
Handle 0x0042, DMI type 20, 35 bytes
    Starting Address: 0x07040000000
    Ending Address: 0x0803FFFFFFF
    Physical Device Handle: 0x0019
    Memory Array Mapped Address Handle: 0x0040
Handle 0x0043, DMI type 20, 35 bytes
    Starting Address: 0x07040000000
    Ending Address: 0x0803FFFFFFF
    Physical Device Handle: 0x001C
    Memory Array Mapped Address Handle: 0x0040
Handle 0x0044, DMI type 20, 35 bytes
    Starting Address: 0x07040000000
    Ending Address: 0x0803FFFFFFF
    Physical Device Handle: 0x001D
    Memory Array Mapped Address Handle: 0x0040
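The listing above is easier to read when grouped by address range. Below is a small sketch that maps each range to the device handles backing it. `SAMPLE` is only a three-record excerpt of the output above, and the names `RECORD` and `dimms_per_range` are made up for illustration. In practice the full dmidecode output would be fed in instead.

```python
import re
from collections import defaultdict

# Short excerpt in the same format as the dmidecode --type 20 output above.
SAMPLE = """\
Handle 0x0031, DMI type 20, 35 bytes
    Starting Address: 0x00100000000
    Ending Address: 0x0303FFFFFFF
    Physical Device Handle: 0x0017
    Memory Array Mapped Address Handle: 0x0029
Handle 0x0033, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0010
    Memory Array Mapped Address Handle: 0x0032
Handle 0x0034, DMI type 20, 35 bytes
    Starting Address: 0x03040000000
    Ending Address: 0x0403FFFFFFF
    Physical Device Handle: 0x0011
    Memory Array Mapped Address Handle: 0x0032
"""

# One match per type 20 record: start address, end address, backing DIMM handle.
RECORD = re.compile(
    r"Starting Address: (0x[0-9A-Fa-f]+)\s+"
    r"Ending Address: (0x[0-9A-Fa-f]+)\s+"
    r"Physical Device Handle: (0x[0-9A-Fa-f]+)"
)


def dimms_per_range(text):
    """Map each inclusive (start, end) range to the device handles backing it."""
    groups = defaultdict(list)
    for start, end, device in RECORD.findall(text):
        groups[(int(start, 16), int(end, 16))].append(device)
    return dict(groups)


ranges = dimms_per_range(SAMPLE)
for (start, end), devices in sorted(ranges.items()):
    print(f"{start:#x}-{end:#x}: {len(devices)} DIMM(s): {', '.join(devices)}")
```

Run over the full listing, this grouping should show the upper range of each socket backed by only 4 of its 8 DIMMs, namely the ones whose Bank Locator is Channel0.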
Comment from guiverc:
If the details you provided are correct (i.e. 20.04.5), then you're behind on applying security upgrades and fixes: a fully upgraded 20.04 system reports itself as 20.04.6. See https://fridge.ubuntu.com/2023/03/23/ubuntu-20-04-6-lts-released/ for the ISO release date; installed systems were upgraded in the week(s) before that ISO release. I'd apply security fixes ASAP, especially if the system is online.