
Why can't the GPUs communicate in a multi-GPU server?


This is a Dell PowerEdge R750xa server with 4 NVIDIA A40 GPUs, intended for AI applications. Each GPU works fine on its own, but any workload in which at least two GPUs have to exchange data fails, including multi-GPU training jobs and the simpleIPC and conjugateGradientMultiDeviceCG CUDA samples (the former reports mismatching results, the latter simply hangs).

I have seen online discussions (1, 2, 3) claiming that something called the IOMMU must be turned off. I tried setting the iommu=off and intel_iommu=off Linux kernel flags, but they didn't help. I checked the BIOS settings, but there is no option there to disable the IOMMU.
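For reference, here is a minimal diagnostic sketch (plain CUDA runtime API, built with nvcc; not part of the samples above) that queries cudaDeviceCanAccessPeer for every ordered GPU pair. Note that this is a capability query only: on a machine with a broken address mapping it may still report "supported" while actual transfers corrupt data, so it can rule problems in but not out.

// Minimal sketch: ask the CUDA runtime whether it believes each ordered
// pair of GPUs can access one another's memory.
// Build with: nvcc p2p_query.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    printf("found %d CUDA device(s)\n", n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int can = 0;
            // Capability only, not a data-integrity test.
            cudaDeviceCanAccessPeer(&can, i, j);
            printf("GPU %d -> GPU %d: peer access %s\n", i, j,
                   can ? "supported" : "NOT supported");
        }
    }
    return 0;
}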

Answer:

While there is no explicit "IOMMU off" setting in this BIOS flavour, the problem is still with the BIOS configuration.

In the BIOS, go to "Integrated Devices" and change the "Memory Mapped I/O Base" setting from the default "56TB" to "12TB". This will solve the issue. There is no need to add any extra kernel parameters.
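After changing the setting, re-running simpleIPC and conjugateGradientMultiDeviceCG is the most direct check. As a smaller test, the following sketch (assuming at least two GPUs; plain CUDA runtime API, not taken from the samples) copies a known pattern from GPU 0 to GPU 1 with cudaMemcpyPeer and reads a slice back for comparison. On a correctly configured machine it should print "peer copy OK"; on a broken one the copy would mismatch or hang, matching the symptoms above.

// Minimal verification sketch: write a pattern on GPU 0, copy it to
// GPU 1 with cudaMemcpyPeer, and read a slice back to the host.
// Build with: nvcc p2p_copy_check.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    const size_t N = 1 << 20;           // 1 MiB test buffer
    unsigned char *src = NULL, *dst = NULL, host[64];

    cudaSetDevice(0);
    cudaMalloc(&src, N);
    cudaMemset(src, 0xAB, N);           // known pattern on GPU 0

    cudaSetDevice(1);
    cudaMalloc(&dst, N);
    cudaMemset(dst, 0x00, N);           // zeroed target on GPU 1

    cudaMemcpyPeer(dst, 1, src, 0, N);  // GPU 0 -> GPU 1
    cudaMemcpy(host, dst, sizeof(host), cudaMemcpyDeviceToHost);

    int ok = 1;
    for (size_t i = 0; i < sizeof(host); ++i)
        if (host[i] != 0xAB) ok = 0;
    printf("peer copy %s\n", ok ? "OK" : "MISMATCH");
    return 0;
}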
