Score:0

How to identify the slot of a faulty GPU card in a server using Ubuntu commands?


Is it possible to identify which slot holds a broken GPU card from within the Ubuntu operating system? We have a SuperMicro GPU server with about 8 GPU cards used for AI computing. Every now and then users or a department report that a card is no longer visible in the output of the 'nvidia-smi' command, and we have to go to the server room. These are generally hardware failures. We then have 7 cards working properly and have to find the faulty one by trial and error, pulling cards from the server one at a time. This is tedious and time-consuming, so I am wondering whether it is possible to unambiguously identify the slot where the faulty card is located.

Thank you in advance.

Nikita Kipriyanov
Are you able to determine the PCI address of the faulty card?
Score:0

In general, if you can find out which PCI bus address the card has, you can locate the exact slot it occupies. Go through the dmidecode output and find the slot in which that PCI address appears.

However, this only helps if you are confident that the PCI slot numbering in DMI is predictable and corresponds to the actual physical slots on the motherboard. On servers from major brands (HPE, Dell, etc.) this is often the case. If the motherboard comes from a less reputable manufacturer, its DMI data may be out of sync with the hardware. Nevertheless, it is worth trying.
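The lookup described above can be sketched in Python. This is only a rough illustration, not a tested procedure for this server: it assumes that `dmidecode -t slot` prints a "Designation:" line and a "Bus Address:" line for each slot (which depends on the SMBIOS data the board provides), and the helper names `parse_slots`, `normalize` and `slot_map_from_system` are made up for this example. On boards with PCIe switches, the address reported for a slot may belong to an upstream bridge rather than to the GPU itself, so treat the result as a hint.

```python
import subprocess


def parse_slots(dmi_text):
    """Map PCI bus address -> slot designation from `dmidecode -t slot` text."""
    slots = {}
    designation = None
    for line in dmi_text.splitlines():
        line = line.strip()
        if line.startswith("Designation:"):
            # Remember the slot name until its bus address shows up.
            designation = line.split(":", 1)[1].strip()
        elif line.startswith("Bus Address:") and designation is not None:
            address = line.split(":", 1)[1].strip().lower()
            slots[address] = designation
            designation = None
    return slots


def normalize(bus_id):
    """Reduce an ID with an 8-digit PCI domain (as nvidia-smi prints it)
    to the 4-digit, lower-case form used by dmidecode and lspci."""
    domain, rest = bus_id.strip().lower().split(":", 1)
    return domain[-4:] + ":" + rest


def slot_map_from_system():
    """Run dmidecode (needs root) and return the address -> slot mapping."""
    result = subprocess.run(
        ["dmidecode", "-t", "slot"],
        capture_output=True, text=True, check=True,
    )
    return parse_slots(result.stdout)
```

One possible way to use it: while all cards are still working, record the bus IDs reported by `nvidia-smi --query-gpu=index,pci.bus_id --format=csv,noheader`, pass each through `normalize`, and look it up in the dictionary returned by `slot_map_from_system()`. When a card later disappears, the address missing from the recorded list points to the slot to check, provided the DMI data matches the physical layout.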
