InfiniBand adapter down

Edit: On CentOS 8.5, I have tried both Mellanox driver 4.9-4.1.7.0 (legacy) and 5.5-1.0.3.2.

I cannot get my InfiniBand adapter working. The output of ibstat shows that the port is down:

    CA 'mlx5_0'
        CA type: MT4123
        Number of ports: 1
        Firmware version: 20.31.1014
        Hardware version: 0
        Node GUID: 0xb8cef60300a7fbbc
        System image GUID: 0xb8cef60300a7fbbc
        Port 1:
            State: Down
            Physical state: Disabled
            Rate: 10
            Base lid: 65535
            LMC: 0
            SM lid: 0
            Capability mask: 0x2651e848
            Port GUID: 0xb8cef60300a7fbbc
            Link layer: InfiniBand
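
When checking several nodes, the port fields can be pulled out of ibstat text rather than read by eye. The following is a minimal sketch, assuming the ibstat text has already been captured as a string (here an excerpt of the output above is pasted in); `parse_ports` and `IBSTAT_OUTPUT` are hypothetical names, and treating any `State` other than `Active` as a problem is the only rule applied.

```python
import re

# Excerpt of the ibstat output above (pasted, not captured live).
IBSTAT_OUTPUT = """\
CA 'mlx5_0'
    CA type: MT4123
    Number of ports: 1
    Firmware version: 20.31.1014
    Port 1:
        State: Down
        Physical state: Disabled
        Rate: 10
        Base lid: 65535
        Link layer: InfiniBand
"""


def parse_ports(text):
    """Collect the 'key: value' fields under each 'Port N:' header, keyed by (CA, port)."""
    ports = {}
    ca = None
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        ca_header = re.fullmatch(r"CA '([^']+)'", stripped)
        port_header = re.fullmatch(r"Port (\d+):", stripped)
        if ca_header:
            # New adapter: stop attaching fields to the previous port.
            ca, current = ca_header.group(1), None
        elif port_header:
            current = ports.setdefault((ca, int(port_header.group(1))), {})
        elif current is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            current[key.strip()] = value.strip()
    return ports


ports = parse_ports(IBSTAT_OUTPUT)
for (ca, port), fields in ports.items():
    status = "OK" if fields.get("State") == "Active" else "NOT ACTIVE"
    print(f"{ca} port {port}: State={fields.get('State')}, "
          f"Physical state={fields.get('Physical state')} -> {status}")
# prints: mlx5_0 port 1: State=Down, Physical state=Disabled -> NOT ACTIVE
```

CA-level lines such as `CA type` are skipped because they appear before any `Port N:` header.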

The output of mlxlink -d mlx5_0 is:

    Operational Info
    ----------------
    State                           : Disable
    Physical state                  : ETH_AN_FSM_ENABLE
    Speed                           : N/A
    Width                           : N/A
    FEC                             : N/A
    Loopback Mode                   : N/A
    Auto Negotiation                : ON

    Supported Info
    --------------
    Enabled Link Speed              : 0x00000075 (HDR,EDR,FDR,QDR,SDR)
    Supported Cable Speed           : 0x00000007 (QDR,DDR,SDR)

    Troubleshooting Info
    --------------------
    Status Opcode                   : 1036
    Group Opcode                    : MNG FW
    Recommendation                  : Connected wrong module type. Change to a different module type.

So there is some troubleshooting info, but I don't understand it. I am fairly sure the cable is connected. Could there be an incompatibility between the ConnectX-3 adapter (on the node where the opensm service runs) and the ConnectX-6 adapter?
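The Enabled Link Speed and Supported Cable Speed fields in the mlxlink output are bitmasks, so ANDing them shows which speeds both the adapter and the cable list. A minimal sketch follows; the bit-to-speed mapping is an assumption inferred only from the labels mlxlink printed for 0x75 and 0x07 (bits it did not label are left unnamed), and `decode` and `IB_SPEED_BITS` are hypothetical names.

```python
# Assumed bit -> speed mapping, inferred from the labels mlxlink printed
# for 0x75 and 0x07 above; bits not covered there are left unnamed.
IB_SPEED_BITS = {0: "SDR", 1: "DDR", 2: "QDR", 4: "FDR", 5: "EDR", 6: "HDR"}


def decode(mask: int) -> list[str]:
    """Return the speed names whose bits are set in mask, lowest bit first."""
    return [
        IB_SPEED_BITS.get(bit, f"bit{bit}")
        for bit in range(mask.bit_length())
        if (mask >> bit) & 1
    ]


enabled_link_speed = 0x00000075     # "Enabled Link Speed" from mlxlink
supported_cable_speed = 0x00000007  # "Supported Cable Speed" from mlxlink
common = enabled_link_speed & supported_cable_speed

print("adapter enabled:", ", ".join(decode(enabled_link_speed)))
print("cable supports: ", ", ".join(decode(supported_cable_speed)))
print("both:           ", ", ".join(decode(common)))
```

With these two values the overlap is 0x05, i.e. SDR and QDR only. Whether this mismatch is what triggers the "wrong module type" recommendation is not something the output alone establishes.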

Edit:

The adapters are connected through a Mellanox SX6012 switch.

The output of ibcheckstate -v is below. Port 1 is the node running opensm; the port of the new node with the ConnectX-6 adapter is missing.

    # Checking Switch: nodeguid 0x248a070300ccc140
    Node check lid 2:  OK
    Port check lid 2 port 1:  OK
    Port check lid 2 port 2:  OK
    Port check lid 2 port 3:  OK
    Port check lid 2 port 4:  OK
    Port check lid 2 port 5:  OK

    # Checking Ca: nodeguid 0x0cc47affff5fb364
    Node check lid 4:  OK
    Port check lid 4 port 1:  OK

    # Checking Ca: nodeguid 0x0cc47affff5fb8e4
    Node check lid 6:  OK
    Port check lid 6 port 1:  OK

    # Checking Ca: nodeguid 0x0cc47affff5fb4c4
    Node check lid 5:  OK
    Port check lid 5 port 1:  OK

    # Checking Ca: nodeguid 0x0cc47affff5fb89c
    Node check lid 3:  OK
    Port check lid 3 port 1:  OK

    # Checking Ca: nodeguid 0x248a070300f97f50
    Node check lid 1:  OK
    Port check lid 1 port 1:  OK

    *** WARNING ***: this command is deprecated

    ## Summary: 6 nodes checked, 0 bad nodes found
    ##          10 ports checked, 0 ports with bad state found
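
The claim that the new adapter is missing can be checked mechanically by comparing the node GUID that ibstat reports for mlx5_0 against the node GUIDs listed by ibcheckstate. This is a minimal sketch, assuming both outputs have been captured as text (here the relevant lines are pasted from the question); `checked_nodes`, `IBCHECKSTATE_OUTPUT` and `NEW_NODE_GUID` are hypothetical names.

```python
import re

# The "# Checking" lines of the ibcheckstate -v output above (pasted).
IBCHECKSTATE_OUTPUT = """\
# Checking Switch: nodeguid 0x248a070300ccc140
# Checking Ca: nodeguid 0x0cc47affff5fb364
# Checking Ca: nodeguid 0x0cc47affff5fb8e4
# Checking Ca: nodeguid 0x0cc47affff5fb4c4
# Checking Ca: nodeguid 0x0cc47affff5fb89c
# Checking Ca: nodeguid 0x248a070300f97f50
"""

# Node GUID that ibstat reports for the new ConnectX-6 adapter (mlx5_0).
NEW_NODE_GUID = "0xb8cef60300a7fbbc"


def checked_nodes(text):
    """Map each node GUID (as an int) seen by ibcheckstate to its node type."""
    pattern = re.compile(r"# Checking (\w+): nodeguid (0x[0-9a-fA-F]+)")
    # Comparing integers avoids false mismatches from letter case.
    return {int(guid, 16): kind for kind, guid in pattern.findall(text)}


nodes = checked_nodes(IBCHECKSTATE_OUTPUT)
print(f"{len(nodes)} nodes in the scan")
print("new adapter seen:", int(NEW_NODE_GUID, 16) in nodes)
```

Against the pasted output this reports 6 nodes and `False` for the new adapter, matching the summary line and the observation that the ConnectX-6 port never appears in the fabric.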

The cable has at least worked with a ConnectX-4 adapter.

Did this ever work? If so, what's changed? If it's connected to an IB switch, what's the status on that port? Also, what are you doing about that 'connected wrong module type' message?
Holger
It has not worked so far; the ConnectX-6 adapter belongs to a new node I want to install. I have added the output of ibcheckstate -v to the question: the new adapter is missing completely. The 'connected wrong module type' message is why I am asking about incompatibilities.
Holger
As I have also added to the question, the cable has worked with a ConnectX-4 adapter.