Score:0

Dell PERC H750 megacli hangs in Ubuntu 20.04 LTS

ke

The megacli command hangs whenever I attempt to use it, and I get the output shown below in dmesg.

This is a freshly installed, stock Ubuntu 20.04 LTS. I am using the userland binaries from the focal PPA at https://hwraid.le-vert.net/, which I've used on Ubuntu for over a decade.

The card is reported as 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx

I have two of these servers and they are brand new, so this is almost certainly something hardware/driver related.

Does anyone know what is going on here, or is there a more appropriate place to report this upstream?

Linux icarus 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[  485.816909] Code: Bad RIP value.
[  485.816909] RSP: 002b:00007fff81377078 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  485.816910] RAX: ffffffffffffffda RBX: 00000000019394c0 RCX: 00007f10c4ee150b
[  485.816910] RDX: 0000000001939d90 RSI: 00000000c1944d01 RDI: 0000000000000003
[  485.816911] RBP: 00007fff813770b0 R08: 0000000001939d90 R09: 00007f10c4fb6230
[  485.816911] R10: 0000000000401392 R11: 0000000000000206 R12: 00000000004028a0
[  485.816911] R13: 00007fff81377f30 R14: 0000000000000000 R15: 0000000000000000
[  606.654843] INFO: task megacli.real:2164 blocked for more than 483 seconds.
[  606.654861]       Not tainted 5.4.0-89-generic #100-Ubuntu
[  606.654873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  606.654889] megacli.real    D    0  2164   2162 0x00000000
[  606.654891] Call Trace:
[  606.654894]  __schedule+0x2e3/0x740
[  606.654895]  schedule+0x42/0xb0
[  606.654898]  megasas_issue_blocked_cmd+0x176/0x1b0 [megaraid_sas]
[  606.654900]  ? wait_woken+0x80/0x80
[  606.654902]  megasas_mgmt_fw_ioctl+0x4b0/0x740 [megaraid_sas]
[  606.654905]  megasas_mgmt_ioctl_fw.isra.0+0x137/0x190 [megaraid_sas]
[  606.654907]  megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]
[  606.654909]  do_vfs_ioctl+0x407/0x670
[  606.654911]  ? do_user_addr_fault+0x216/0x450
[  606.654912]  ksys_ioctl+0x67/0x90
[  606.654914]  __x64_sys_ioctl+0x1a/0x20
[  606.654915]  do_syscall_64+0x57/0x190
[  606.654916]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  606.654917] RIP: 0033:0x7fbf8cc5150b
[  606.654919] Code: Bad RIP value.
[  606.654920] RSP: 002b:00007ffd523d9bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[  606.654921] RAX: ffffffffffffffda RBX: 00000000012da590 RCX: 00007fbf8cc5150b
[  606.654921] RDX: 00000000012d4dd0 RSI: 00000000c1944d01 RDI: 0000000000000003
[  606.654921] RBP: 00007ffd523d9c10 R08: 00000000012d4dd0 R09: 000000000000007c
[  606.654922] R10: 00000000012bd010 R11: 0000000000000246 R12: 00000000004028a0
[  606.654922] R13: 00007ffd523da4a0 R14: 0000000000000000 R15: 0000000000000000
[  606.654929] INFO: task megacli.real:3594 blocked for more than 483 seconds.
[  606.654945]       Not tainted 5.4.0-89-generic #100-Ubuntu
[  606.654957] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  606.654973] megacli.real    D    0  3594   3593 0x00000004
[  606.654974] Call Trace:
[  606.654975]  __schedule+0x2e3/0x740
[  606.654976]  schedule+0x42/0xb0
[  606.654978]  megasas_issue_blocked_cmd+0x176/0x1b0 [megaraid_sas]
[  606.654979]  ? wait_woken+0x80/0x80
[  606.654981]  megasas_mgmt_fw_ioctl+0x4b0/0x740 [megaraid_sas]
[  606.654983]  megasas_mgmt_ioctl_fw.isra.0+0x137/0x190 [megaraid_sas]
[  606.654985]  megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]
[  606.654986]  do_vfs_ioctl+0x407/0x670
[  606.654987]  ? do_user_addr_fault+0x216/0x450
[  606.654988]  ksys_ioctl+0x67/0x90
[  606.654990]  __x64_sys_ioctl+0x1a/0x20
[  606.654991]  do_syscall_64+0x57/0x190
[  606.654991]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  606.654992] RIP: 0033:0x7fbbf145150b
[  606.654993] Code: Bad RIP value.
[  606.654994] RSP: 002b:00007ffd6b0ddab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[  606.654994] RAX: ffffffffffffffda RBX: 00000000011ab4c0 RCX: 00007fbbf145150b
[  606.654995] RDX: 00000000011abd90 RSI: 00000000c1944d01 RDI: 0000000000000003
[  606.654995] RBP: 00007ffd6b0ddaf0 R08: 00000000011abd90 R09: 00007fbbf1526230
[  606.654995] R10: 0000000000401392 R11: 0000000000000206 R12: 00000000004028a0
[  606.654996] R13: 00007ffd6b0de970 R14: 0000000000000000 R15: 0000000000000000

The controller then resets.

[  721.731796] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[  721.731855] megaraid_sas 0000:18:00.0: FW in FAULT state Fault code:0x10000 subcode:0x0 func:megasas_wait_for_outstanding_fusion
[  721.731954] megaraid_sas 0000:18:00.0: resetting fusion adapter scsi0.
[  721.732587] megaraid_sas 0000:18:00.0: Outstanding fastpath IOs: 0
[  732.492151] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[  750.468711] megaraid_sas 0000:18:00.0: FW now in Ready state
[  750.468713] megaraid_sas 0000:18:00.0: FW now in Ready state
[  750.469351] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101       LDIO threshold: 0
[  750.469353] megaraid_sas 0000:18:00.0: Performance mode :Latency
[  750.469353] megaraid_sas 0000:18:00.0: FW supports sync cache        : Yes
[  750.469356] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[  751.084723] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[  751.224730] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1       max_lds: 240
[  751.224734] megaraid_sas 0000:18:00.0: controller type       : MR(8192MB)
[  751.224737] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR)  : Enabled
[  751.224739] megaraid_sas 0000:18:00.0: Secure JBOD support   : No
[  751.224741] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[  751.224743] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout        : 6 secs/60 secs
[  751.224745] megaraid_sas 0000:18:00.0: JBOD sequence map support     : Yes
[  751.224747] megaraid_sas 0000:18:00.0: PCI Lane Margining support    : Yes
[  779.253522] megaraid_sas 0000:18:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
[  789.945799] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[  806.802227] megaraid_sas 0000:18:00.0: FW now in Ready state
[  806.802231] megaraid_sas 0000:18:00.0: FW now in Ready state
[  806.802979] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101       LDIO threshold: 0
[  806.802980] megaraid_sas 0000:18:00.0: Performance mode :Latency
[  806.802981] megaraid_sas 0000:18:00.0: FW supports sync cache        : Yes
[  806.802983] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[  807.502249] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[  807.586262] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1       max_lds: 240
[  807.586266] megaraid_sas 0000:18:00.0: controller type       : MR(8192MB)
[  807.586268] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR)  : Enabled
[  807.586270] megaraid_sas 0000:18:00.0: Secure JBOD support   : No
[  807.586272] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[  807.586274] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout        : 6 secs/60 secs
[  807.586276] megaraid_sas 0000:18:00.0: JBOD sequence map support     : Yes
[  807.586278] megaraid_sas 0000:18:00.0: PCI Lane Margining support    : Yes
[  835.614916] megaraid_sas 0000:18:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
[  846.307158] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[  864.255554] megaraid_sas 0000:18:00.0: FW now in Ready state
[  864.255558] megaraid_sas 0000:18:00.0: FW now in Ready state
[  864.256364] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101       LDIO threshold: 0
[  864.256365] megaraid_sas 0000:18:00.0: Performance mode :Latency
[  864.256366] megaraid_sas 0000:18:00.0: FW supports sync cache        : Yes
[  864.256368] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[  864.899567] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[  865.039569] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1       max_lds: 240
[  865.039573] megaraid_sas 0000:18:00.0: controller type       : MR(8192MB)
[  865.039575] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR)  : Enabled
[  865.039577] megaraid_sas 0000:18:00.0: Secure JBOD support   : No
[  865.039579] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[  865.039582] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout        : 6 secs/60 secs
[  865.039583] megaraid_sas 0000:18:00.0: JBOD sequence map support     : Yes
[  865.039585] megaraid_sas 0000:18:00.0: PCI Lane Margining support    : Yes
[  865.039589] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x10b0100
[  865.039682] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x1010000
[  865.039721] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x1010000
[  865.039776] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x1010000 cmd->cmd_status_drv 0x3
[  865.039781] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x10b0100 cmd->cmd_status_drv 0x3
[  865.039786] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x1010000 cmd->cmd_status_drv 0x3
[  865.039830] megaraid_sas 0000:18:00.0: waiting for controller reset to finish
[  865.095608] megaraid_sas 0000:18:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[  865.096526] megaraid_sas 0000:18:00.0: Adapter is OPERATIONAL for scsi:0
[  865.097349] megaraid_sas 0000:18:00.0: Snap dump wait time   : 15
[  865.097351] megaraid_sas 0000:18:00.0: Reset successful for scsi0.
[  865.097959] megaraid_sas 0000:18:00.0: 999 (689378568s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[  865.098227] megaraid_sas 0000:18:00.0: 1002 (689378578s/0x0020/CRIT) - Controller encountered an error and was reset
[  865.110859] megaraid_sas 0000:18:00.0: scanning for scsi0...
[  865.111057] megaraid_sas 0000:18:00.0: 1042 (689378618s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[  865.111308] megaraid_sas 0000:18:00.0: 1045 (689378628s/0x0020/CRIT) - Controller encountered an error and was reset
[  865.115092] megaraid_sas 0000:18:00.0: scanning for scsi0...
[  865.115368] megaraid_sas 0000:18:00.0: 1085 (689378667s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[  865.115405] megaraid_sas 0000:18:00.0: 1088 (689378677s/0x0020/CRIT) - Controller encountered an error and was reset
[  865.116344] megaraid_sas 0000:18:00.0: scanning for scsi0...
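
For anyone comparing their own logs against this one, a small filter can pull out just the lines that mark the hung tasks, the firmware fault, and the reset outcome. This is only a sketch: the function name filter_megaraid_events is made up for illustration, and the patterns are copied from the messages in this particular log, so other kernel or firmware versions may word them differently.

```shell
# Hypothetical helper: keep only hung-task, firmware-fault and reset-result lines.
# The patterns are taken from the dmesg output quoted above.
filter_megaraid_events() {
    grep -E 'blocked for more than [0-9]+ seconds|FW in FAULT state|Fatal firmware error|Reset successful'
}

# Typical use (reads the kernel log from stdin):
#   dmesg | filter_megaraid_events
```

Reading from stdin keeps the helper usable with a saved log file as well as with live dmesg output.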

Score:1
ke

After some research, I discovered that the hwraid.le-vert.net packages are out of date and were causing the hangs and controller resets shown in the question. After removing them and installing MegaCLI from the zip provided by LSI directly, using alien to convert the package, the commands work fine.

https://gist.github.com/fxkraus/595ab82e07cd6f8e057d31bc0bc5e779
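
A rough sketch of that procedure as a shell function, in case it helps. Several things here are assumptions rather than facts from this post: the old package is assumed to be named megacli, the vendor zip is assumed to contain a single Linux RPM, and install_megacli_from_zip is a made-up name. Check the linked gist and the contents of your own download before running anything like this.

```shell
# Hypothetical helper; pass the path to the zip you downloaded from the vendor.
install_megacli_from_zip() {
    zip_path=$1
    workdir=$(mktemp -d) || return 1

    # Unpack first, so nothing is removed if the archive is missing or unusable.
    unzip -q "$zip_path" -d "$workdir" || { rm -rf "$workdir"; return 1; }

    # Assumption: the archive ships one RPM for Linux somewhere inside it.
    rpm_path=$(find "$workdir" -name '*.rpm' | head -n 1)
    if [ -z "$rpm_path" ]; then
        echo "no RPM found in $zip_path" >&2
        rm -rf "$workdir"
        return 1
    fi

    # Assumption: the outdated package is called "megacli".
    sudo apt-get remove -y megacli || return 1

    # alien writes the converted .deb into the current directory,
    # so convert and install from inside the scratch directory.
    ( cd "$workdir" && sudo alien --to-deb --scripts "$rpm_path" && sudo dpkg -i ./*.deb )
}

# Example call (the file name is a placeholder):
#   install_megacli_from_zip ~/Downloads/megacli.zip
```

The archive is unpacked and checked before the old package is removed, so a bad download does not leave the machine without any megacli binary at all.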
