The megacli command hangs whenever I attempt to use it, and I get the hung-task traces shown below in dmesg.
This is a freshly installed, stock Ubuntu 20.04 LTS system. I am using the userland binaries for focal from the PPA at https://hwraid.le-vert.net/, which I've used for over a decade on Ubuntu.
The card is reported as 18:00.0 RAID bus controller: Broadcom / LSI MegaRAID 12GSAS/PCIe Secure SAS39xx
I have two of these servers, both brand new, so this is almost certainly hardware- or driver-related.
Does anyone know what is going on here, or is there a more appropriate place to report this upstream?
Linux icarus 5.4.0-89-generic #100-Ubuntu SMP Fri Sep 24 14:50:10 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
[ 485.816909] Code: Bad RIP value.
[ 485.816909] RSP: 002b:00007fff81377078 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 485.816910] RAX: ffffffffffffffda RBX: 00000000019394c0 RCX: 00007f10c4ee150b
[ 485.816910] RDX: 0000000001939d90 RSI: 00000000c1944d01 RDI: 0000000000000003
[ 485.816911] RBP: 00007fff813770b0 R08: 0000000001939d90 R09: 00007f10c4fb6230
[ 485.816911] R10: 0000000000401392 R11: 0000000000000206 R12: 00000000004028a0
[ 485.816911] R13: 00007fff81377f30 R14: 0000000000000000 R15: 0000000000000000
[ 606.654843] INFO: task megacli.real:2164 blocked for more than 483 seconds.
[ 606.654861] Not tainted 5.4.0-89-generic #100-Ubuntu
[ 606.654873] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.654889] megacli.real D 0 2164 2162 0x00000000
[ 606.654891] Call Trace:
[ 606.654894] __schedule+0x2e3/0x740
[ 606.654895] schedule+0x42/0xb0
[ 606.654898] megasas_issue_blocked_cmd+0x176/0x1b0 [megaraid_sas]
[ 606.654900] ? wait_woken+0x80/0x80
[ 606.654902] megasas_mgmt_fw_ioctl+0x4b0/0x740 [megaraid_sas]
[ 606.654905] megasas_mgmt_ioctl_fw.isra.0+0x137/0x190 [megaraid_sas]
[ 606.654907] megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]
[ 606.654909] do_vfs_ioctl+0x407/0x670
[ 606.654911] ? do_user_addr_fault+0x216/0x450
[ 606.654912] ksys_ioctl+0x67/0x90
[ 606.654914] __x64_sys_ioctl+0x1a/0x20
[ 606.654915] do_syscall_64+0x57/0x190
[ 606.654916] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 606.654917] RIP: 0033:0x7fbf8cc5150b
[ 606.654919] Code: Bad RIP value.
[ 606.654920] RSP: 002b:00007ffd523d9bd8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 606.654921] RAX: ffffffffffffffda RBX: 00000000012da590 RCX: 00007fbf8cc5150b
[ 606.654921] RDX: 00000000012d4dd0 RSI: 00000000c1944d01 RDI: 0000000000000003
[ 606.654921] RBP: 00007ffd523d9c10 R08: 00000000012d4dd0 R09: 000000000000007c
[ 606.654922] R10: 00000000012bd010 R11: 0000000000000246 R12: 00000000004028a0
[ 606.654922] R13: 00007ffd523da4a0 R14: 0000000000000000 R15: 0000000000000000
[ 606.654929] INFO: task megacli.real:3594 blocked for more than 483 seconds.
[ 606.654945] Not tainted 5.4.0-89-generic #100-Ubuntu
[ 606.654957] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 606.654973] megacli.real D 0 3594 3593 0x00000004
[ 606.654974] Call Trace:
[ 606.654975] __schedule+0x2e3/0x740
[ 606.654976] schedule+0x42/0xb0
[ 606.654978] megasas_issue_blocked_cmd+0x176/0x1b0 [megaraid_sas]
[ 606.654979] ? wait_woken+0x80/0x80
[ 606.654981] megasas_mgmt_fw_ioctl+0x4b0/0x740 [megaraid_sas]
[ 606.654983] megasas_mgmt_ioctl_fw.isra.0+0x137/0x190 [megaraid_sas]
[ 606.654985] megasas_mgmt_ioctl+0x28/0x40 [megaraid_sas]
[ 606.654986] do_vfs_ioctl+0x407/0x670
[ 606.654987] ? do_user_addr_fault+0x216/0x450
[ 606.654988] ksys_ioctl+0x67/0x90
[ 606.654990] __x64_sys_ioctl+0x1a/0x20
[ 606.654991] do_syscall_64+0x57/0x190
[ 606.654991] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 606.654992] RIP: 0033:0x7fbbf145150b
[ 606.654993] Code: Bad RIP value.
[ 606.654994] RSP: 002b:00007ffd6b0ddab8 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[ 606.654994] RAX: ffffffffffffffda RBX: 00000000011ab4c0 RCX: 00007fbbf145150b
[ 606.654995] RDX: 00000000011abd90 RSI: 00000000c1944d01 RDI: 0000000000000003
[ 606.654995] RBP: 00007ffd6b0ddaf0 R08: 00000000011abd90 R09: 00007fbbf1526230
[ 606.654995] R10: 0000000000401392 R11: 0000000000000206 R12: 00000000004028a0
[ 606.654996] R13: 00007ffd6b0de970 R14: 0000000000000000 R15: 0000000000000000
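A note on reading the register dumps above: ORIG_RAX is 0x10 (syscall 16, which is ioctl on x86_64) and RSI holds the same request number, 0xc1944d01, in every trace. Below is a minimal sketch that splits that number into its fields, assuming the standard asm-generic ioctl bit layout; `decode_ioctl` is my own helper name, not part of any tool.

```python
# Bit layout of a Linux ioctl request number (asm-generic/ioctl.h, used on x86_64):
# bits 0-7 = command number, 8-15 = type ("magic") byte,
# bits 16-29 = argument size in bytes, 30-31 = transfer direction.
NR_BITS, TYPE_BITS, SIZE_BITS = 8, 8, 14
TYPE_SHIFT = NR_BITS
SIZE_SHIFT = TYPE_SHIFT + TYPE_BITS
DIR_SHIFT = SIZE_SHIFT + SIZE_BITS

# Direction is from userspace's point of view: 1 = write, 2 = read.
DIR_NAMES = {0: "none", 1: "write", 2: "read", 3: "read/write"}


def decode_ioctl(request):
    """Split a 32-bit ioctl request number into its four fields."""
    return {
        "dir": DIR_NAMES[(request >> DIR_SHIFT) & 0x3],
        "size": (request >> SIZE_SHIFT) & ((1 << SIZE_BITS) - 1),
        "type": chr((request >> TYPE_SHIFT) & 0xFF),
        "nr": request & 0xFF,
    }


# RSI value taken from the traces above.
decoded = decode_ioctl(0xC1944D01)
print(decoded)
# {'dir': 'read/write', 'size': 404, 'type': 'M', 'nr': 1}
```

So every hung megacli process is blocked in the same read/write ioctl, type 'M', number 1, with a 404-byte argument. As far as I can tell this is the driver's firmware pass-through ioctl (MEGASAS_IOC_FIRMWARE), which would fit the megasas_mgmt_fw_ioctl frames in the call traces, but I have not verified that against the driver headers.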
The controller then resets:
[ 721.731796] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 721.731855] megaraid_sas 0000:18:00.0: FW in FAULT state Fault code:0x10000 subcode:0x0 func:megasas_wait_for_outstanding_fusion
[ 721.731954] megaraid_sas 0000:18:00.0: resetting fusion adapter scsi0.
[ 721.732587] megaraid_sas 0000:18:00.0: Outstanding fastpath IOs: 0
[ 732.492151] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[ 750.468711] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 750.468713] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 750.469351] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
[ 750.469353] megaraid_sas 0000:18:00.0: Performance mode :Latency
[ 750.469353] megaraid_sas 0000:18:00.0: FW supports sync cache : Yes
[ 750.469356] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 751.084723] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[ 751.224730] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
[ 751.224734] megaraid_sas 0000:18:00.0: controller type : MR(8192MB)
[ 751.224737] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR) : Enabled
[ 751.224739] megaraid_sas 0000:18:00.0: Secure JBOD support : No
[ 751.224741] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[ 751.224743] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
[ 751.224745] megaraid_sas 0000:18:00.0: JBOD sequence map support : Yes
[ 751.224747] megaraid_sas 0000:18:00.0: PCI Lane Margining support : Yes
[ 779.253522] megaraid_sas 0000:18:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
[ 789.945799] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[ 806.802227] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 806.802231] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 806.802979] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
[ 806.802980] megaraid_sas 0000:18:00.0: Performance mode :Latency
[ 806.802981] megaraid_sas 0000:18:00.0: FW supports sync cache : Yes
[ 806.802983] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 807.502249] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[ 807.586262] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
[ 807.586266] megaraid_sas 0000:18:00.0: controller type : MR(8192MB)
[ 807.586268] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR) : Enabled
[ 807.586270] megaraid_sas 0000:18:00.0: Secure JBOD support : No
[ 807.586272] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[ 807.586274] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
[ 807.586276] megaraid_sas 0000:18:00.0: JBOD sequence map support : Yes
[ 807.586278] megaraid_sas 0000:18:00.0: PCI Lane Margining support : Yes
[ 835.614916] megaraid_sas 0000:18:00.0: megasas_get_ld_map_info DCMD timed out, RAID map is disabled
[ 846.307158] megaraid_sas 0000:18:00.0: Waiting for FW to come to ready state
[ 864.255554] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 864.255558] megaraid_sas 0000:18:00.0: FW now in Ready state
[ 864.256364] megaraid_sas 0000:18:00.0: Current firmware supports maximum commands: 5101 LDIO threshold: 0
[ 864.256365] megaraid_sas 0000:18:00.0: Performance mode :Latency
[ 864.256366] megaraid_sas 0000:18:00.0: FW supports sync cache : Yes
[ 864.256368] megaraid_sas 0000:18:00.0: megasas_disable_intr_fusion is called outbound_intr_mask:0x40000009
[ 864.899567] megaraid_sas 0000:18:00.0: FW supports atomic descriptor : Yes
[ 865.039569] megaraid_sas 0000:18:00.0: FW provided supportMaxExtLDs: 1 max_lds: 240
[ 865.039573] megaraid_sas 0000:18:00.0: controller type : MR(8192MB)
[ 865.039575] megaraid_sas 0000:18:00.0: Online Controller Reset(OCR) : Enabled
[ 865.039577] megaraid_sas 0000:18:00.0: Secure JBOD support : No
[ 865.039579] megaraid_sas 0000:18:00.0: NVMe passthru support : Yes
[ 865.039582] megaraid_sas 0000:18:00.0: FW provided TM TaskAbort/Reset timeout : 6 secs/60 secs
[ 865.039583] megaraid_sas 0000:18:00.0: JBOD sequence map support : Yes
[ 865.039585] megaraid_sas 0000:18:00.0: PCI Lane Margining support : Yes
[ 865.039589] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x10b0100
[ 865.039682] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x1010000
[ 865.039721] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_refire_mgmt_cmd 4304 cmd 0x5 opcode 0x1010000
[ 865.039776] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x1010000 cmd->cmd_status_drv 0x3
[ 865.039781] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x10b0100 cmd->cmd_status_drv 0x3
[ 865.039786] megaraid_sas 0000:18:00.0: return -EBUSY from megasas_mgmt_fw_ioctl 8301 cmd 0x5 opcode 0x1010000 cmd->cmd_status_drv 0x3
[ 865.039830] megaraid_sas 0000:18:00.0: waiting for controller reset to finish
[ 865.095608] megaraid_sas 0000:18:00.0: megasas_enable_intr_fusion is called outbound_intr_mask:0x40000000
[ 865.096526] megaraid_sas 0000:18:00.0: Adapter is OPERATIONAL for scsi:0
[ 865.097349] megaraid_sas 0000:18:00.0: Snap dump wait time : 15
[ 865.097351] megaraid_sas 0000:18:00.0: Reset successful for scsi0.
[ 865.097959] megaraid_sas 0000:18:00.0: 999 (689378568s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[ 865.098227] megaraid_sas 0000:18:00.0: 1002 (689378578s/0x0020/CRIT) - Controller encountered an error and was reset
[ 865.110859] megaraid_sas 0000:18:00.0: scanning for scsi0...
[ 865.111057] megaraid_sas 0000:18:00.0: 1042 (689378618s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[ 865.111308] megaraid_sas 0000:18:00.0: 1045 (689378628s/0x0020/CRIT) - Controller encountered an error and was reset
[ 865.115092] megaraid_sas 0000:18:00.0: scanning for scsi0...
[ 865.115368] megaraid_sas 0000:18:00.0: 1085 (689378667s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[ 865.115405] megaraid_sas 0000:18:00.0: 1088 (689378677s/0x0020/CRIT) - Controller encountered an error and was reset
[ 865.116344] megaraid_sas 0000:18:00.0: scanning for scsi0...
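To make the tail of that log easier to scan, here is a rough sketch that pulls out just the firmware event lines (the ones with a sequence number and a severity such as DEAD or CRIT) and counts them. It assumes the dmesg line format shown above; the regular expression, `parse_fw_events`, and the field names are my own and purely illustrative.

```python
import re
from collections import Counter

# Matches driver lines that carry a firmware event, e.g.
# "[ 865.097959] megaraid_sas 0000:18:00.0: 999 (689378568s/0x0020/DEAD) - Fatal ..."
EVENT_RE = re.compile(
    r"\[\s*(?P<uptime>\d+\.\d+)\]\s+megaraid_sas\s+(?P<pci>\S+):\s+"
    r"(?P<seq>\d+)\s+\((?P<fw_time>\d+)s/(?P<locale>0x[0-9a-fA-F]+)/(?P<level>\w+)\)"
    r"\s+-\s+(?P<message>.*)"
)

# Three lines copied from the log above; the last one is not a firmware event.
SAMPLE = r"""[ 865.097959] megaraid_sas 0000:18:00.0: 999 (689378568s/0x0020/DEAD) - Fatal firmware error: Line 171 in fw\raid\utils.c
[ 865.098227] megaraid_sas 0000:18:00.0: 1002 (689378578s/0x0020/CRIT) - Controller encountered an error and was reset
[ 865.110859] megaraid_sas 0000:18:00.0: scanning for scsi0...
"""


def parse_fw_events(text):
    """Return one dict per firmware event line found in dmesg-style text."""
    events = []
    for line in text.splitlines():
        match = EVENT_RE.search(line)
        if match is None:
            continue  # ordinary driver message, not a firmware event
        event = match.groupdict()
        event["uptime"] = float(event["uptime"])
        event["seq"] = int(event["seq"])
        event["fw_time"] = int(event["fw_time"])
        events.append(event)
    return events


events = parse_fw_events(SAMPLE)
# Count how often each (severity, message) pair occurs.
counts = Counter((e["level"], e["message"]) for e in events)
for (level, message), n in counts.items():
    print(f"{n} x {level}: {message}")
```

Running the same function over the full dmesg output should show the same pair repeating after each reset attempt: a DEAD "Fatal firmware error: Line 171 in fw\raid\utils.c" followed by a CRIT "Controller encountered an error and was reset".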