I have a set of machines that are identical in hardware and almost identical in software setup, but one of them is filling up /var/log/messages with:
Jun 16 09:41:37 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:41:37 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:41:47 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10080 msec ago)
Jun 16 09:41:47 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10080 msec ago)
Jun 16 09:41:57 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10076 msec ago)
Jun 16 09:41:57 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10076 msec ago)
Jun 16 09:42:07 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:42:07 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:42:17 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10081 msec ago)
Jun 16 09:42:17 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10081 msec ago)
Jun 16 09:42:28 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10074 msec ago)
Jun 16 09:42:28 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10074 msec ago)
Jun 16 09:42:38 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10083 msec ago)
Jun 16 09:42:38 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10083 msec ago)
Jun 16 09:42:48 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:42:48 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10082 msec ago)
Jun 16 09:42:58 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10081 msec ago)
Jun 16 09:42:58 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10081 msec ago)
Jun 16 09:43:08 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10069 msec ago)
Jun 16 09:43:08 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10069 msec ago)
Jun 16 09:43:18 h0stname kernel: pciehp 10000:00:00.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10079 msec ago)
Jun 16 09:43:18 h0stname kernel: pciehp 10000:00:01.0:pcie04: Timeout on hotplug command 0x13f8 (issued 10079 msec ago)
Presumably a piece of hardware is unhappy. How do I find out exactly which piece of hardware is causing the complaints? Everything seems to be working, except for a known faulty disk in the RAID. Normally I would start disconnecting things to narrow it down, but I only have SSH access at the moment, and the hardware list is huge.
All I know is that it pertains to the PCI bridges, as lspci lists the corresponding addresses:
10000:00:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 04)
10000:00:01.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port B (rev 04)
Running CentOS 7, kernel 3.10.0-693.21.1.el7.x86_64.
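To check that only these two root ports are involved, and that the command value never changes, the messages can be tallied per PCI address. Below is a minimal sketch in Python, assuming every relevant line has the same format as the excerpt above. The function name tally_timeouts and the regular expression are my own illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Assumed format, taken from the log excerpt above:
#   "... kernel: pciehp <domain:bus:dev.fn>:<port name>: Timeout on hotplug command <hex> ..."
PCIEHP_TIMEOUT = re.compile(
    r"kernel: pciehp (?P<addr>[0-9a-fA-F]+:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]):"
    r"\S*: Timeout on hotplug command (?P<cmd>0x[0-9a-fA-F]+)"
)


def tally_timeouts(lines):
    """Count pciehp timeout messages per (PCI address, hotplug command) pair."""
    counts = Counter()
    for line in lines:
        match = PCIEHP_TIMEOUT.search(line)
        if match:
            counts[(match.group("addr"), match.group("cmd"))] += 1
    return counts
```

It accepts any iterable of lines, so an open file handle on /var/log/messages can be passed in directly, and the returned Counter's most_common() method lists the noisiest address first.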