I'm managing about ninety Dell servers distributed worldwide, which are currently being upgraded from Ubuntu 16.04 LTS (xenial) to Ubuntu 20.04 LTS (focal).
Each server's power supply is protected by an APC SmartUPS connected via USB and monitored by NUT (Network UPS Tools) so that the server shuts down cleanly in case of a power failure.
This works quite well on the servers still running xenial.
On those running focal, there's a frequent problem with the USB connection to the UPS not coming up after a reboot. Specifically:
During system startup, the kernel does not detect the UPS and instead complains in /var/log/syslog:
kernel: [ 2.239216] usb usb1-port4: couldn't allocate usb_device
upsmon periodically complains in syslog:
upsmon[1105]: Poll UPS [ups@localhost] failed - Driver not connected
The SmartUPS does not show up in the output of lsusb:
a-schmidt@somewhere:~$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 004: ID 1604:10c0 Tascam
Bus 001 Device 003: ID 1604:10c0 Tascam
Bus 001 Device 002: ID 1604:10c0 Tascam
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Running sudo upsdrvctl start manually reports that it cannot find the device:
a-schmidt@somewhere:~$ sudo upsdrvctl start
Network UPS Tools - UPS driver controller 2.7.4
Network UPS Tools - Generic HID driver 0.41 (2.7.4)
USB communication driver 0.33
No matching HID UPS found
Driver failed to start (exit status=1)
Doing a USB bus reset via the command sequence:
echo '0000:00:14.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/unbind
echo '0000:00:14.0' | sudo tee /sys/bus/pci/drivers/xhci_hcd/bind
helps every time. The UPS is detected, as shown by these messages in /var/log/syslog:
kernel: [20025.662161] usb 1-4: new full-speed USB device number 2 using xhci_hcd
kernel: [20025.838902] usb 1-4: New USB device found, idVendor=051d, idProduct=0003, bcdDevice= 0.01
kernel: [20025.838907] usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
kernel: [20025.838910] usb 1-4: Product: Smart-UPS_1500 FW:UPS 04.1 / ID=1018
kernel: [20025.838913] usb 1-4: Manufacturer: American Power Conversion
It shows up in lsusb, and upsdrvctl start connects successfully:
a-schmidt@somewhere:~$ lsusb
Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 001 Device 002: ID 051d:0002 American Power Conversion Uninterruptible Power Supply
Bus 001 Device 005: ID 1604:10c0 Tascam
Bus 001 Device 004: ID 1604:10c0 Tascam
Bus 001 Device 003: ID 1604:10c0 Tascam
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
a-schmidt@somewhere:~$ sudo upsdrvctl start
Network UPS Tools - UPS driver controller 2.7.4
Network UPS Tools - Generic HID driver 0.41 (2.7.4)
USB communication driver 0.33
Using subdriver: APC HID 0.96
upsmon stops complaining and everything is fine again.
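As a stopgap until the root cause is known, the manual reset could be scripted so that the controller is only rebound when the UPS is actually missing. The following is a minimal sketch, not a tested fix: the PCI address 0000:00:14.0 and the APC vendor ID 051d are taken from the outputs above, while the function and variable names (ups_present, rebind_xhci, ensure_ups, XHCI_DRIVER_DIR and so on) are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch of a conditional xHCI rebind. All names below are illustrative.
# Assumptions: the controller sits at PCI address 0000:00:14.0 and the UPS
# reports USB vendor ID 051d, as in the lsusb and syslog outputs above.

XHCI_PCI_ADDR="${XHCI_PCI_ADDR:-0000:00:14.0}"
UPS_VENDOR_ID="${UPS_VENDOR_ID:-051d}"
XHCI_DRIVER_DIR="${XHCI_DRIVER_DIR:-/sys/bus/pci/drivers/xhci_hcd}"

# Reads lsusb-style output on stdin and succeeds if a device with the
# UPS vendor ID is listed.
ups_present() {
    grep -qi "ID ${UPS_VENDOR_ID}:"
}

# Unbinds and rebinds the xHCI controller; must run as root.
# Note: this briefly disconnects every device on that controller.
rebind_xhci() {
    printf '%s\n' "$XHCI_PCI_ADDR" > "$XHCI_DRIVER_DIR/unbind" || return 1
    printf '%s\n' "$XHCI_PCI_ADDR" > "$XHCI_DRIVER_DIR/bind"
}

# Rebinds only when the UPS is absent from the lsusb output.
ensure_ups() {
    if lsusb | ups_present; then
        return 0
    fi
    rebind_xhci
}
```

Calling ensure_ups as root after boot, before the NUT driver is started, would apply the reset only on machines where the UPS failed to enumerate. It works around the symptom and does not explain why the kernel reports "couldn't allocate usb_device" in the first place.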
What could be the reason for this behaviour and how can I fix it?