tl;dr:
- Keyboard & mouse don't work with certain kernels.
- Not asking for a workaround, I already have that.
- Using git bisect, I have identified the exact commit in the Ubuntu kernel repository where my input devices stop working.
- What do I do next, given that the breakage does not occur in more recent upstream kernels?
My USB input devices:
- Logitech G19 wired keyboard
- Logitech G502 wired mouse
- Sharkoon keyboard (very basic, no keyboard lighting, no display, no special extra buttons)
Ubuntu version: 21.10
Normal (expected) functionality:
- In Grub:
- Keyboard & mouse lights are on
- NumLock LED goes off and on when I press the NumLock key repeatedly
- Keyboard works (I can use arrow keys in Grub menu)
- In Gnome login screen:
- Keyboard & mouse lights are on
- NumLock LED goes off and on when I press the NumLock key repeatedly
- Mouse pointer on screen moves when I move the mouse
- Typing works (I can type my password in the login screen)
When it's not working:
- In Grub:
- Keyboard & mouse lights are on
- NumLock LED goes off and on when I press the NumLock key repeatedly
- Keyboard works (I can use arrow keys in Grub menu)
- In Gnome login screen:
- Keyboard & mouse lights are off
- NumLock LED is off and stays off when I press the NumLock key repeatedly
- Mouse pointer on screen does not move when I move the mouse
- Typing does not work
With the above, I have a very solid scenario to test if a certain kernel works for me or not. I have installed various kernels with 3 methods:
- Using apt -> Ubuntu kernels, available from the Ubuntu repo
- Using the Ubuntu Mainline Kernel Installer -> pre-compiled kernels from kernel.org.
- Using Ubuntu kernels I compiled myself from git://kernel.ubuntu.com/ubuntu/ubuntu-impish.git. I used git bisect to check out different commits and then built each of those, so that I could find the exact commit where keyboard & mouse stop working.
- Working kernels, tested:
- 5.13.0-051300-generic (UMKI)
- 5.13.0-19-generic (apt)
- 5.13.0-20-generic (apt)
- 5.13.0-21-generic (apt)
- 5.13.0-22-generic (apt)
- Ubuntu-5.13.0-22.22-0-g3ab15e228151 (compiled)
- Ubuntu-5.13.0-22.22-317-g398351230dab (compiled)
- Ubuntu-5.13.0-22.22-356-g8ac4e2604dae (compiled)
- Ubuntu-5.13.0-22.22-376-gfab6fb5e61e1 (compiled)
- Ubuntu-5.13.0-22.22-386-gce5ff9b36bc3 (compiled)
- 5.16.11-051611-generic (UMKI)
- Failing kernels, tested:
- Ubuntu-5.13.0-22.22-387-g0fc979747dec (compiled)
- Ubuntu-5.13.0-22.22-388-gab2802ea6621 (compiled)
- Ubuntu-5.13.0-22.22-391-ge24e59fa409c (compiled)
- Ubuntu-5.13.0-22.22-396-gc3d35f3acc3a (compiled)
- Ubuntu-5.13.0-22.22-475-g79b62d0bba89 (compiled)
- Ubuntu-5.13.0-23.23-0-gb188ba567fc9 (compiled)
- 5.13.0-23-generic (apt)
- 5.13.0-25-generic (apt)
- 5.13.0-27-generic (apt)
- 5.13.0-28-generic (apt)
- 5.13.0-30-generic (apt)
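As a sanity check on the two lists above, the compiled kernel names can be parsed as git describe output (tag, number of commits since the tag, abbreviated hash) and compared. This is only a sketch: it assumes the commit count after the Ubuntu-5.13.0-22.22 tag orders the commits linearly, which is my assumption about this branch and not something git guarantees. The helper names are made up for illustration.

```python
import re

# Matches `git describe`-style names such as
# "Ubuntu-5.13.0-22.22-387-g0fc979747dec":
#   <tag>-<commits since tag>-g<abbreviated hash>
DESCRIBE_RE = re.compile(
    r"^(?P<tag>Ubuntu-[\d.]+-[\d.]+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]+)$"
)


def parse_describe(name):
    """Split a describe-style kernel name into (tag, count, sha)."""
    m = DESCRIBE_RE.match(name)
    if m is None:
        raise ValueError(f"not a describe-style name: {name!r}")
    return m.group("tag"), int(m.group("count")), m.group("sha")


# Compiled kernels exactly as listed in the post.
GOOD = [
    "Ubuntu-5.13.0-22.22-0-g3ab15e228151",
    "Ubuntu-5.13.0-22.22-317-g398351230dab",
    "Ubuntu-5.13.0-22.22-356-g8ac4e2604dae",
    "Ubuntu-5.13.0-22.22-376-gfab6fb5e61e1",
    "Ubuntu-5.13.0-22.22-386-gce5ff9b36bc3",
]
BAD = [
    "Ubuntu-5.13.0-22.22-387-g0fc979747dec",
    "Ubuntu-5.13.0-22.22-388-gab2802ea6621",
    "Ubuntu-5.13.0-22.22-391-ge24e59fa409c",
    "Ubuntu-5.13.0-22.22-396-gc3d35f3acc3a",
    "Ubuntu-5.13.0-22.22-475-g79b62d0bba89",
    "Ubuntu-5.13.0-23.23-0-gb188ba567fc9",
]


def boundary(good, bad, tag="Ubuntu-5.13.0-22.22"):
    """Return (last good offset, first bad offset) relative to `tag`.

    Names described relative to a different tag are ignored, because
    their commit counts are not comparable.
    """
    good_counts = [c for t, c, _ in map(parse_describe, good) if t == tag]
    bad_counts = [c for t, c, _ in map(parse_describe, bad) if t == tag]
    return max(good_counts), min(bad_counts)


last_good, first_bad = boundary(GOOD, BAD)
print(last_good, first_bad)  # 386 387
```

The last good and first bad offsets are adjacent, so no untested commit is left between them.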
Kernel 5.13.0-22 is the latest Ubuntu kernel provided via apt that works for me, so I have pinned that version to prevent automatic upgrades. How exactly I did that is outside the scope of my question.
5.13.0-23 is the first Ubuntu kernel that breaks keyboard & mouse for me, so I know that the commit that breaks it must be somewhere between 5.13.0-22 and 5.13.0-23. I used git bisect to identify the exact commit and I found it. This meant running git bisect, compiling & installing the kernel, rebooting, testing whether the input devices work, and then running git bisect good or git bisect bad according to the test result. Each compilation took about 22 minutes, so you can imagine that it took me quite some time!
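To give a feel for the cost, here is a rough model of the bisection as a plain midpoint search over commit offsets. It is a sketch under assumptions: TOTAL_COMMITS = 634 is a hypothetical value (the post does not state how many commits lie between the two tags; I picked it so that the midpoints resemble the offsets of the compiled kernels listed earlier), and real git bisect chooses its test commits by its own rules, so the exact sequence may differ.

```python
def bisect_steps(total, first_bad):
    """Return the commit offsets a midpoint bisection would test.

    Offset 0 is known good and offset `total` is known bad; every
    offset >= first_bad is treated as bad.
    """
    good, bad = 0, total
    tested = []
    while bad - good > 1:
        mid = (good + bad) // 2
        tested.append(mid)
        if mid >= first_bad:
            bad = mid   # breakage is at or before mid
        else:
            good = mid  # breakage is after mid
    return tested


TOTAL_COMMITS = 634   # hypothetical, not taken from the post
FIRST_BAD = 387       # offset of the first failing commit from the post
BUILD_MINUTES = 22    # approximate build time from the post

steps = bisect_steps(TOTAL_COMMITS, FIRST_BAD)
print(steps)
print(len(steps), "builds, about", len(steps) * BUILD_MINUTES, "minutes of compiling")
```

Under these assumptions that is 9 builds and roughly 198 minutes of compile time, not counting installs, reboots and testing.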
The exact commit where my input devices stop working is Ubuntu-5.13.0-22.22-387-g0fc979747dec. It contains this change:
xhci: Fix command ring pointer corruption while aborting a command
BugLink: https://bugs.launchpad.net/bugs/1951880
commit ff0e50d3564f33b7f4b35cadeabd951d66cfc570 upstream.
The command ring pointer is located at [6:63] bits of the command
ring control register (CRCR). All the control bits like command stop,
abort are located at [0:3] bits. While aborting a command, we read the
CRCR and set the abort bit and write to the CRCR. The read will always
give command ring pointer as all zeros. So we essentially write only
the control bits. Since we split the 64 bit write into two 32 bit writes,
there is a possibility of xHC command ring stopped before the upper
dword (all zeros) is written. If that happens, xHC updates the upper
dword of its internal command ring pointer with all zeros. Next time,
when the command ring is restarted, we see xHC memory access failures.
Fix this issue by only writing to the lower dword of CRCR where all
control bits are located.
Cc: [email protected]
Signed-off-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5b54a36..5a96f3e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -366,16 +366,22 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
/* Must be called with xhci->lock held, releases and aquires lock back */
static int xhci_abort_cmd_ring(struct xhci_hcd *xhci, unsigned long flags)
{
- u64 temp_64;
+ u32 temp_32;
int ret;
xhci_dbg(xhci, "Abort command ring\n");
reinit_completion(&xhci->cmd_ring_stop_completion);
- temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
- xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
- &xhci->op_regs->cmd_ring);
+ /*
+ * The control bits like command stop, abort are located in lower
+ * dword of the command ring control register. Limit the write
+ * to the lower dword to avoid corrupting the command ring pointer
+ * in case if the command ring is stopped by the time upper dword
+ * is written.
+ */
+ temp_32 = readl(&xhci->op_regs->cmd_ring);
+ writel(temp_32 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring);
/* Section 4.6.1.2 of xHCI 1.0 spec says software should also time the
* completion of the Command Abort operation. If CRR is not negated in 5
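To make the commit message a bit more concrete, here is a toy model of the register writes it describes. It is not driver code, only my reading of the message: before the change, the 64-bit value is written as two 32-bit writes (lower dword, then upper dword); after the change, only the lower dword is written. The function names are invented for illustration; CMD_RING_ABORT is assumed to be bit 2, as in the kernel's xhci.h.

```python
CMD_RING_ABORT = 1 << 2  # abort bit, one of the control bits in the lower dword


def split_u64(value):
    """Split a 64-bit value into (lower dword, upper dword)."""
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF


def abort_writes_before_patch(crcr_readback):
    """Model the old code: one 64-bit write split into two 32-bit writes."""
    lo, hi = split_u64(crcr_readback | CMD_RING_ABORT)
    return [("lower", lo), ("upper", hi)]


def abort_writes_after_patch(crcr_readback):
    """Model the new code: only the lower dword is read and written."""
    lo, _ = split_u64(crcr_readback)
    return [("lower", lo | CMD_RING_ABORT)]


# Per the commit message, the pointer bits always read back as zero,
# so the old code ends up writing an all-zero upper dword.
readback = 0x0
print(abort_writes_before_patch(readback))  # [('lower', 4), ('upper', 0)]
print(abort_writes_after_patch(readback))   # [('lower', 4)]
```

The second write of the old code is the one the commit message worries about: if the command ring has already stopped by then, the zeros land in the upper half of the controller's internal ring pointer.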
The linked Launchpad bug does not yield anything useful, because it is not specifically about this change only.
The linked email thread between Mathias Nyman, Pavan Kondeti and youling257 is about this change, but the conversation goes over my head.
Mathias Nyman has made an update to his patch. His original change (with a bug) is already in the Ubuntu kernel; the patch where he fixed it is not. The patch from Mathias Nyman is in the mainline kernel as v5.16-rc3-1-g09f736aa9547, which means it is included in the 5.16 mainline kernel. According to https://kernel.ubuntu.com/, the next Ubuntu version, Jammy Jellyfish / 22.04 LTS, will be based on the upstream 5.15 kernel, which I assume means that Ubuntu 22.04 LTS will still have a broken keyboard & mouse for me, unless Mathias Nyman's patch is added to the Ubuntu kernel.
I have asked on the #ubuntu-kernel IRC channel, but I may have asked at a time when not many people were online to see my question. Or maybe that channel just isn't very active, judging by its log files.
I have reported a bug on Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1963555
Is there anything else I can/should do?