
Keyboard & mouse stop working with kernel Ubuntu-5.13.0-22.22-387-g0fc979747dec - xhci: Fix command ring pointer corruption while aborting a command


tl;dr:

  • Keyboard & mouse don't work with certain kernels.
  • Not asking for a workaround, I already have that.
  • Using git bisect, I have identified the exact commit in the Ubuntu kernel repository where my input devices stop working.
  • What do I do next, given that the breakage does not occur in more recent upstream kernels?

My USB input devices:

  • Logitech G19 wired keyboard
  • Logitech G502 wired mouse
  • Sharkoon keyboard (very basic, no keyboard lighting, no display, no special extra buttons)

Ubuntu version: 21.10

Normal (expected) functionality:

  • In Grub:
    • Keyboard & mouse lights are on
    • NumLock LED goes off and on when I press the NumLock key repeatedly
    • Keyboard works (I can use arrow keys in Grub menu)
  • In Gnome login screen:
    • Keyboard & mouse lights are on
    • NumLock LED goes off and on when I press the NumLock key repeatedly
    • Mouse pointer on screen moves when I move the mouse
    • Typing works (I can type my password in the login screen)

When it's not working:

  • In Grub:
    • Keyboard & mouse lights are on
    • NumLock LED goes off and on when I press the NumLock key repeatedly
    • Keyboard works (I can use arrow keys in Grub menu)
  • In Gnome login screen:
    • Keyboard & mouse lights are off
    • NumLock LED is off and stays off when I press the NumLock key repeatedly
    • Mouse pointer on screen does not move when I move the mouse
    • Typing does not work

This gives me a reliable test for whether a given kernel works for me. I have installed various kernels using three methods:

  1. Using apt -> Ubuntu kernels, available from the Ubuntu repo
  2. Using the Ubuntu Mainline Kernel Installer (UMKI) -> pre-compiled builds of the upstream kernel.org mainline kernels.
  3. Using Ubuntu kernels I compiled myself from git://kernel.ubuntu.com/ubuntu/ubuntu-impish.git. I used git bisect to check out different commits and then built each of them, so that I could find the exact commit where keyboard & mouse stop working.

  • Working kernels, tested:
    • 5.13.0-051300-generic (UMKI)
    • 5.13.0-19-generic (apt)
    • 5.13.0-20-generic (apt)
    • 5.13.0-21-generic (apt)
    • 5.13.0-22-generic (apt)
    • Ubuntu-5.13.0-22.22-0-g3ab15e228151 (compiled)
    • Ubuntu-5.13.0-22.22-317-g398351230dab (compiled)
    • Ubuntu-5.13.0-22.22-356-g8ac4e2604dae (compiled)
    • Ubuntu-5.13.0-22.22-376-gfab6fb5e61e1 (compiled)
    • Ubuntu-5.13.0-22.22-386-gce5ff9b36bc3 (compiled)
    • 5.16.11-051611-generic (UMKI)
  • Failing kernels, tested:
    • Ubuntu-5.13.0-22.22-387-g0fc979747dec (compiled)
    • Ubuntu-5.13.0-22.22-388-gab2802ea6621 (compiled)
    • Ubuntu-5.13.0-22.22-391-ge24e59fa409c (compiled)
    • Ubuntu-5.13.0-22.22-396-gc3d35f3acc3a (compiled)
    • Ubuntu-5.13.0-22.22-475-g79b62d0bba89 (compiled)
    • Ubuntu-5.13.0-23.23-0-gb188ba567fc9 (compiled)
    • 5.13.0-23-generic (apt)
    • 5.13.0-25-generic (apt)
    • 5.13.0-27-generic (apt)
    • 5.13.0-28-generic (apt)
    • 5.13.0-30-generic (apt)

Kernel 5.13.0-22 is the latest Ubuntu kernel provided via apt that works for me, so I have pinned that version to prevent automatic upgrades. Exactly how I did that is outside the scope of my question.

5.13.0-23 is the first Ubuntu kernel that breaks keyboard & mouse for me, so the commit that breaks them must be somewhere between 5.13.0-22 and 5.13.0-23. I used git bisect to identify the exact commit, and I found it. Each step meant running git bisect, compiling and installing the kernel, rebooting, testing whether the input devices work, and then running git bisect good or git bisect bad according to the result. Each compilation took about 22 minutes, so you can imagine that it took me quite some time!

The exact commit where my input devices stop working is Ubuntu-5.13.0-22.22-387-g0fc979747dec. It contains this change:

xhci: Fix command ring pointer corruption while aborting a command

BugLink: https://bugs.launchpad.net/bugs/1951880

commit ff0e50d3564f33b7f4b35cadeabd951d66cfc570 upstream.

The command ring pointer is located at [6:63] bits of the command
ring control register (CRCR). All the control bits like command stop,
abort are located at [0:3] bits. While aborting a command, we read the
CRCR and set the abort bit and write to the CRCR. The read will always
give command ring pointer as all zeros. So we essentially write only
the control bits. Since we split the 64 bit write into two 32 bit writes,
there is a possibility of xHC command ring stopped before the upper
dword (all zeros) is written. If that happens, xHC updates the upper
dword of its internal command ring pointer with all zeros. Next time,
when the command ring is restarted, we see xHC memory access failures.
Fix this issue by only writing to the lower dword of CRCR where all
control bits are located.

Cc: [email protected]
Signed-off-by: Pavankumar Kondeti <[email protected]>
Signed-off-by: Mathias Nyman <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Kamal Mostafa <[email protected]>
Signed-off-by: Stefan Bader <[email protected]>


diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index 5b54a36..5a96f3e 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -366,16 +366,22 @@ static void xhci_handle_stopped_cmd_ring(struct xhci_hcd *xhci,
 /* Must be called with xhci->lock held, releases and aquires lock back */
 static int xhci_abort_cmd_ring(struct xhci_hcd *xhci, unsigned long flags)
 {
-   u64 temp_64;
+   u32 temp_32;
    int ret;
 
    xhci_dbg(xhci, "Abort command ring\n");
 
    reinit_completion(&xhci->cmd_ring_stop_completion);
 
-   temp_64 = xhci_read_64(xhci, &xhci->op_regs->cmd_ring);
-   xhci_write_64(xhci, temp_64 | CMD_RING_ABORT,
-           &xhci->op_regs->cmd_ring);
+   /*
+    * The control bits like command stop, abort are located in lower
+    * dword of the command ring control register. Limit the write
+    * to the lower dword to avoid corrupting the command ring pointer
+    * in case if the command ring is stopped by the time upper dword
+    * is written.
+    */
+   temp_32 = readl(&xhci->op_regs->cmd_ring);
+   writel(temp_32 | CMD_RING_ABORT, &xhci->op_regs->cmd_ring);
 
    /* Section 4.6.1.2 of xHCI 1.0 spec says software should also time the
     * completion of the Command Abort operation. If CRR is not negated in 5

The linked Launchpad bug does not help much, because it covers more than just this change.

The linked email thread between Mathias Nyman, Pavan Kondeti and youling257 is about this change; however, the conversation goes over my head.

Mathias Nyman has since posted a follow-up patch. The original change (authored by Pavankumar Kondeti and signed off by Mathias Nyman) is already in the Ubuntu kernel, but the follow-up that fixes it is not. The follow-up is in the mainline kernel as v5.16-rc3-1-g09f736aa9547, which means it is included in the 5.16 mainline kernel. According to https://kernel.ubuntu.com/, the next Ubuntu version, Jammy Jellyfish / 22.04 LTS, will be based on the upstream 5.15 kernel. I assume this means that Ubuntu 22.04 LTS will still break my keyboard & mouse, unless Mathias Nyman's patch is added to the Ubuntu kernel.

I have asked on the #ubuntu-kernel IRC channel, but I may have asked at a time when not many people were online to see my question. Judging by the log files, that channel may also simply not be very active.

I have reported a bug on Launchpad: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1963555

Is there anything else I can/should do?

Someone:
*quite the tone policing*: Please read our [Code of Conduct](https://askubuntu.com/conduct).
Amedee Van Gasse:
I’m having trouble understanding your comment. I'm not able to see how it applies to the issue in my question. Maybe it's because I am not a native English speaker, there could be a language barrier.
Someone:
@AmedeeVanGasse My prior comment does not refer to your issue/question; it refers to your earlier comment, which you have since deleted: the one that said "quite the tone policing". I'm not a native English speaker either.
David:
You do not need to report a bug on an old kernel; that is why there are new kernels. Also, it may not be a bug at all but an issue with YOUR hardware.
Amedee Van Gasse:
@David please check commit `v5.16-rc3-1-g09f736aa9547` in the upstream kernel from kernel.org. It fixes a bug that was introduced in `v5.15-rc5-4-gff0e50d3564f`. _"Turns out some xHC controllers require all 64 bits in the CRCR register to be written to execute a command abort."_ --> I am affected. Upcoming Ubuntu release 22.04 will use kernel 5.15, which means it will not automatically include the kernel bugfix that I need. Is that correct?