Score:0

Graphics freeze after moving the graphics card to another slot in Ubuntu 22.04


EDIT: see the updates at the end for the solution; the title was changed to better reflect the problem.

I have Ubuntu 22.04 LTS on a system with a GeForce RTX 2060 card. I recently made some small hardware changes (moving the graphics card from one PCIe slot to another and, a few days later, installing a case fan), and since the latest change the graphics output of the system dies at random, shortly after booting. Booting is apparently fine: I can log in and start opening my browser, terminals, etc. as usual, and then the screen turns blue, just as when there is no signal. Any attempt to switch to a text console (Ctrl+Alt+F3, Ctrl+Alt+F1...) is useless, and all I can do is Alt+SysRq+REISUB to reboot the system.

Looking at the system/kernel logs, it seems that the problems start with this:

kernel: [ 1531.539086] xhci_hcd 0000:0c:00.2: Unable to change power state from D3hot to D0, device inaccessible 
kernel: [ 1531.539241] nouveau 0000:0c:00.0: timer: stalled at ffffffffffffffff
kernel: [ 1531.539244] ------------[ cut here ]------------
kernel: [ 1531.539245] nouveau 0000:0c:00.0: timeout

And later some lines like

kernel: [ 1531.599952] xhci_hcd 0000:0c:00.2: Unable to change power state from D3cold to D0, device inaccessible
kernel: [ 1531.599959] xhci_hcd 0000:0c:00.2: Controller not ready at resume -19
kernel: [ 1531.599961] xhci_hcd 0000:0c:00.2: PCI post-resume error -19!
kernel: [ 1531.599962] xhci_hcd 0000:0c:00.2: HC died; cleaning up
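
For anyone who wants to pull these messages out of a longer log, here is a minimal sketch (not part of the original post). It assumes the log lines have the same `kernel: [ timestamp] driver address: message` shape as the excerpts above; the function name `find_pci_errors` and the list of substrings are illustrative choices, not an official tool.

```python
import re
from collections import defaultdict

# Matches the part of a line that looks like:
#   [ 1531.539086] xhci_hcd 0000:0c:00.2: Unable to change power state ...
LINE_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<driver>\S+)\s+"
    r"(?P<addr>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)"
)

# Substrings taken from the log excerpts in this post (assumed to be the
# interesting ones; extend as needed).
SUSPICIOUS = (
    "Unable to change power state",
    "stalled",
    "timeout",
    "Controller not ready",
    "post-resume error",
    "HC died",
)


def find_pci_errors(log_text):
    """Group suspicious kernel messages by PCI address."""
    hits = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match and any(s in match.group("msg") for s in SUSPICIOUS):
            hits[match.group("addr")].append(
                (float(match.group("ts")), match.group("driver"), match.group("msg").strip())
            )
    return dict(hits)


# Sample input copied from the excerpts above.
SAMPLE = """\
kernel: [ 1531.539086] xhci_hcd 0000:0c:00.2: Unable to change power state from D3hot to D0, device inaccessible
kernel: [ 1531.539241] nouveau 0000:0c:00.0: timer: stalled at ffffffffffffffff
kernel: [ 1531.539244] ------------[ cut here ]------------
kernel: [ 1531.539245] nouveau 0000:0c:00.0: timeout
kernel: [ 1531.599962] xhci_hcd 0000:0c:00.2: HC died; cleaning up
"""

if __name__ == "__main__":
    for addr, events in sorted(find_pci_errors(SAMPLE).items()):
        print(addr)
        for ts, driver, msg in events:
            print(f"  {ts:.6f} {driver}: {msg}")
```

Grouping by address makes it easier to see that both affected devices (`0000:0c:00.0` and `0000:0c:00.2`) sit on the same bus and device number, i.e. they are functions of the same card.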

I have searched for those messages and read that some people run into issues after moving the card from one PCIe slot to another (which I find surprising). The odd thing is that I moved the graphics card to a different slot about a week ago, and during that week everything was fine. The issues only started today, after powering off to add a case fan and rebooting (the fan is an Arctic P14 slim PWM PST, connected to an Arctic P12 PWM PST that was already installed, which in turn is connected to CHA_FAN1 on the motherboard, an Asus ROG Strix X570-E).

So I do not know whether the hardware changes are creating conflicts, or whether some update of the nouveau drivers only took effect after the last boot (I go a long time between reboots, so I would only have noticed it now).

Does anyone have an idea of what the issue is, or of what I should look for in the logs to better pinpoint the problem? Thanks a lot!

** UPDATE: I just tried putting the graphics card back in the previous PCIe slot, and the problem appears again. So I guess it must be related to a recent driver update or something like that. Does anyone have an idea?

** UPDATE 2: As I said in the comments on kanehekili's answer, I think I now know the origin of the problem. The card was originally in an x16 slot, and I moved it to another slot that physically accepts an x16 card but is actually wired as x8. The motherboard documentation very misleadingly labels the slots as PCIEX16_1 and PCIEX16_2, omitting the fact that the second slot is only x8. This change presumably triggered some issue with the drivers that persisted even after putting the card back in the x16 slot. The problem was finally solved by installing the Nvidia "driver metapackage from nvidia-driver-530 (proprietary)" through the GUI "Additional Drivers" menu. Note that the first driver option in the menu, the "-open" version of 530, still gave some issues, as the system would not fully recognize the card (e.g. nvidia-smi in a terminal reported "no devices were found"). Now everything appears to be fine again, so I am marking the issue as solved.
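
As a way to check the x16 versus x8 situation from within Linux rather than from the motherboard manual, here is a minimal sketch (added for illustration, not what was actually run). It reads the `current_link_width` and `max_link_width` attributes that the kernel exposes in sysfs for PCIe devices, plus the name of the bound driver. The address `0000:0c:00.0` is the one from the logs above; the helper names and the `root` parameter are hypothetical and exist only to make the function easy to test.

```python
import os
from pathlib import Path

SYSFS_PCI = Path("/sys/bus/pci/devices")


def read_attr(dev_dir, name):
    """Return the stripped content of a sysfs attribute, or None if unreadable."""
    try:
        return (Path(dev_dir) / name).read_text().strip()
    except OSError:
        return None


def pci_link_info(address, root=SYSFS_PCI):
    """Report negotiated and maximum PCIe link width and the bound driver."""
    dev = Path(root) / address
    driver_link = dev / "driver"
    # The "driver" entry is a symlink whose last path component is the driver name.
    driver = os.path.basename(os.readlink(driver_link)) if driver_link.is_symlink() else None
    return {
        "current_link_width": read_attr(dev, "current_link_width"),
        "max_link_width": read_attr(dev, "max_link_width"),
        "driver": driver,
    }


if __name__ == "__main__":
    # Address taken from the kernel log in this post; adjust for other systems.
    print(pci_link_info("0000:0c:00.0"))
```

With an x16 card in a slot wired as x8, one would expect the current width to read 8 while the maximum reads 16. The `driver` field shows whether nouveau or nvidia is currently bound to the card.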

Score:1

Having used NVIDIA cards for quite some time, I'd recommend installing the native NVIDIA driver instead of the default open-source Nouveau driver. Ubuntu offers a "hardware app" that will do the job for you. Note that the native driver will not work with Wayland, so you'll be using an X server session again. The NVIDIA driver will take over control of the card's power management, which seems to be the problem in your case.

danny_met avatar
Thanks for the answer! In the "Additional Drivers" tab of "Software & Updates" I saw many different versions besides the nouveau one that was selected (I can't post a screenshot as I am on another computer now). I guess you mean selecting one of the other options in that menu, or do you mean another app? Also, is there a specific NVIDIA driver version you would recommend for 22.04?
kanehekili avatar
Usually one version is marked as recommended (in brackets); you should choose that one. Version 470 or newer should work. And yes, you named the app I was referring to.
danny_met avatar
Oh, now I have an idea of what happened. It turns out that on the motherboard, PCIe slot 1, where I originally had the graphics card, is x16. But the second slot, where I moved it, is wired as x8 (though it physically accepts x16 cards). So it may be that moving the card to the other slot changed something in the configuration that creates conflicts, and putting the card back in its original place does not reverse the change. I guess this could be fixed by reinstalling the nouveau drivers. Any tips on the easiest way to do that? Thanks!
danny_met avatar
OK, what I finally did was install the Nvidia drivers with the GUI app. The system works and so far seems stable, but I get some terminal messages during boot, and judging by the kernel log and other outputs something still seems wrong (apparently the card is not fully recognized; the output of nvidia-smi is "no devices were found"). I will search a bit and will open another question if I can't find the solution. Thanks!
kanehekili avatar
If you search for RTX 2060 in this forum you'll find some questions about these error messages. It seems to be a kernel problem. nvidia-smi should detect the driver after a reboot. Check that you are running an X server session, not a Wayland session.
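
One way to check the session type mentioned above is the `XDG_SESSION_TYPE` environment variable, which desktop sessions normally set to `x11` or `wayland`. The following is a minimal sketch (added for illustration); the function name `session_type` is hypothetical, and it assumes the variable is set in the terminal where the script runs.

```python
import os


def session_type(env=None):
    """Return the display session type ('x11', 'wayland', ...) or 'unknown'."""
    env = os.environ if env is None else env
    value = env.get("XDG_SESSION_TYPE", "").strip().lower()
    return value or "unknown"


if __name__ == "__main__":
    print("Session type:", session_type())
```

Run it from a terminal inside the graphical session; from an SSH login or a text console the result may differ from that of the desktop session.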