
Black screen after installing CUDA on Ubuntu 20.04


Hi, can anyone help me, please? I got a black screen after installing the NVIDIA CUDA drivers.

Ubuntu 20.04, kernel 5.8.0-55-generic

NVIDIA-SMI 465.27
Driver Version: 465.27
CUDA Version: 11.3

20 GB of RAM, 2 GB NVIDIA MX150, Intel Core i7-8550U.

I'm also getting this message when trying to run a model: "RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 1.96 GiB total capacity; 2.00 MiB already allocated; 9.50 MiB free; 4.00 MiB reserved in total by PyTorch)"
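The figures in that error message are worth a quick check. The card reports 1.96 GiB total, yet only 9.50 MiB is free, and PyTorch has reserved just 4.00 MiB. Below is a minimal sketch of that arithmetic. outside_pytorch_mib is a hypothetical helper name, and it assumes the units shown in the message (GiB for the total, MiB for the rest).

```bash
#!/usr/bin/env bash
# Estimate GPU memory held outside PyTorch's caching allocator, using the
# numbers reported in a "CUDA out of memory" message.
# Assumes the message format above: total in GiB, free/reserved in MiB.
outside_pytorch_mib() {
  local total_gib="$1" free_mib="$2" reserved_mib="$3"
  # Total converted to MiB (1 GiB = 1024 MiB), minus what is free,
  # minus what PyTorch has reserved; awk does the decimal arithmetic.
  awk -v t="$total_gib" -v f="$free_mib" -v r="$reserved_mib" \
    'BEGIN { printf "%.2f\n", t * 1024 - f - r }'
}

# Values from the error above: 1.96 GiB total, 9.50 MiB free, 4.00 MiB reserved.
outside_pytorch_mib 1.96 9.50 4.00   # prints 1993.54
```

That leaves roughly 1.95 GiB in use outside PyTorch's allocator. Possible holders include this process's CUDA context, another process, or a desktop session running on the NVIDIA GPU. The 1.96 GiB figure is rounded, so treat the result as approximate.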

Excerpt from a log file:

[29319.635864] NVRM:The NVIDIA probe routine failed for 1 device(s).
[29319.000029] NVRM: None of the NVIDIA devices were initialized.
[29319.002993] nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
[29319.635059] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
[29319.635823] NVRM: This is a 64-bit BAR mapped above 4GB by the system
[29319.635823] NVRM: BIOS or the Linux kernel, but the PCI bridge
[29319.635823] NVRM: immediately upstream of this GPU does not define
[29319.635823] NVRM: a matching prefetchable memory window.
[29319.635824] NVRM: This may be due to a known Linux kernel bug.  Please
[29319.635824] NVRM: see the README section on 64-bit BARs for additional
[29319.635824] NVRM: information.

**********************************************************

dmesg |grep -i bridge
[    0.303414] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.339965] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-fe])
[    0.347519] PCI host bridge to bus 0000:00
[    0.368977] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.368980] pci 0000:00:1c.0:   bridge window [io  0x4000-0x4fff]
[    0.368984] pci 0000:00:1c.0:   bridge window [mem 0x93000000-0x93ffffff]
[    0.368989] pci 0000:00:1c.0:   bridge window [mem 0x80000000-0x91ffffff 64bit pref]
[    0.369455] pci 0000:00:1c.4: PCI bridge to [bus 02]
[    0.369458] pci 0000:00:1c.4:   bridge window [io  0x3000-0x3fff]
[    0.369461] pci 0000:00:1c.4:   bridge window [mem 0x94100000-0x941fffff]
[    0.374209] pci 0000:00:1c.5: PCI bridge to [bus 03]
[    0.374214] pci 0000:00:1c.5:   bridge window [mem 0x94000000-0x940fffff]
[    0.379452] pci 0000:00:02.0: vgaarb: bridge control possible
[    0.441100] pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
[    0.441116] pci 0000:00:1c.0: PCI bridge to [bus 01]
[    0.441119] pci 0000:00:1c.0:   bridge window [io  0x4000-0x4fff]
[    0.441124] pci 0000:00:1c.0:   bridge window [mem 0x93000000-0x93ffffff]
[    0.441127] pci 0000:00:1c.0:   bridge window [mem 0x80000000-0x91ffffff 64bit pref]
[    0.441133] pci 0000:00:1c.4: PCI bridge to [bus 02]
[    0.441135] pci 0000:00:1c.4:   bridge window [io  0x3000-0x3fff]
[    0.441139] pci 0000:00:1c.4:   bridge window [mem 0x94100000-0x941fffff]
[    0.441146] pci 0000:00:1c.5: PCI bridge to [bus 03]
[    0.441150] pci 0000:00:1c.5:   bridge window [mem 0x94000000-0x940fffff]
[    8.398806] bridge: filtering via arp/ip/ip6tables is no longer available by default. Update your scripts to load br_netfilter if you need this.
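The log explains the "no compatible bridge window" failure. The GPU (01:00.0) is on bus 01, behind 00:1c.0 ("PCI bridge to [bus 01]"). That bridge's memory windows are 0x93000000-0x93ffffff and 0x80000000-0x91ffffff. BAR 6 asked for 0xfff80000-0xffffffff (0x80000 bytes, 512 KiB), which lies in neither window.

Below is a sketch that makes the comparison explicit. range_in_bridge_windows is a hypothetical helper, and it assumes the dmesg line format shown above.

```bash
#!/usr/bin/env bash
# Check whether a memory range [START, END] fits inside any of a bridge's
# "bridge window [mem ...]" ranges, reading dmesg-style text on stdin.
# Usage: dmesg | range_in_bridge_windows BRIDGE START END
range_in_bridge_windows() {
  local bridge="$1" start=$(( $2 )) end=$(( $3 ))
  local windows lo hi
  # Extract unique "0xLO-0xHI" pairs from this bridge's mem window lines
  # (dmesg prints each window twice, so sort -u drops the repeats).
  windows="$(grep -F "$bridge:" | grep -F 'bridge window [mem' \
    | sed -E 's/.*\[mem (0x[0-9a-f]+)-(0x[0-9a-f]+).*/\1-\2/' | sort -u)"
  while IFS=- read -r lo hi; do
    [ -n "$lo" ] || continue
    # Bash arithmetic evaluates 0x-prefixed hex values.
    if (( start >= lo && end <= hi )); then
      echo "fits in window $lo-$hi"
      return 0
    fi
  done <<< "$windows"
  echo "no compatible window"
  return 1
}

# Usage with the addresses from the log above:
# dmesg | range_in_bridge_windows 0000:00:1c.0 0xfff80000 0xffffffff
```

With the windows posted above, that usage reports "no compatible window". This matches the kernel's own message.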


****************
dmesg |grep BAR
[    0.348927] pci 0000:00:02.0: BAR 2: assigned to efifb
[    0.441100] pci 0000:01:00.0: can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window
[    0.441113] pci 0000:01:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[    0.441114] pci 0000:01:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
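A later comment in this thread suggests checking whether the failed BAR is eventually assigned. That means looking for a later "assigned" line for the same device and BAR. In Linux's numbering, BAR 6 of an ordinary PCI device is its expansion ROM.

Below is a sketch of that check. bar_status is a hypothetical name, and the exact message wording is an assumption based on the lines above.

```bash
#!/usr/bin/env bash
# Report whether a device's BAR was eventually assigned, reading dmesg-style
# text on stdin. Assumes this message wording (as in 5.8-era kernels):
#   "DEV: BAR N: assigned [mem ...]"      -> success
#   "DEV: BAR N: failed to assign [...]"  -> failure
bar_status() {
  local dev="$1" bar="$2" log
  # Keep only lines for this device and BAR ("|| true": no match is fine).
  log="$(grep -F "$dev: BAR $bar:" || true)"
  if grep -q 'assigned \[' <<< "$log"; then
    echo "assigned"
  elif grep -q 'failed to assign' <<< "$log"; then
    echo "failed"
  else
    echo "unknown"
  fi
}

# Usage for the GPU's BAR 6 from the log above:
# dmesg | bar_status 0000:01:00.0 6
```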

*************

sudo lshw -c memory

*-memory UNCLAIMED
       description: Memory controller
       product: Sunrise Point-LP PMC
       vendor: Intel Corporation
       physical id: 1f.2
       bus info: pci@0000:00:1f.2
       version: 21
       width: 32 bits
       clock: 33MHz (30.3ns)
       capabilities: bus_master
       configuration: latency=0
       resources: memory:942ac000-942affff

Did you scan the output of dmesg | grep -i bridge for messages suggesting pci=nocrs, such as: PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug? You might get problems with too much video card memory and not enough room below 4 GB in the physical address space for PCI devices (the TOLUD problem). Did you have the NVIDIA drivers working before you installed CUDA? What hardware and how much memory do you have?
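You may decide to try the pci=nocrs parameter that the kernel message mentions. Nothing in this thread confirms it helps on this laptop. On Ubuntu, a persistent kernel parameter goes into GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, followed by update-grub and a reboot.

Below is a sketch of that edit. add_kernel_param is a hypothetical helper. Back up the file first so the change is easy to revert.

```bash
#!/usr/bin/env bash
# Append a kernel parameter to GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults
# file (Ubuntu's is /etc/default/grub). Does nothing if it is already there.
# Hypothetical helper; update-grub and a reboot are needed for it to apply.
add_kernel_param() {
  local param="$1" file="${2:-/etc/default/grub}"
  if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*${param}" "$file"; then
    return 0
  fi
  # Insert " PARAM" just before the closing quote of the setting.
  sed -i -E "s/^(GRUB_CMDLINE_LINUX_DEFAULT=\"[^\"]*)\"/\1 ${param}\"/" "$file"
}

# Usage (as root):
#   cp /etc/default/grub /etc/default/grub.bak
#   add_kernel_param pci=nocrs
#   update-grub
```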

Please add this information to your original post, where you can use code tags to keep it readable. Did the PCI messages ever show BAR 6 being assigned successfully (for example at [mem 0xf1080000-0xf10fffff pref], as on my system)?

TonyKutunio: I don't really understand what you mean by asking whether the PCI messages ever successfully assigned BAR 6 (for example at [mem 0xf1080000-0xf10fffff pref], as on your system).

One of your comments showed the failure "...can't claim BAR 6 [mem 0xfff80000-0xffffffff pref]: no compatible bridge window", but I didn't see any later message about BAR 6 in what you posted. Try dmesg | grep BAR and see whether all the BARs eventually get assigned.

TonyKutunio: Oh yeah, I see it. The dmesg | grep BAR output says:
BAR 6: no space for [mem size 0x00080000 pref]
BAR 6: failed to assign [mem size 0x00080000 pref]

Here's a possible solution: https://www.linuxquestions.org/questions/linux-kernel-70/kernel-fails-to-assign-memory-to-pcie-device-4175487043/

TonyKutunio: For some reason it says:
bash: /sys/bus/pci/devices/0000:00:01.1/remove: No such file or directory
bash: /sys/bus/pci/rescan: Permission denied
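Both errors come from bash itself, which points to two likely causes:

- 0000:00:01.1 is not one of the bridges in the dmesg output posted above. Its sysfs directory most likely does not exist on this machine.
- With sudo echo 1 > file, only echo runs under sudo. The redirection is done by your own unprivileged shell, so writing to /sys fails with "Permission denied".

Going by the bridge lines above, the GPU (01:00.0) sits behind 00:1c.0, the bridge to bus 01. 00:1c.5 is the bridge to bus 03. lspci -t shows the tree if you want to double-check which bridge to remove.

Below is a sketch that builds the sysfs path from an lspci address and writes through sudo tee. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Build the sysfs "remove" path for a PCI device given its lspci address.
sysfs_remove_path() {
  local addr="$1"
  # lspci omits the PCI domain; sysfs paths include it (usually 0000).
  case "$addr" in
    *:*:*) ;;                 # already has a domain, e.g. 0000:00:1c.5
    *) addr="0000:$addr" ;;   # e.g. 00:1c.5 -> 0000:00:1c.5
  esac
  printf '/sys/bus/pci/devices/%s/remove\n' "$addr"
}

# Remove a device (or bridge) and rescan the PCI bus. The writes go through
# "sudo tee" so they happen with root privileges, unlike "sudo echo 1 > file".
remove_and_rescan() {
  local path
  path="$(sysfs_remove_path "$1")"
  if [ ! -e "$path" ]; then
    echo "no such device: $path" >&2
    return 1
  fi
  echo 1 | sudo tee "$path" > /dev/null
  echo 1 | sudo tee /sys/bus/pci/rescan > /dev/null
}

# Usage (check the address with lspci -t first):
# remove_and_rescan 00:1c.0
```

Running the same echo commands from a root shell (sudo -i) also works. In that case the redirection itself runs as root.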

TonyKutunio: Is this the way to execute that command, if the lspci output is "00:1c.5 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #6 (rev f1)"?
sudo echo 1 > /sys/bus/pci/devices/0000\:00\:1c.5/remove

Yes, that command looks OK. Could it be that your model just ran out of memory? I set up the NVIDIA driver I want (usually the latest from the standard repos), then install CUDA from the .run file and decline its offer to install NVIDIA drivers. That avoids many problems when system or video updates occur.
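The approach in the comment above can also be run non-interactively: keep the repository driver and install only the CUDA toolkit from the .run file. NVIDIA's CUDA runfile accepts --silent together with component flags such as --toolkit. With only --toolkit selected, the bundled driver is not installed. In the interactive installer, the equivalent is deselecting the Driver entry.

Below is a sketch. The function name and the runfile name are placeholders.

```bash
#!/usr/bin/env bash
# Install only the CUDA toolkit from a runfile, leaving the already-installed
# NVIDIA driver (e.g. one from the Ubuntu repositories) untouched.
# Hypothetical helper; pass the path of the runfile you downloaded.
install_cuda_toolkit_only() {
  local runfile="$1"
  if [ ! -f "$runfile" ]; then
    echo "runfile not found: $runfile" >&2
    return 1
  fi
  # --silent is required for unattended installs; with only --toolkit
  # selected, the driver bundled in the runfile is skipped.
  sudo sh "$runfile" --silent --toolkit
}

# Usage (placeholder file name):
# install_cuda_toolkit_only ./cuda_<version>_linux.run
```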

TonyKutunio: It looks like I don't have the black screen issue after running the above commands, but I'm still getting that error: "RuntimeError: CUDA out of memory." I don't know whether the model really ran out of memory.

TonyKutunio: I thought the black screen and the out-of-memory error were related.
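nvidia-smi can show whether the model itself is too large or the card is already full before the model starts. Its query options report total, used and free memory per GPU, plus the memory used by each compute process. Graphics processes such as Xorg appear only in the process table of a plain nvidia-smi run, not in --query-compute-apps.

Below is a sketch. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Print per-GPU memory totals and per-process GPU memory use.
# Hypothetical helper; needs the NVIDIA driver's nvidia-smi on PATH.
gpu_memory_report() {
  if ! command -v nvidia-smi > /dev/null 2>&1; then
    echo "nvidia-smi not found" >&2
    return 1
  fi
  # Total, used and free memory for each GPU, as CSV.
  nvidia-smi --query-gpu=name,memory.total,memory.used,memory.free --format=csv
  # Memory held by each compute process (graphics processes are not listed).
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv
}

# Usage: run once before starting the model and again while it runs.
# gpu_memory_report
```

Suppose memory.used is already close to memory.total before the model starts. Then the out-of-memory error reflects what else holds the GPU rather than the model's size.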