Score:0

System freeze and crash (Ubuntu 18.04) when running Python scripts on CUDA

When I execute Python scripts that run on CUDA, my machine freezes and then crashes without displaying any error message. The same scripts ran without problems just two weeks ago on the same machine.

Details

  • Ubuntu 18.04
  • GPU: GeForce RTX 2070 SUPER
  • Cuda compilation tools, release 12.0, V12.0.76 (tried other versions as well, e.g., 9 and 10.1, 10.2)
  • Driver: NVIDIA-SMI 470.161.03
  • Python 3.6.9
  • PyTorch 1.10.1+cu102
  • Crash triggered by python package sentence-transformers==2.2.2
  • CPU: AMD Ryzen 7 2700X

Attempts to resolve issues

  • Reinstalled CUDA and the NVIDIA drivers (multiple times, different versions)
  • Removed and reinstalled all python packages in virtual environment
  • Updated the BIOS (AMD Ryzen 7 2700X system)
  • Disabled Global C-state Control in BIOS as suggested here.
  • Disabled Core Performance Boost in BIOS as suggested here.
  • Checked syslog; it shows no trace of an error around the time of the crash

I have no idea how to get more information about what the underlying error could be. Any ideas?

Score:0

It turned out that the power supply unit was damaged. The issue went away after replacing it.
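
For anyone facing a similar silent freeze with nothing in syslog, one way to gather evidence is to log GPU readings to disk right up to the crash. The following is only a minimal sketch, not a diagnosis tool: it assumes `nvidia-smi` is on the PATH, and the function names, the `gpu_log.csv` file name, and the one-second interval are illustrative choices. Each sample is flushed and fsynced so the last readings are more likely to survive a hard freeze.

```python
import os
import subprocess
import time

# Fields passed to nvidia-smi's --query-gpu option.
QUERY_FIELDS = "timestamp,power.draw,temperature.gpu,utilization.gpu"


def query_gpu():
    """Return the current readings as CSV text (one line per GPU)."""
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=" + QUERY_FIELDS,
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6
        check=True,
    )
    return result.stdout.strip()


def append_durably(path, line):
    """Append a line and force it to disk before returning."""
    with open(path, "a") as f:
        f.write(line + "\n")
        f.flush()
        os.fsync(f.fileno())


def monitor(path="gpu_log.csv", interval=1.0, samples=None, query=query_gpu):
    """Log a reading every `interval` seconds.

    With samples=None the loop runs until interrupted (or the machine dies).
    `query` can be swapped out, for example for testing without a GPU.
    """
    count = 0
    while samples is None or count < samples:
        append_durably(path, query())
        count += 1
        time.sleep(interval)
```

Run `monitor()` in a second terminal, start the CUDA script, and inspect the tail of `gpu_log.csv` after rebooting. If the last entries show the freeze coinciding with a jump in power draw while temperatures look normal, that is at least consistent with a power delivery problem, although it does not prove one.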
