Ubuntu 20.04 freezes/crashes randomly (anywhere from 20 minutes to ~3 hours) with NVIDIA driver 535.54.03; no freezes when I'm not using the GPU
I have also tried NVIDIA drivers 530.30.02 and 525.85, as well as Ubuntu 22.04.
free -h
             total        used        free      shared  buff/cache   available
Mem:          62Gi       6.1Gi       413Mi       1.0Gi        55Gi        54Gi
Swap:        2.0Gi       1.4Gi       597Mi
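The swap figures above can be cross-checked against the kernel's raw counters. This is a minimal Python sketch, assuming the usual `/proc/meminfo` layout (`Field:   value kB`). `parse_meminfo` and `swap_used_kib` are hypothetical helper names. Used swap is computed as `SwapTotal - SwapFree`, which matches how `free` derives its "used" swap column.

```python
from __future__ import annotations

from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: value}; sizes are in kB."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])  # first token is the number
    return info


def swap_used_kib(info: dict[str, int]) -> int:
    """Used swap as free(1) reports it: SwapTotal - SwapFree."""
    return info["SwapTotal"] - info["SwapFree"]


if __name__ == "__main__":
    meminfo = Path("/proc/meminfo")
    if meminfo.exists():  # only present on Linux
        info = parse_meminfo(meminfo.read_text())
        if all(k in info for k in ("MemAvailable", "SwapTotal", "SwapFree")):
            print(f"MemAvailable: {info['MemAvailable'] / 1024**2:.1f} GiB")
            print(f"Swap used:    {swap_used_kib(info) / 1024**2:.1f} GiB")
```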
cat /proc/sys/vm/swappiness
60
sudo lshw -C memory
  *-firmware
       description: BIOS
       vendor: American Megatrends Inc.
       physical id: 0
       version: 0603
       date: 09/08/2022
       size: 64KiB
       capacity: 32MiB
       capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppynec int13floppytoshiba int13floppy360 int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int14serial int17printer int10video acpi usb biosbootspecification uefi
  *-cache:0
       description: L1 cache
       physical id: c
       slot: L1 - Cache
       size: 768KiB
       capacity: 768KiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=1
  *-cache:1
       description: L2 cache
       physical id: d
       slot: L2 - Cache
       size: 12MiB
       capacity: 12MiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=2
  *-cache:2
       description: L3 cache
       physical id: e
       slot: L3 - Cache
       size: 64MiB
       capacity: 64MiB
       clock: 1GHz (1.0ns)
       capabilities: pipeline-burst internal write-back unified
       configuration: level=3
  *-memory
       description: System Memory
       physical id: 11
       slot: System board or motherboard
       size: 64GiB
     *-bank:0
          description: [empty]
          product: Unknown
          vendor: Unknown
          physical id: 0
          serial: Unknown
          slot: DIMM 0
     *-bank:1
          description: DIMM Synchronous Unbuffered (Unregistered) 4800 MHz (0.2 ns)
          product: F5-5600J3636D32G
          vendor: Unknown
          physical id: 1
          serial: 00000000
          slot: DIMM 1
          size: 32GiB
          width: 64 bits
          clock: 505MHz (2.0ns)
     *-bank:2
          description: [empty]
          product: Unknown
          vendor: Unknown
          physical id: 2
          serial: Unknown
          slot: DIMM 0
     *-bank:3
          description: DIMM Synchronous Unbuffered (Unregistered) 4800 MHz (0.2 ns)
          product: F5-5600J3636D32G
          vendor: Unknown
          physical id: 3
          serial: 00000000
          slot: DIMM 1
          size: 32GiB
          width: 64 bits
          clock: 505MHz (2.0ns)
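To summarize the `*-bank` entries above (which slots are populated, with what size, and at what speed the firmware reports them), a rough Python sketch follows. It assumes the `lshw -C memory` text layout shown here: a `*-bank:N` header followed by `key: value` lines. `summarize_banks` and `reported_mhz` are hypothetical helpers, not part of lshw, and `banks.py` is a hypothetical file name. Usage, assuming the output was saved first: `sudo lshw -C memory > lshw.txt`, then `python3 banks.py lshw.txt`.

```python
from __future__ import annotations

import re
import sys
from pathlib import Path


def summarize_banks(text: str) -> list[dict[str, str]]:
    """Collect the 'key: value' lines under each '*-bank:N' node of lshw output."""
    banks: list[dict[str, str]] = []
    current: dict[str, str] | None = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("*-"):
            # A new node begins; keep collecting only if it is a memory bank.
            current = {"id": line[2:]} if line.startswith("*-bank") else None
            if current is not None:
                banks.append(current)
        elif current is not None and ":" in line:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return banks


def reported_mhz(description: str) -> int | None:
    """Return the 'NNNN MHz' figure from a bank description, or None if absent."""
    match = re.search(r"(\d+)\s*MHz", description)
    return int(match.group(1)) if match else None


if __name__ == "__main__" and len(sys.argv) > 1:
    for bank in summarize_banks(Path(sys.argv[1]).read_text()):
        slot = bank.get("slot", "?")
        if bank.get("description") == "[empty]":
            print(f"{bank['id']}: empty ({slot})")
        else:
            mhz = reported_mhz(bank.get("description", ""))
            print(f"{bank['id']}: {bank.get('size', '?')} {bank.get('product', '?')} "
                  f"at {mhz} MHz ({slot})")
```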
journalctl: this is a large file; search for "BUG: Bad page state in process"
dmesg
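Rather than scrolling through the journal by hand, the matching lines can be pulled out and grouped. This is a minimal Python sketch. It assumes the kernel's usual message form `BUG: Bad page state in process NAME  pfn:HEX` and 4 KiB pages, both of which should be checked against the actual log; `bad_page_events` and `bad_pages.py` are hypothetical names. Listing the page frame numbers (and the physical addresses they imply) shows whether the reports cluster in one region of RAM or are scattered.

```python
from __future__ import annotations

import re
import sys
from collections import Counter
from pathlib import Path

# Assumed message form, e.g. "BUG: Bad page state in process python3  pfn:1a2b3c".
# The process name may contain spaces, so it is matched lazily up to "pfn:".
BAD_PAGE = re.compile(r"BUG: Bad page state in process (.+?)\s+pfn:([0-9a-fA-F]+)")
PAGE_SIZE = 4096  # assumption: 4 KiB pages (the x86-64 default)


def bad_page_events(lines) -> list[tuple[str, int]]:
    """Return (process name, page frame number) for every matching log line."""
    events = []
    for line in lines:
        match = BAD_PAGE.search(line)
        if match:
            events.append((match.group(1), int(match.group(2), 16)))
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: journalctl -k > kernel.log; python3 bad_pages.py kernel.log
    text = Path(sys.argv[1]).read_text(errors="replace")
    events = bad_page_events(text.splitlines())
    print(f"{len(events)} bad-page reports")
    for name, count in Counter(name for name, _ in events).most_common():
        print(f"  {name}: {count}")
    for name, pfn in events:
        print(f"  pfn {pfn:#x} -> phys {pfn * PAGE_SIZE:#x} ({name})")
```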
I also ran memtester; it didn't report any errors.
I have been struggling with this for more than a week and would really appreciate any suggestions or help. I have already gone through many similar questions on this forum.