Score:0

Ubuntu 20.04.2 locks up completely when writing to RAID 6 array


I can reproduce the problem consistently (and quickly, within minutes), but I can't find any helpful messages in the logs. The problem first occurred with a RocketRaid 3740C HBA and the proprietary NVIDIA driver, and it still occurs with an LSI/Broadcom 9305-16i HBA and the nouveau driver. I have flashed the Broadcom card to the latest firmware and BIOS. The HBA is connected to 9 of the array's 10 drives (the RAID 6 is degraded until a replacement disk arrives). The network card is a Mellanox ConnectX-3 running 10G Ethernet over fibre. Before I swapped out the RocketRaid card, I remember the proprietary driver writing kernel log messages about getting 20-something when it expected 18, just before the crash. I can't seem to find those messages anymore, though (pointers on how to find them would be appreciated!).
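
(For reference, with persistent journaling enabled, kernel messages from the previous boot can usually be recovered with something like the following; the grep pattern is only a guess at the driver's wording:)

$ journalctl --list-boots                # enumerate boots the journal knows about
$ journalctl -k -b -1                    # kernel messages from the previous boot
$ journalctl -k -b -1 | grep -i expect   # hypothetical filter for that driver message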

Steps to Reproduce:

Write a lot of data to the array (write speeds are > 700 MB/s). For example, open 3 scp sessions from another computer and write 3 files in parallel at ~250 MB/s each. In less than five minutes the Ubuntu screen is frozen and ssh is unresponsive. A hard reset appears to be the only option, after which mdadm thinks the array is dirty (even though the event count is the same on all drives). `mdadm --assemble --force` works, but then the array spends a day re-syncing.
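
For reference, the recovery after each freeze looks roughly like this (/dev/md0 and the sd[a-i]1 member names are illustrative, not my exact devices):

$ sudo mdadm --examine /dev/sd[a-i]1 | grep -i events   # confirm the event counts match
$ sudo mdadm --assemble --force /dev/md0 /dev/sd[a-i]1  # force assembly of the "dirty" array
$ cat /proc/mdstat                                      # watch the day-long resync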

I'm about at my wits' end with this. I'm considering seeing what will happen with TrueNAS or Alma Linux. I'm also somewhat suspicious of the motherboard (ASRock Taichi X570). The system seems fine under any load that does not involve extensive writes to the array, including heavy CPU load (Ryzen 7 5700X) and intense network traffic (I can repeatedly send/receive tens of gigabytes of network traffic and get ~70 Gbit/s of bandwidth).

Edit, per comment from @heynnema:

$ sudo free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi        12Gi       442Mi       372Mi        50Gi        49Gi
Swap:         975Mi        44Mi       931Mi
$ sudo sysctl vm.swappiness
vm.swappiness = 60
$ sudo dmidecode -s bios-version
P4.30
$ top
Tasks: 428 total,   2 running, 426 sleeping,   0 stopped,   0 zombie
%Cpu(s): 34.8 us,  2.0 sy,  0.0 ni, 61.1 id,  0.0 wa,  0.0 hi,  2.0 si,  0.0 st
MiB Mem :  64242.9 total,   1192.4 free,  14388.3 used,  48662.3 buff/cache
MiB Swap:    976.0 total,    915.5 free,     60.5 used.  48780.6 avail Mem 

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND                                                                                                                                                                  
  15919 fooo      20   0 4083880   3.6g  12520 S 312.5   5.7  77:36.68 chia                                                                                                                                                                     
  15560 fooo      20   0 4083904   3.6g  12544 S  93.8   5.7  77:43.99 chia                                                                                                                                                                     
   4764 root      20   0       0      0      0 S  18.8   0.0  93:17.25 md0_raid6                                                                                                                                                                
   1375 unifi     20   0 4028748 180588  21888 S   6.2   0.3   0:04.47 launcher                                                                                                                                                                 
   2154 unifi     20   0 1078716 132904  39776 S   6.2   0.2   0:25.11 mongod                                                                                                                                                                   
   4776 root      20   0       0      0      0 R   6.2   0.0  18:39.73 md0_resync                                                                                                                                                               
  15419 root      20   0       0      0      0 I   6.2   0.0   0:01.07 kworker/0:1-events                                                                                                                                                       
      1 root      20   0  168296  11728   7896 S   0.0   0.0   0:01.02 systemd                                                                                                                                                                  
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd                                                                                                                                                                 
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp                                                                                                                                                                   
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp                                                                                                                                                               
      6 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 kworker/0:0H-kblockd                                                                                                                                                     
      9 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 mm_percpu_wq                                                                                                                                                             
     10 root      20   0       0      0      0 S   0.0   0.0   0:06.43 ksoftirqd/0                                                                                                                                                              
     11 root      20   0       0      0      0 I   0.0   0.0   0:04.24 rcu_sched                                                                                                                                                                
     12 root      rt   0       0      0      0 S   0.0   0.0   0:00.02 migration/0                                                                                                                                                              
     13 root     -51   0       0      0      0 S   0.0   0.0   0:00.00 idle_inject/0 
$ cat /etc/fstab
# /etc/fstab: static file system information.
#
# Use 'blkid' to print the universally unique identifier for a
# device; this may be used with UUID= as a more robust way to name devices
# that works even if disks are added and removed. See fstab(5).
#
# <file system> <mount point>   <type>  <options>       <dump>  <pass>
/dev/mapper/vgubuntu-root /               ext4    errors=remount-ro 0       1
# /boot/efi was on /dev/nvme0n1p1 during installation
UUID=3C3E-4180  /boot/efi       vfat    umask=0077      0       1
/dev/mapper/vgubuntu-swap_1 none            swap    sw              0       0
#192.168.1.192:/storage     /storage  nfs  defaults 0 0 
UUID=ddc550d2-7f93-4ecf-ac2e-d754c5eee6c9 /storage xfs defaults 0 0 
UUID=BCB65C49B65C05F4 /var/ExChia1 ntfs defaults 0 0
UUID=3A10-3FE7 /var/ExChia4 exfat defaults 0 0
UUID=0EF0-7586 /var/ExChia5 exfat defaults 0 0 
UUID=3837-E26A /var/ExChia6 exfat defaults 0 0
UUID=73338b75-d356-4e7f-9757-948f1078f04e /var/ExChia13 xfs defaults 0 0
heynnema:
Edit your question and show me `free -h` and `sysctl vm.swappiness` and `sudo dmidecode -s bios-version` and `top`. Start comments to me with @heynnema or I'll miss them.
liels:
@heynnema, edits per request.
heynnema:
Thanks for the info. Show me `cat /etc/fstab`. Have you ever run `memtest` on this configuration? What is your boot/system disk?
liels:
@heynnema, fstab is above. The boot/system disk is a 1TB NVMe FireCuda 510. Running memtest now. Good thought (ages ago I used to burn in new builds with a shakedown hardware-error-detection suite that VA Linux wrote for their systems; either hardware has gotten better or I've gotten lazier, or both).
heynnema:
Do you have wiggle room to increase the /dev/mapper/vgubuntu-swap_1 swap partition, or switch to a /swapfile?
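(If you go the /swapfile route, a minimal sketch, assuming a 4G target; the fstab line makes it persistent:)
$ sudo fallocate -l 4G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile
# then add to /etc/fstab:  /swapfile none swap sw 0 0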
liels:
@heynnema yes, I can increase the swap file size by quite a bit. Without re-arranging hardware I could probably do 300 or 400G, or the formerly recommended 2×RAM, which would be 128G in this case. What do you recommend? If RAM starvation is causing the lockup I'd be willing to buy another pair of DIMMs and go to a full 128G. FWIW, memtester is processing 30G of what used to be free memory and so far everything is fine.
heynnema:
Bump swap to 4G. You don't have a swap file, you have a swap partition. You'll have to use LVM commands to do the job. Also, do you know how to set vm.swappiness=10?
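(For reference, a minimal sketch; the sysctl.d file name is arbitrary:)
$ sudo sysctl vm.swappiness=10                                          # takes effect immediately
$ echo 'vm.swappiness=10' | sudo tee /etc/sysctl.d/99-swappiness.conf   # persists across reboots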
liels:
@heynnema. OK, I think I understand your hypothesis about what might be going wrong. swappiness is now 10 and vfs_cache_pressure is 100 (which is presumably what we want). lvresize is cowardly refusing to let me touch the mounted root filesystem, so I'll do it from a USB boot tomorrow after the re-sync is finished, run Memtest86+, and then test the RAID write again.
heynnema:
On a live system, you can disable swap with the `swapoff -a` command, then use `lvresize` to extend /dev/mapper/vgubuntu-swap_1 to 4G, then `swapon -a`.
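(As a sketch, assuming the volume group has free extents; the swap signature must be rebuilt with mkswap after the resize:)
$ sudo swapoff -a
$ sudo lvresize -L 4G /dev/mapper/vgubuntu-swap_1
$ sudo mkswap /dev/mapper/vgubuntu-swap_1
$ sudo swapon -a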
liels:
@heynnema. It didn't solve the problem, unfortunately. With swappiness at 10 and 4G of swap space I was able to do one scp at 250 MB/s successfully (about 100 GB of transfer); swap wasn't used. I then did 2 successfully (~500 MB/s) and swap use got up to 512 MB or so. I was about to try 3 streams, thinking maybe you'd solved it, when the machine froze while processing the 2 streams. Swap use was about 1536 MB at that point. The array is resyncing again >.<. I will migrate some processes off of that machine, run memtest86+, and see what happens.
heynnema:
Didn't we run `memtest` earlier in this process? Show me `swapon -s`. You can leave vm.swappiness at 10. Set vfs_cache_pressure back to default.
liels:
@heynnema Yes, I ran memtester on the memory that was free at the time (`sudo memtester 30G 3`), which passed, but not memtest86+ over the whole 64 GB. I need to move some processes to another system before I take "this" one offline for an extended period. In the meantime I'm running `memtester 50G 10`.
heynnema:
`memtest` should be run offline, when booted to the `memtest` flash USB. What is `memtester`? Also show me `swapon -s`. Go to https://www.memtest86.com/ and download/run their free `memtest` to test your memory. Get at least one complete pass of all the 4/4 tests to confirm good memory. This may take many hours to complete.
liels:
@heynnema. Yes, I understand that memtest86+ needs to be run from boot. I believe it is included in the 20.04.2 image by default, so that was my plan once I can take the system offline.
```
$ sudo swapon -s
Filename    Type       Size     Used     Priority
/dev/dm-1   partition  4194300  2665216  -2
```
liels:
@heynnema, memtest86+: 4 passes, 0 errors. Resynced. Ran xfs_repair. Switched from the HBA to motherboard SATA ports (+2 ports on a Syba / JM535 card). All drives pass smartctl -t. Wrote 136 GB from /dev/zero and read it back to /dev/null, with sync. It still locks up seconds after writing starts, at about 185 MB/s on a single scp. Another data point: another machine with 20.04.2 did the same thing writing to a RAID 0 of two NVMe drives that had been stable both before and after the RAID was made. I'm starting to strongly suspect something wrong with the RAID code and/or its interaction with XFS. Might try Rocky or Alma next.
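(The zero-write/read-back test was roughly the following; the path on the array is illustrative:)
$ dd if=/dev/zero of=/storage/testfile bs=1M count=136000 status=progress && sync   # ~136 GiB written
$ dd if=/storage/testfile of=/dev/null bs=1M status=progress                        # read it back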
heynnema:
Can you boot to an Ubuntu 21.04 live USB and test writing to the disks again?
liels:
@heynnema, it seems to work fine under 21.04 live. I pushed about a terabyte onto the array at 700-800 MB/s with no sign of trouble. There must be a problem with the RAID or XFS code, or something else in 20.04.2. I suppose the least painful thing at this point is to upgrade to that version and wait for 22.04 LTS. Bug reporting happens with `ubuntu-bug`, in this case against the kernel for 20.04.2, right?
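(If I have the tooling right, filing against the running kernel package would be:)
$ ubuntu-bug linux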
heynnema:
Good news! So you're going to update to 21.04, yes?
liels:
@heynnema. The update to 20.10 is happening now. Will move on to 21.04 next.
Score:0

So, I had the exact same problem as you.

An 11-disk software RAID 6 set up via mdadm, with an XFS partition. The disks are attached via a combination of motherboard SATA and Broadcom HBA SATA ports.

On Ubuntu 20.04.3 LTS I would get complete system freezes whenever I had high enough bandwidth writes over a short enough time period.

To rule out any other devices or network issues, I found that writing a junk 1 TB file to the array via `dd if=/dev/zero of=testfile bs=1024 count=1024000000 status=progress` was the most reliable way to reproduce the issue.
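
As an aside (not the exact command I ran): adding conv=fdatasync makes dd flush to the disks before exiting, so the test exercises the array rather than just the page cache, and a larger block size keeps dd itself from being the bottleneck:

$ dd if=/dev/zero of=testfile bs=1M count=1000000 conv=fdatasync status=progress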

The solution was to upgrade to Ubuntu 21.10. Ubuntu 21.04 took a little longer to freeze, but still froze. On Ubuntu 21.10 I could write my full 1 TB test file 3 times without issue. Whatever bug was causing this is finally fixed.
