We have a Supermicro server with the following configuration:
CentOS 7
Software RAID (this Supermicro configuration does not support hardware RAID)
The / partition is RAID 10 and the other partition is RAID 1
CPU: 2x AMD EPYC 7402
RAM: 512 GB DDR4 (16x 32 GB)
10x 2TB Intel SSD DC P4510 NVMe.
The server is used for shared hosting and runs CloudLinux, cPanel, etc.
Roughly every 2 days, we get the following errors on the console:
Oct 18 23:11:19 toranaga kernel: XFS (snumbd4d): log I/O error -5
Oct 18 23:11:19 toranaga kernel: XFS (snumbd4d): Log I/O Error Detected. Shutting down filesystem
Oct 18 23:11:20 toranaga kernel: XFS (snumbd3d): log I/O error -5
Oct 18 23:11:20 toranaga kernel: XFS (snumbd3d): Log I/O Error Detected. Shutting down filesystem
Oct 18 23:11:20 toranaga kernel: XFS (snumbd1d): log I/O error -5
Oct 18 23:11:20 toranaga kernel: XFS (snumbd1d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:01:54 toranaga kernel: XFS (snumbd8d): log I/O error -5
Oct 20 16:01:54 toranaga kernel: XFS (snumbd8d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:01:54 toranaga kernel: XFS (snumbd2d): log I/O error -5
Oct 20 16:01:54 toranaga kernel: XFS (snumbd2d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:02:02 toranaga kernel: XFS (snumbd6d): metadata I/O error in "xfs_read_agf+0x8e/0x120 [xfs]" at daddr 0x423e1d801 len 1 error 5
Oct 20 16:02:02 toranaga kernel: XFS (snumbd6d): log I/O error -5
Oct 20 16:02:02 toranaga kernel: XFS (snumbd6d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:02:05 toranaga kernel: XFS (snumbd7d): log I/O error -5
Oct 20 16:02:05 toranaga kernel: XFS (snumbd7d): Log I/O Error Detected. Shutting down filesystem
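To make the pattern easier to see, here is a rough Python sketch that groups the "Shutting down filesystem" messages by day, so it is clear which devices go down together. It assumes the syslog line format shown above; the function name `shutdowns_by_day` and the `SAMPLE` constant are only illustrative, and `SAMPLE` holds a few lines copied from the log above.

```python
import re
from collections import defaultdict

# A few lines copied verbatim from the log excerpt above (illustrative subset).
SAMPLE = """\
Oct 18 23:11:19 toranaga kernel: XFS (snumbd4d): log I/O error -5
Oct 18 23:11:19 toranaga kernel: XFS (snumbd4d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:01:54 toranaga kernel: XFS (snumbd8d): log I/O error -5
Oct 20 16:01:54 toranaga kernel: XFS (snumbd8d): Log I/O Error Detected. Shutting down filesystem
Oct 20 16:02:02 toranaga kernel: XFS (snumbd6d): metadata I/O error in "xfs_read_agf+0x8e/0x120 [xfs]" at daddr 0x423e1d801 len 1 error 5
"""

# Assumed layout: "<Mon> <day> <HH:MM:SS> <host> kernel: XFS (<device>): <message>"
PATTERN = re.compile(
    r"^(\w{3} +\d+) (\d\d:\d\d:\d\d) \S+ kernel: XFS \((\w+)\): (.*)$"
)


def shutdowns_by_day(text):
    """Return {day: [(time, device), ...]} for XFS shutdown messages only."""
    by_day = defaultdict(list)
    for line in text.splitlines():
        match = PATTERN.match(line.strip())
        if match and "Shutting down filesystem" in match.group(4):
            day, time, device = match.group(1), match.group(2), match.group(3)
            by_day[day].append((time, device))
    return dict(by_day)


if __name__ == "__main__":
    # Print one line per day listing the devices that were shut down.
    for day, events in shutdowns_by_day(SAMPLE).items():
        devices = ", ".join(f"{device} at {time}" for time, device in events)
        print(f"{day}: {devices}")
```

Feeding it the full log instead of `SAMPLE` should show whether all the affected devices fail within the same few seconds each time.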
Any advice on what we should do?
Thanks!