
HDDs are unmounted when transferring a large number of files


I am transferring a large number of files (10-100 million) between two external HDDs connected via USB 3.0, but the external disks are unmounted during the transfer. Every HDD connected via USB is unmounted, not just the ones in use, including those on a different port or dock.

This problem never happens when I transfer the files in smaller batches (e.g., four 5-hour sessions instead of one 20-hour copy). It only occurs when copying takes longer than 10 hours.

There is nothing wrong with the HDDs or the files. I have checked the disks regularly, and the problem has occurred with more than ten different HDDs (all EXOS Enterprise 18TB).

I have tried GUI cut/paste, cp, and rsync.

How can I diagnose the problem so I can resolve it? Any hint would help; I do not expect a full solution.

Hardware: Taichi X399 Motherboard with Threadripper 1950X CPU. The filesystem is ext4.

There are, of course, many I/O errors in syslog, but I believe they are a result of the unmounted device rather than the cause. The first pertinent errors are:

Jan 30 12:54:07 ubuntu1 kernel: [22033.966192] xhci_hcd 0000:0c:00.3: xHCI host not responding to stop endpoint command.
Jan 30 12:54:07 ubuntu1 kernel: [22033.966203] xhci_hcd 0000:0c:00.3: USBSTS: 0x00000000
Jan 30 12:54:07 ubuntu1 kernel: [22033.983466] xhci_hcd 0000:0c:00.3: xHCI host controller not responding, assume dead
Jan 30 12:54:07 ubuntu1 kernel: [22033.983493] xhci_hcd 0000:0c:00.3: HC died; cleaning up
Jan 30 12:54:07 ubuntu1 kernel: [22033.983581] usb 4-1: USB disconnect, device number 2
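To pull the controller-related lines out of a long syslog, a small filter can help. This is a minimal sketch, not a tool from this thread: it assumes the syslog line format shown in the excerpt above, and the names xhci_events and first_controller_death are hypothetical. It lists every xhci_hcd message with its seconds-since-boot stamp and PCI address, then reports the first "HC died" event.

```python
import re
import sys
from typing import Iterable, List, Optional, Tuple

# Assumes syslog lines formatted like the excerpt in the question, e.g.
# "Jan 30 12:54:07 ubuntu1 kernel: [22033.983493] xhci_hcd 0000:0c:00.3: HC died; cleaning up"
XHCI_LINE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\]\s+xhci_hcd\s+"
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+(?P<msg>.*)$"
)

Event = Tuple[float, str, str]  # (seconds since boot, PCI address, message)


def xhci_events(lines: Iterable[str]) -> List[Event]:
    """Return every xhci_hcd kernel message found in the given log lines."""
    events = []
    for line in lines:
        m = XHCI_LINE.search(line)
        if m:
            events.append((float(m.group("uptime")), m.group("pci"), m.group("msg").strip()))
    return events


def first_controller_death(lines: Iterable[str]) -> Optional[Event]:
    """Return the first 'HC died' event, or None if no controller died."""
    for event in xhci_events(lines):
        if event[2].startswith("HC died"):
            return event
    return None


if __name__ == "__main__":
    # Hypothetical usage: python3 xhci_scan.py /var/log/syslog
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/syslog"
    with open(path, errors="replace") as log:
        lines = log.readlines()
    for uptime, pci, msg in xhci_events(lines):
        print(f"[{uptime:12.3f}] {pci}: {msg}")
    death = first_controller_death(lines)
    if death:
        print(f"Controller {death[1]} died {death[0] / 3600:.1f} h after boot")
```

Running it over several crash logs shows whether the same PCI address dies every time.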
This is likely hardware-related. Do some checks: does it still happen when you change ports, or when you do this on another machine? Also try another USB cable.
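One way to follow up on the port-swapping check is to see which xHCI controller each disk actually sits behind, since two physical ports (or a dock) can share one controller. A minimal sketch, assuming Linux sysfs, where /sys/block/<dev> resolves to a path that runs through the controller's PCI address followed by a usbN root hub; the helper names are hypothetical.

```python
import os
import re
from typing import Optional

# A PCI address such as "0000:0c:00.3" (domain:bus:device.function).
PCI_ADDR = re.compile(r"[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]")
# A USB root-hub directory such as "usb4"; assumed to sit directly under its host controller.
ROOT_HUB = re.compile(r"usb\d+")


def controller_from_sysfs_path(path: str) -> Optional[str]:
    """Return the PCI address of the component directly above the usbN root hub."""
    parts = path.split("/")
    for parent, child in zip(parts, parts[1:]):
        if PCI_ADDR.fullmatch(parent) and ROOT_HUB.fullmatch(child):
            return parent
    return None


def controller_for_block_device(dev: str) -> Optional[str]:
    """Resolve /sys/block/<dev> (e.g. "sdb") and return its USB controller, if any."""
    return controller_from_sysfs_path(os.path.realpath(f"/sys/block/{dev}"))


if __name__ == "__main__":
    # Print the controller for every USB-attached block device.
    for dev in sorted(os.listdir("/sys/block")):
        ctrl = controller_for_block_device(dev)
        if ctrl:
            print(f"{dev}: USB controller {ctrl}")
```

The reported addresses can be compared with the 0000:0c:00.3 in the question's log. If disks on "different" ports all map to the same address, that alone could explain why they all drop at once.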
Doug Smythies:
What filesystem are the disks formatted with? Does the unmount occur at the same point in the copying process each time, or at different points?
@Rinzwind I tried everything except another machine. If it is hardware, it should be the motherboard USB controller, not the disks.
@DougSmythies No, it does not happen at the same point each time. After the disks are unmounted, I can restart the system and continue copying. The problem occurs when copying takes a long time (e.g., 10-20 hours). It looks like the USB controller is overheating, or something similar. If I copy the files in four 5-hour sessions instead of one 20-hour session, everything works fine.
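Since shorter sessions apparently avoid the problem, the workaround can be made systematic by splitting the file list into batches and copying one batch per session. A minimal sketch, assuming rsync's --files-from option (paths relative to the source directory); the mount points /mnt/src and /mnt/dst, the batch count, the list-file names, and the helper names are all hypothetical.

```python
import os
from typing import Iterator, List, Sequence


def relative_files(root: str) -> Iterator[str]:
    """Yield every file under root (per os.path.isfile) as a path relative to root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            if os.path.isfile(full):
                yield os.path.relpath(full, root)


def split_into_batches(paths: Sequence[str], n_batches: int) -> List[List[str]]:
    """Split paths into n_batches contiguous batches whose lengths differ by at most one."""
    if n_batches < 1:
        raise ValueError("n_batches must be at least 1")
    size, extra = divmod(len(paths), n_batches)
    batches, start = [], 0
    for i in range(n_batches):
        end = start + size + (1 if i < extra else 0)
        batches.append(list(paths[start:end]))
        start = end
    return batches


def write_file_lists(batches: List[List[str]], prefix: str = "batch") -> List[str]:
    """Write one list file per batch (hypothetical names batch_0.txt, batch_1.txt, ...)."""
    names = []
    for i, batch in enumerate(batches):
        name = f"{prefix}_{i}.txt"
        with open(name, "w") as f:
            f.writelines(p + "\n" for p in batch)  # one relative path per line
        names.append(name)
    return names


if __name__ == "__main__":
    # Hypothetical source mount point; adjust to the real one.
    files = list(relative_files("/mnt/src"))
    for name in write_file_lists(split_into_batches(files, 4)):
        print(f"rsync -a --files-from={name} /mnt/src/ /mnt/dst/")
```

With tens of millions of files the full list is held in memory, and file names containing newlines would break the one-path-per-line list format.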
Soren A:
Is there anything in the log files in /var/log, especially dmesg and syslog, around the time of the crash?
@SorenA I added the error log to the question. I should have done it before.
"USBSTS: 0x00000000" is the key part. Other error messages all show text with an error code there (if it shows error -110, that is a disconnect). You may have found a problem that the driver does not handle correctly, so I would suggest filing a bug report against the Linux USB xHCI driver.
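To see why an all-zero USBSTS stands out, it helps to decode the register's bits. A minimal sketch, assuming the USBSTS bit layout from the xHCI specification; decode_usbsts is a hypothetical helper. A value of 0 decodes to no flags at all, so, taking the read at face value, the controller was reporting neither a halted state nor any error.

```python
from typing import List

# USBSTS bit positions per the xHCI specification; unlisted bits are reserved.
USBSTS_BITS = {
    0: "HCH (HC Halted)",
    2: "HSE (Host System Error)",
    3: "EINT (Event Interrupt)",
    4: "PCD (Port Change Detect)",
    8: "SSS (Save State Status)",
    9: "RSS (Restore State Status)",
    10: "SRE (Save/Restore Error)",
    11: "CNR (Controller Not Ready)",
    12: "HCE (Host Controller Error)",
}


def decode_usbsts(value: int) -> List[str]:
    """Return the names of the USBSTS flags set in value, lowest bit first."""
    return [name for bit, name in sorted(USBSTS_BITS.items()) if value & (1 << bit)]


if __name__ == "__main__":
    # The first value is the one from the question's log; the others are for comparison.
    for raw in ("0x00000000", "0x00000001", "0x00001004"):
        flags = decode_usbsts(int(raw, 16))
        print(raw, "->", ", ".join(flags) if flags else "no flags set")
```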