Score:3

Extracting tar from multi-volume tape whilst computing shasums


As part of our backup system, we replicate ZFS datasets from a TrueNAS system to a couple of backup servers, one of which runs TrueNAS Scale and has an LTO-5 tape drive connected. We occasionally write the contents of one of the read-only snapshots to tape. As some of these datasets are large, tar is used with the --multi-volume flag.

Prior to backup, sha256sums are generated for every file in the snapshot directory. A copy of this file is kept on the server and also written to tape.
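For example, something along these lines, run from inside the snapshot directory, produces a file in the format that "sha256sum -c" (or a later comparison) can consume; the output path here is just a placeholder:

  find . -type f -print0 | xargs -0 sha256sum > /path/to/datasetname.sha256sum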

After this, the entire contents of the snapshot are written to tape using

  tar --acls --xattrs --sparse --label="SomeLabel" --multi-volume -cvpf /dev/nst0 *

This has served us well; however, I now wish to verify the data after it has been written to tape. I want to avoid extracting the entire dataset to a scratch location (which would allow running "sha256sum -c"), as the TrueNAS Scale server does not have enough spare space for some of the datasets to be extracted. Instead I tried:

  tar --multi-volume -xf /dev/nst0 --to-command=tar-shasums.sh | tee verify-datasetname.sha256sum

Where tar-shasums.sh is along these lines:

#!/bin/bash
# Hash the file data tar pipes to stdin and print one line in
# "sha256sum -c" format: <hash>  <filename>
sum=$(sha256sum)
echo -n "${sum%% *}"
echo "  $TAR_FILENAME"

However, I've run into an issue when the archive spans two tapes. When tar is part-way through reading back a file that is split across the two volumes, it prompts for the next volume to be inserted and Enter to be pressed, but the volume change then fails with an error because the device is still in use.

It looks like the "--to-command" process is still active for that file, since it has not yet received all the data needed to produce the checksum; it cannot finish until the tape is changed, but the tape cannot be changed until it has finished...

Currently I kill the checksum process, which allows tar to continue with the next tape, but it means the one file spanning the two volumes cannot be verified unless it is manually extracted and checked afterwards. Not ideal.

I'm expecting a no, but is there any way around this? Any way to generate checksums that does not involve extracting the entire archive to disk first? Or any way to break the hold on /dev/nst0 so that tar can continue reading from the newly inserted tape without having to kill sha256sum?

Gerard H. Pille
What if the tar extract were to write to a named pipe, with sha1sum reading from that pipe?
I had a look at the tar source last night, and it looks like "--to-command" does create a pipe: it uses fork to run the script and pipes the file data to it. That fork causes all of the parent's (i.e. tar's) file descriptors to be passed on to the script, which includes /dev/nst0 and not just the pipe the script reads the data from. Keep in mind, the reason for using --to-command is that it executes per file extracted from the tar, so you can generate a checksum for each file rather than for the tar archive as a whole.
Score:1

I had a look at the tar source last night, and it looks like "--to-command" does create a pipe: it then uses fork to run the script and pipes the file data to it.

So the issue is that fork causes the forked process to inherit all of the parent's file descriptors, which include the /dev/nst0 device that tar has open. Tar then closes /dev/nst0 ready for the media change, but the forked process, which is waiting for more piped data, still has it open, hence the deadlock.
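To see the inheritance in isolation (a toy illustration, not part of the backup scripts; fd 9 stands in for tar's open /dev/nst0):

exec 9</dev/null               # the parent shell opens fd 9
bash -c 'ls -l /proc/$$/fd'    # a child process lists fd 9: it was inherited
exec 9<&-                      # the parent closes fd 9
bash -c 'ls -l /proc/$$/fd'    # fd 9 no longer appears in the child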

I've partially worked around this by changing the script so that it always closes the /dev/nst0 descriptor first:

DEVICE=/dev/nst0
# lsof column 4 is the descriptor number plus its access mode, e.g. "3r"
fd=$(lsof -p $$ | grep "${DEVICE}" | awk '{print $4}')
fd=${fd::-1}            # strip the trailing mode letter (r/w/u)
eval "exec ${fd}<&-"    # close the inherited tape-device descriptor

There is then just one process, "sh", that still appears to hang on to the file descriptor. "fuser -u /dev/nst0" shows this, and as a temporary workaround it's possible to attach gdb and close the descriptor, after which the media change succeeds and the remainder of the checksums are generated correctly.

gdb -p PID
(gdb) p close(FD)

I'm not sure if it's possible to fork without passing all of the file descriptors on to the forked process (or to have the offending descriptor closed in the child before the script runs), but that looks like it would be the proper fix; a sketch of the idea in shell terms follows.
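This only illustrates the principle (for tar it would have to happen inside tar before the --to-command script is exec'd): a shell can close a descriptor for a single child at spawn time via a redirection on that child's command line. Again, fd 9 stands in for /dev/nst0:

exec 9</dev/null                    # the parent holds fd 9 open
bash -c 'ls -l /proc/$$/fd' 9<&-    # fd 9 is closed for this child only
ls -l /proc/$$/fd                   # the parent still has fd 9 open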

I'll update this answer if I figure that out.
