Score:0

Tar and 7z compression on Linux - what's the difference?

I have a problem! I have a backup script in Python. It backs up each folder in /var/www/ into its own .tar.7z archive.

The problem is that compression is very slow, and for 4 GB folders it stops compressing, sometimes at 1 GB, sometimes at 1.5 GB.

This is the line that does the compression:

os.system("tar cf - -C %s . 2>/dev/null 3>/dev/null | 7za a -p%s -si %s 1>/dev/null 2>/dev/null 3>/dev/null" % (cf, self.config.get(jn, "archpass"), filename))

When I run tar -cf compress-dir.tar /var/www/bigsite.com/ on the 4 GB folder, it creates the .tar very quickly; it is ready in a few minutes.

However, within the Python script, the temporary file that is created as soon as the .tar starts to be generated grows very slowly. After about 10 minutes it reaches about 1 GB, and soon after that it stops growing, without showing any error in the console.

Is there a way I can simulate the same thing that is happening here: tar cf - -C %s . 2>/dev/null 3>/dev/null directly in bash? Clearly it's not the same as tar -cf compress-dir.tar /var/www/bigsite.com/, which runs much faster.

Maybe if I run tar directly in bash, an error will appear. If you have any other ideas, please let me know.

Most tar versions have gzip compression built in, enabled with the `z` option. Gzip does not compress as tightly as 7zip but may be a little faster. As for the 7zip problem, I would suspect an old or defective version of 7zip, since 7zip should not have problems with large files. You also don't have to stick with 7zip; other compressors such as bzip2 can be used as well. With certain parameters you can also lower the compression level to speed up the backup.
Score:1

Tar does not do compression by itself, so its throughput is limited almost purely by I/O capability. Hard drives can easily handle 100 MB/s, so reading 4 GB and writing 4 GB should be doable in 80 seconds or so. SSDs can do hundreds of megabytes per second or more.

However, 7z does compression. Compression is basically CPU-bound, not storage-bound.

To benchmark 7z, you can run 7z b. On the laptop I'm writing this on, 7z can handle 20 MB/s, while my NVMe storage can handle 2 GB/s. That's a difference of two orders of magnitude! Compressing 4 GB would take 200 seconds; simply stuffing it into a tarball should take 2 seconds!
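The arithmetic behind those estimates can be sketched in a few lines of Python. The function name is made up for illustration, and 4 GB is taken as 4000 MB to match the round figures above; real throughput will vary with the data and the machine.

```python
def transfer_seconds(size_mb: float, throughput_mb_s: float) -> float:
    """Seconds needed to process size_mb at a sustained throughput."""
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_mb / throughput_mb_s


SIZE_MB = 4000  # 4 GB, counted in decimal megabytes

compress_s = transfer_seconds(SIZE_MB, 20)   # 7z at roughly 20 MB/s
tar_s = transfer_seconds(SIZE_MB, 2000)      # NVMe at roughly 2 GB/s

print(f"7z:  about {compress_s:.0f} s")
print(f"tar: about {tar_s:.0f} s")
```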

Different compression algorithms make different tradeoffs. 7z can be configured to use different compression levels, in 10 steps (-mx=0 through -mx=9).

You should experiment with different levels to find the correct speed-size tradeoff for your application.
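As a rough illustration of that kind of experiment, here is a sketch that times several compression presets. Note the assumptions: it uses Python's standard lzma module as a stand-in rather than 7z itself, the sample data is synthetic, and compare_presets is a made-up helper name. With 7z you would repeat the same comparison on your real folders using the -mx switch.

```python
import lzma
import time


def compare_presets(data: bytes, presets=(0, 3, 6, 9)):
    """Compress data at each preset; return {preset: (size_bytes, seconds)}."""
    results = {}
    for preset in presets:
        start = time.perf_counter()
        compressed = lzma.compress(data, preset=preset)
        elapsed = time.perf_counter() - start
        results[preset] = (len(compressed), elapsed)
    return results


# Synthetic, fairly repetitive sample standing in for web content.
sample = b"".join(
    b"<p>row %d of some fairly repetitive web content</p>\n" % i
    for i in range(20000)
)

results = compare_presets(sample)
for preset, (size, seconds) in results.items():
    print(f"preset {preset}: {size} bytes in {seconds:.3f} s")
```

Numbers from synthetic data say little about a real site, so treat this only as a template for measuring your own folders.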

Is there a way I can simulate the same thing that is happening here: tar cf - -C %s . 2>/dev/null 3>/dev/null directly in bash? Clearly it's not the same as tar -cf compress-dir.tar /var/www/bigsite.com/, which runs much faster.

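Sure, it's a shell command. os.system() simply runs the command it is given. The tar command in your example archives the directory substituted for %s and writes the tarball to stdout. In your full example, that output is piped to 7z. Note that the 2>/dev/null redirections discard any error messages, so drop them if you want to see what goes wrong.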
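If you would rather keep this in Python but still see errors, one possible sketch is to build the same pipeline with subprocess instead of os.system() and leave stderr alone. The function names, the archive name, and the password in the usage comment are hypothetical; the tar and 7za arguments are the ones from your command.

```python
import subprocess


def build_commands(src_dir, archive, password):
    """Return the argv lists for the tar | 7za pipeline from the question."""
    tar_cmd = ["tar", "cf", "-", "-C", src_dir, "."]
    sevenzip_cmd = ["7za", "a", f"-p{password}", "-si", archive]
    return tar_cmd, sevenzip_cmd


def run_backup(src_dir, archive, password):
    """Run tar piped into 7za; stderr is not redirected, so errors stay visible."""
    tar_cmd, sevenzip_cmd = build_commands(src_dir, archive, password)
    tar = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE)
    sevenzip = subprocess.Popen(sevenzip_cmd, stdin=tar.stdout)
    tar.stdout.close()  # so tar gets SIGPIPE if 7za exits early
    sevenzip_rc = sevenzip.wait()
    tar_rc = tar.wait()
    return tar_rc, sevenzip_rc


# Hypothetical usage:
# tar_rc, sevenzip_rc = run_backup("/var/www/bigsite.com", "bigsite.tar.7z", "secret")
# A non-zero return code from either process means that step failed.
```

Passing argument lists rather than one formatted string also avoids shell quoting problems if a folder name or the password contains spaces or special characters.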

But as I said above: it's not tar that's slow. It's 7z.
