Sunday, February 8, 2015

Parallel Gzip with a progress bar

Everybody knows how to create a "tar.gz", probably using something like this:

tar zcvf foo.tar.gz bar-directory/

This is fine for compressing something relatively small (a few megabytes).
However, if you have to archive hundreds of gigabytes, you probably want features such as:

  • Speeding things up
  • Displaying a progress bar
  • Displaying an ETA

This command* offers all of these:

tar cf - bar-directory/ -P | pv -s $(du -sb bar-directory/ | awk '{print $1}') | pigz > foo.tar.gz

You'll probably have to install pigz, which is a parallelized version of Gzip. You can substitute it with pbzip2 if you want a "tar.bz2" archive.
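To make the compressor swap concrete, here is a minimal sketch that wraps the pipeline above in a shell function. The name archive_dir and its three arguments are made up for illustration; it assumes GNU du (for -b), pv, and whichever compressor you pass are installed, and that the compressor reads stdin and writes stdout when used in a pipe.

```shell
# Hypothetical wrapper around the pipeline above; the compressor is a parameter.
# Assumes GNU du (for -b), pv, and the chosen compressor are installed.
archive_dir() {
    src=$1           # directory to archive
    out=$2           # output file, e.g. foo.tar.gz or foo.tar.bz2
    comp=${3:-pigz}  # compressor that reads stdin and writes stdout
    # Total size in bytes, so pv can show a percentage and an ETA.
    size=$(du -sb "$src" | awk '{print $1}')
    tar cf - "$src" | pv -s "$size" | "$comp" > "$out"
}
```

With this, archive_dir bar-directory/ foo.tar.gz uses pigz, and archive_dir bar-directory/ foo.tar.bz2 pbzip2 produces the "tar.bz2" variant.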

Assuming the CPU is the bottleneck rather than the I/O, which is probably the case if you are working on the local filesystem, this can speed things up by a factor of up to roughly the number of CPU cores. For instance, instead of 1 hour with gzip, it took only 15 minutes with pigz on my machine with 4 CPU cores.
In fact, it was almost as fast as copying the directory without any archiving or compression.
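If you want to check whether the CPU really is the bottleneck on your own machine, a rough timing comparison helps. The sketch below is only an illustration: time_compressor is a made-up helper name, it measures whole seconds with date, and it throws the compressed output away so that disk writes do not skew the result.

```shell
# Hypothetical helper: time one compressor on a directory, discarding the output.
# Usage: time_compressor DIR COMMAND [ARGS...]
time_compressor() {
    dir=$1
    shift                       # the remaining arguments are the compressor command
    start=$(date +%s)
    tar cf - "$dir" | "$@" > /dev/null
    end=$(date +%s)
    echo "$* took $((end - start)) s"
}
```

Running time_compressor bar-directory/ gzip and then time_compressor bar-directory/ pigz -p 4 (the -p option sets the number of pigz processes) on a sample directory gives a quick idea of the gain. If both runs take about the same time, the I/O is probably the limit.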

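As a bonus, pv draws a nice progress bar in your terminal, along with the amount of data transferred, the elapsed time, the current throughput, and the ETA.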

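*On OS X and *BSD, du has no -b option and reports sizes in blocks rather than bytes, so use awk to convert the size into bytes (for example, take the kilobyte count from du -sk and multiply it by 1024).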
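As a sketch of that conversion, the helper below (dir_size_bytes is a made-up name) takes the kilobyte count from du -sk and lets awk turn it into bytes. It uses printf with "%.0f" because some awk implementations print very large numbers in exponent notation, which pv would not accept.

```shell
# Hypothetical helper: directory size in bytes without GNU du's -b option.
# du -sk reports disk usage in 1024-byte blocks, so the result is an estimate.
dir_size_bytes() {
    du -sk "$1" | awk '{printf "%.0f\n", $1 * 1024}'
}
```

It can then replace the du -sb part of the command: tar cf - bar-directory/ | pv -s "$(dir_size_bytes bar-directory/)" | pigz > foo.tar.gz. Since du -sk measures disk usage rather than the exact number of bytes tar will emit, the percentage and the ETA are approximate.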
