Sunday, January 25, 2015

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 3.


(Part I is here)

Remember when I told you it was a bad idea to back up everything automatically with rsnapshot if the files never change (e.g. photos from 5 years ago)? Still, you want to make one backup of these files.
That's what we'll do now. We'll create archives manually and encrypt them.

Creating archives from the rsnapshot folders and encrypting them is left as an exercise. It should be easy if you read the guide.

Creating an archive

The simplest way to create an archive of a folder is this:
tar cf somename.tar folderPath/

Then you could run "watch du -h somename.tar" in another terminal to see how it progresses.
Fortunately, there is a more complicated way!
tar cf - folderPath/ -P | pv -s $(du -sb folderPath/ | awk '{print $1}') > somename.tar

This will display a nice progress bar. I suggest the use of screen for long-running jobs. You can also try Task Spooler, a barely known, badly documented yet very useful tool!
Note that the two commands above create an uncompressed archive. That's what you want if you are creating a backup of your music library, images, videos, ZIP files, ...

If you want compression:
tar zcf somename.tar.gz folderPath/
for a gzip-compressed file. It's slow because it uses one thread. There are multithreaded implementations of gzip and bzip2 (namely pigz and pbzip2) that speed up compression roughly in proportion to the number of CPU cores.
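
As a minimal sketch of the multithreaded variant: pigz writes gzip-compatible output, so the archive can simply be piped through it. This assumes pigz is installed (it usually is not by default). The function name compress_folder is made up for this example, and it falls back to plain gzip when pigz is missing.

```shell
# Hypothetical helper: archive a folder and compress it with pigz if available.
compress_folder() {
    folder=$1
    output=$2
    if command -v pigz >/dev/null 2>&1; then
        compressor=pigz      # parallel gzip, uses all CPU cores by default
    else
        compressor=gzip      # single-threaded fallback
    fi
    tar cf - "$folder" | "$compressor" > "$output"
}

# Usage: compress_folder folderPath/ somename.tar.gz
```

The result can be read back with the usual "tar zxf somename.tar.gz", whichever compressor produced it.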

With the progress bar:
tar cf - folderPath/ -P | pv -s $(du -sb folderPath/ | awk '{print $1}') | gzip > somename.tar.gz

Note: Instead of TAR you might want to take a look at DAR. It offers many useful features.
Note 2: I've found that with default options, file sizes rank like this: uncompressed > gzip > bzip2 > xz (~ 7zip).
Note 3: Here and in general: avoid ZIP, RAR and DMG files. Everything TAR based is 100 % open-source while these are not, or might get you in trouble. Also tar+gzip+bzip2 are available on all UN*X after the first boot.

Encryption

OpenSSL is able to do some encryption but it's not meant for large files, so we can't use it. We are left with S/MIME and PGP. Here we will use GnuPG (GNU Privacy Guard, or GPG), a free alternative to the original proprietary PGP (Pretty Good Privacy) software.

First we'll need to create a private and a public key. I won't explain how PGP works, nor RSA or ElGamal... There are plenty of GUIs for all operating systems to create the keys, but you can also create them from the command line, as explained in this online guide:
gpg --gen-key

Make several copies of your private and public keys. Use a strong passphrase. Whenever possible use 400 permissions for the private key. You can give your public key to anyone; in fact most people put theirs on a key server so other people can find it and send them encrypted e-mails.
PGP is great for encrypting files or e-mails so that only designated recipients can read them. In this case, the entity encrypting the file and the recipient are the same person. Let's encrypt our archive:
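To make those copies from the command line, something like the following should work. It is only a sketch: export_keys and the file names public.asc and private.asc are invented for this example, and the key is selected by the name or e-mail you gave when creating it.

```shell
# Hypothetical helper: export both keys as ASCII files into a folder and
# make the private one readable by its owner only (400 permissions).
export_keys() {
    key_id=$1
    dest=$2
    gpg --armor --output "$dest/public.asc" --export "$key_id" &&
    gpg --armor --output "$dest/private.asc" --export-secret-keys "$key_id" &&
    chmod 400 "$dest/private.asc"
}

# Usage: export_keys NameOrEmail /path/to/safe/folder
```

The exported private key is still protected by its passphrase, but treat the file as sensitive anyway.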
gpg --encrypt --recipient 'Your Name or e-mail here' archivename.tar
or even shorter:
gpg -e -r NameOrEmail archivename.tar

You'll end up with a new file named "archivename.tar.gpg", encrypted. You can now delete the unencrypted version. 
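Before deleting anything, it may be worth checking that the encrypted file really decrypts back to the same bytes. A minimal sketch, assuming the private key is available on this machine; verify_encrypted is a name invented for this example:

```shell
# Hypothetical helper: returns success when the decrypted output of the
# .gpg file matches the original archive byte for byte.
verify_encrypted() {
    original=$1
    encrypted=$2
    gpg --quiet --decrypt "$encrypted" | cmp -s - "$original"
}

# Usage:
# verify_encrypted archivename.tar archivename.tar.gpg && rm archivename.tar
```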
Exercise: combine the archive creation, compression and encryption using pipes. Yes you can.
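
One possible solution, as a sketch: tar writes the archive to standard output, gzip compresses the stream, and gpg encrypts whatever arrives on its standard input. The function names backup_folder and restore_folder and the output file name are placeholders, and the recipient must match a key in your keyring.

```shell
# Hypothetical helper: archive, compress and encrypt in one pass,
# without writing an unencrypted file to disk.
backup_folder() {
    folder=$1
    recipient=$2
    output=$3
    tar cf - "$folder" | gzip | gpg --output "$output" --recipient "$recipient" --encrypt
}

# Hypothetical helper: reverse the pipeline and extract into the current folder.
restore_folder() {
    gpg --decrypt "$1" | tar xzf -
}

# Usage:
# backup_folder folderPath/ NameOrEmail somename.tar.gz.gpg
# restore_folder somename.tar.gz.gpg
```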

I recommend checking this page from NASA, which explains how to use the AES256 cipher algorithm and GPG's compression flags.
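
As a rough sketch of those two options (not a copy of the commands on that page): GnuPG accepts --cipher-algo to pick the symmetric cipher and --compress-algo to pick or disable its own compression. Disabling it makes sense when the archive is already compressed. encrypt_aes256 is a name made up for this example. Note that the GnuPG manual suggests setting personal-cipher-preferences rather than forcing --cipher-algo, so treat this as a starting point.

```shell
# Hypothetical helper: encrypt a file for a recipient, forcing AES256 and
# turning off gpg's built-in compression.
encrypt_aes256() {
    gpg --cipher-algo AES256 --compress-algo none \
        --recipient "$2" --encrypt "$1"
}

# Usage: encrypt_aes256 somename.tar.gz NameOrEmail
# gpg writes somename.tar.gz.gpg next to the input file.
```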

2 comments:

  1. Why "Remember when I told you it was a bad idea to backup everything automatically with rsnapshot if the files never change"

    1. The backup archive is not incremental, but a complete copy. That makes it extremely easy to work with (just unzip the thing to get to your files) but requires a lot of space, hence the advice to avoid automatically backing up large files you know won't change.
