Sunday, January 25, 2015

Backing up your data with Amazon S3 Glacier and rsnapshot. A complete guide, Part 4.

Aletsch Glacier in Switzerland
Photo from John Fowler

Amazon Glacier

Here we go! In Part 1 I wrote quite a lot about Glacier; now it is time to get our hands dirty.


Create an AWS account and a vault

Before starting you need an Amazon Web Services account. There is no point in showing you screenshots or explaining the following process in detail, because it is well documented by Amazon and will probably be outdated by the time you read this. Using the web management console:

  1. Create an AWS account. Amazon refers to it as the "root account".
  2. Create a group with permissions for Amazon Glacier.
  3. Create a user (IAM) and add it to the group you just created. Note down the access key and the secret key. We will need them in a minute.
  4. Switch to the datacenter (region) you want your data to be stored in.
  5. Create a new vault in Glacier. Name it appropriately, for instance "yourHostname_backup".
  6. Configure the appropriate limit for the vault in the "Settings" panel from the Glacier page.

Glacier-Cmd

Amazon doesn't develop any command-line or graphical client for Glacier. All they offer are SDKs that wrap their REST API in many languages. The Java and .NET SDKs offer high-level features that the others do not. Still, everybody needs to upload and download archives, so some people developed their own interfaces. One of them is Glacier-Cmd.

As a regular user:

git clone https://github.com/uskudnik/amazon-glacier-cmd-interface.git
cd amazon-glacier-cmd-interface
sudo python setup.py install

At the time of writing there was a pending patch to support "eu-central-1", Amazon's latest datacenter located in Frankfurt am Main, Germany, Central Europe.

As the user that will send the archives to Amazon Glacier, edit ~/.glacier-cmd:

[aws]
access_key=YOUR_ACCESS_KEY
secret_key=YOUR_SECRET_KEY

[glacier]
region=eu-central-1
#logfile=~/glacier-cmd.log
#loglevel=INFO
#output=print

Change the keys and region accordingly; the rest is optional. Attention! The region is decided in the AWS web console when you create the vault, not here: this setting must simply match it.
The logging options (commented out in the file) don't seem to work.
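Since logging cannot be relied on, a silent mistake in the config file is hard to spot. Here is a minimal sketch of a sanity check, assuming the file uses exactly the key names shown in this post (access_key, secret_key, region). The function name check_glacier_config is made up for this example and is not part of glacier-cmd.

```shell
# Hypothetical helper, not part of glacier-cmd: checks that a config file
# contains the three settings used in this post and no leftover placeholders.
check_glacier_config() {
    local file="$1" key
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    for key in access_key secret_key region; do
        # Require an uncommented "key=value" line with a non-empty value.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*[^[:space:]]" "$file"; then
            echo "missing or empty setting: $key" >&2
            return 1
        fi
    done
    # The template values from this post count as "not configured yet".
    if grep -Eq 'YOUR_(ACCESS|SECRET)_KEY' "$file"; then
        echo "placeholder keys still present in $file" >&2
        return 1
    fi
    return 0
}
```

Typical use would be: check_glacier_config ~/.glacier-cmd && glacier-cmd lsvault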

Verify this is working:

glacier-cmd lsvault

By the way, you can also create vaults with glacier-cmd.
The command line is poorly documented. Look at this page instead.

To upload an archive:
glacier-cmd -c ~/.glacier-cmd upload --description "A description of the archive" your_vault_name archiveName.tar

Do I need to tell you to run this with Task Spooler or Screen?

I am not sure Glacier-Cmd completely supports resuming. But in case you get a timeout, try this:
  1. glacier-cmd listmultiparts your_vault_name
  2. Copy the upload ID.
  3. Retry the upload with "--resume --uploadid the_copied_upload_id".
The CLI says it resumes something even though the documentation says the feature is not supported, so I'm a bit lost. Maybe that is because the documentation is 2 years old...
See also this solution in case of timeouts. (In short: while true; do glacier-cmd ... --resume --partid 1234 --partsize multiple_of_two; sleep 600; done)
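The retry idea can be sketched as a small shell function that stops as soon as the command succeeds and gives up after a fixed number of attempts. This is only a sketch: the function name retry is made up for this example, and it assumes glacier-cmd exits with a non-zero status when an upload fails, which is not verified here.

```shell
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits with status 0, waiting DELAY_SECONDS between
# attempts. Returns 1 once MAX_ATTEMPTS attempts have all failed.
retry() {
    local max="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For instance: retry 20 600 glacier-cmd ... --resume --uploadid the_copied_upload_id
A bounded number of attempts avoids a job that loops forever in Task Spooler or Screen when the failure is permanent, for instance with wrong credentials.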

Alternative: Glacier CLI. It also offers an interesting mechanism to work with git-annex.
Alternative on Windows and Mac: I love CyberDuck. It might well be the most user-friendly GUI for many storage providers including S3 Glacier.


Things to know

  • Upload is free
  • Download is not
  • Asking for the "inventory" takes time.
  • Asking for a download link takes at least 4 hours.
  • Download links are available for 24 hours. (Does it mean we have 24 hours to download the entire archive?)
  • It takes about a day before deleted archives disappear from the "inventory".
  • Uploads that started but failed are kept by Amazon. You must either resume them (see above) or remove them (glacier-cmd abortmultipart).

Automating

The next step is to automate the upload and the deletion of the archives created from rsnapshot.
Remember that deleting an archive is only free once 90 days have passed since it was uploaded.
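For the deletion part, the 90-day rule can be checked with a bit of date arithmetic. The sketch below assumes GNU date (for the -d option) and assumes that your own script records the upload date of each archive as YYYY-MM-DD. The function names archive_age_days and free_to_delete are made up for this example.

```shell
# archive_age_days UPLOAD_DATE [TODAY]
# Prints the number of whole days between the two dates (YYYY-MM-DD).
# TODAY defaults to the current UTC date. Requires GNU date.
archive_age_days() {
    local uploaded="$1" today="${2:-$(date -u +%Y-%m-%d)}"
    local t0 t1
    t0=$(date -u -d "$uploaded" +%s) || return 1
    t1=$(date -u -d "$today" +%s) || return 1
    echo $(( (t1 - t0) / 86400 ))
}

# free_to_delete UPLOAD_DATE [TODAY]
# Succeeds (status 0) when the archive is at least 90 days old.
free_to_delete() {
    local age
    age=$(archive_age_days "$@") || return 2
    [ "$age" -ge 90 ]
}
```

A cleanup script could then call free_to_delete with the recorded date and only remove the archive when the call succeeds.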

Last words: Testing

Voilà. We are done. One last piece of advice: test your backups. To do this, create a small directory, for instance with a single "Hello, World" text file, and modify rsnapshot so it runs a script that will create an archive, encrypt it and send it to S3 Glacier. Then download the archive with Glacier-Cmd or another tool (try one of the GUIs like SAGU), decrypt it, extract it and see if you can retrieve the "Hello, World" text.
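The local half of such a test can be sketched as two shell functions: one builds the tiny test archive, the other extracts a restored archive and compares it with the original directory. The encryption, upload, download and decryption steps are deliberately left out, since they depend on your own setup. The function names and file layout are made up for this example.

```shell
# make_test_archive WORKDIR
# Creates WORKDIR/src/hello.txt and packs the src directory into WORKDIR/test.tar.
make_test_archive() {
    local work="$1"
    mkdir -p "$work/src" || return 1
    echo "Hello, World" > "$work/src/hello.txt"
    tar -cf "$work/test.tar" -C "$work" src
}

# verify_restored ARCHIVE EXPECTED_DIR
# Extracts ARCHIVE into a temporary directory and compares the extracted
# src directory with EXPECTED_DIR. Returns 0 only if they are identical.
verify_restored() {
    local archive="$1" expected="$2" out rc
    out=$(mktemp -d) || return 1
    tar -xf "$archive" -C "$out" || { rm -rf "$out"; return 1; }
    diff -r "$expected" "$out/src"
    rc=$?
    rm -rf "$out"
    return "$rc"
}
```

In a real test, the archive passed to verify_restored would be the one you downloaded from Glacier and decrypted, not the local test.tar.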
