Sunday, February 8, 2015

Ditch scp (SSH copy) and use rsync instead

That is a strong statement, I know.
You have been living happily with scp, and using it to transfer files between machines is a no-brainer. But here is the thing: HTTP, FTP, SMTP, and SSH are bad file transfer protocols. Why, you ask? Because they offer no end-to-end guarantee that what lands on the other side is a carbon copy of what you sent.

The solution? Checksums! And not MD5 or SHA-1, which are both cryptographically broken, but SHA-256 or SHA-512, both already available on your system (except on Windows...).

So you could run sha256sum on the file before the transfer, copy it over, then run sha256sum again on the receiving side and compare the two. But that's rather painful, and you'd have to write a fairly complicated script to automate it. There's a much simpler solution.
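For comparison, here is roughly what the manual routine looks like. It is a sketch that assumes GNU coreutils' sha256sum and uses cp in place of the network copy so it can run on a single machine; src/ and dst/ are hypothetical directory names.

```shell
#!/bin/sh
# Manual "checksum, copy, checksum again" routine.
# cp stands in for the network transfer so this runs on one machine;
# src/ and dst/ are hypothetical directory names.
set -eu
mkdir -p src dst
printf 'important data\n' > src/myfile.tar.gz   # dummy payload for the demo

# 1. Record the checksum before the transfer
(cd src && sha256sum myfile.tar.gz > myfile.tar.gz.sha256)

# 2. Transfer the file together with its checksum file
cp src/myfile.tar.gz src/myfile.tar.gz.sha256 dst/

# 3. Verify on the receiving side; sha256sum -c exits non-zero on a mismatch
(cd dst && sha256sum -c myfile.tar.gz.sha256)
```

Over the network, step 2 becomes the scp command and step 3 has to be run on the remote host (for instance through ssh), which is exactly the extra plumbing that makes this approach tedious.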

For some reason, people think rsync is a complicated tool, maybe because it has so many flags and options. But it's actually a very complete tool that does what you want without forcing you to write complicated shell scripts.
Four amazing things it can do:

  • Syncing directories (great for backups), transferring only what needs to be transferred and, if that's what you want, deleting files on the destination that no longer exist at the source
  • Resuming interrupted transfers where they left off, copying only what's still needed
  • On-the-fly compression
  • Checksums
The last feature is a very interesting one: rsync verifies every file it transfers against a whole-file checksum, so you will know immediately if the transfer failed, whereas scp does no such end-to-end check and might act like everything was fine. It's only 6 months later, when you try to restore a backup, that you realize all your data is lost forever...

So, replace this:

scp myfile.tar.gz remote-user@remote-host:/remote/directory/

with this:

rsync -avPe ssh myfile.tar.gz remote-user@remote-host:/remote/directory/

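Of course, you can also transfer files the other way around by swapping the source and destination. No additional flag is needed for directories (-a already implies -r, recursive), but you do need to understand trailing slashes on the source: dir/ copies the contents of dir, while dir copies the directory itself.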
Here I used SSH as the transport, but there are other options, including the rsync protocol, which is commonly used to mirror distribution repositories across the world, amongst other things.
