Hashsync: Fast FTP synchronization for static website generators

15 December 2013

Hashsync syncs a directory over FTP, using a database of file hashes instead of timestamps to decide what to upload. Most static site generators rebuild the complete site on every change, rewriting every file (and updating its timestamp) even when its content is identical; by comparing hashes, Hashsync pushes only the files whose content has actually changed. {: .lead}
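To see why timestamps are not enough, here is a minimal Python 3 sketch (not taken from Hashsync itself) that simulates a generator rewriting a page with identical content: the modification time changes, the SHA-1 hash does not. The helper names `sha1_of` and `rebuild` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha1_of(path):
    """Return the hex SHA-1 digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def rebuild(path, content):
    """Simulate a static generator rewriting a page on every build."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)


with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "index.html")
    rebuild(page, "<h1>Hello</h1>")
    os.utime(page, (0, 0))  # pretend the previous build happened long ago
    old_mtime, old_hash = os.path.getmtime(page), sha1_of(page)

    rebuild(page, "<h1>Hello</h1>")  # full rebuild, identical content
    new_mtime, new_hash = os.path.getmtime(page), sha1_of(page)

timestamp_changed = new_mtime != old_mtime
hash_changed = new_hash != old_hash
print("timestamp changed:", timestamp_changed)  # timestamp changed: True
print("hash changed:", hash_changed)            # hash changed: False
```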

Installation

You need a Python interpreter (2.7 or 3.x; see below for non-ASCII filenames) and OpenSSL (or any other command-line hash tool, see the -H/--hash option).

curl -L https://gist.github.com/thblt/7975807/raw/53cb65f2fd72ae4719869423f2c493adfd2c43a4/hashsync.py > /usr/local/bin/hashsync
chmod +x /usr/local/bin/hashsync

To run under Python 2, change the first line of the script to #!/usr/bin/env python. Python 3.x is required if the names of the files you want to synchronize contain non-ASCII characters.

Usage

The basic usage is very straightforward:

hashsync /home/public_www/my_site ftp://login:password@ftp.example.com/www/
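The target is a single FTP URL carrying the credentials, host and remote directory. As an illustration only (an assumption, not necessarily how Hashsync parses its argument), Python 3's `urllib.parse.urlsplit` can split such a URL into its parts; `parse_ftp_url` is a hypothetical helper.

```python
from urllib.parse import unquote, urlsplit


def parse_ftp_url(url):
    """Split ftp://user:password@host[:port]/path into its components."""
    parts = urlsplit(url)
    if parts.scheme != "ftp":
        raise ValueError("expected an ftp:// URL, got %r" % url)
    return {
        "host": parts.hostname,
        "port": parts.port or 21,  # 21 is the standard FTP control port
        # urlsplit leaves percent-escapes in place, so decode them here
        "user": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "path": parts.path or "/",
    }


target = parse_ftp_url("ftp://login:password@ftp.example.com/www/")
print(target)
```

In this sketch, a password containing reserved URL characters such as `/`, `?` or `#` has to be percent-encoded (for instance `/` becomes `%2F`); the `unquote` calls decode it again.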

Some optional arguments are available, the most important being -d (--delete), which deletes remote files that don’t exist locally.
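Deciding what to push boils down to comparing two hash databases. A minimal sketch, assuming both are loaded as `{relative_path: hash}` dictionaries (the function `plan_sync` and its arguments are hypothetical): a file is uploaded when it is new or its hash differs, and only when deletion is requested, as with `-d`, are remote files missing locally scheduled for deletion.

```python
def plan_sync(local, remote, delete=False):
    """Compare two {relative_path: hash} databases.

    local:  hashes of the files currently in the local directory
    remote: hashes recorded after the previous sync
    Returns (to_upload, to_delete) as sorted lists of relative paths.
    """
    # New files have no remote hash (None), so they differ too
    to_upload = sorted(path for path, digest in local.items()
                       if remote.get(path) != digest)
    # Files known remotely but gone locally, only if deletion is enabled
    to_delete = sorted(set(remote) - set(local)) if delete else []
    return to_upload, to_delete


local = {"index.html": "aaa", "about/index.html": "bbb", "new.html": "ccc"}
remote = {"index.html": "aaa", "about/index.html": "old", "gone.html": "ddd"}

print(plan_sync(local, remote))
print(plan_sync(local, remote, delete=True))
```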

Invoke with -h or --help to view all available options.

Technical notes

Limitations

All these limitations may be removed in future versions depending on needs and contributions.

  1. Bug: Hashsync doesn’t create missing paths on the remote server: it assumes the whole folder structure already exists, and crashes if it doesn’t.

  2. Optimization: Calculating hashes may become extremely slow on large files. A future version may allow declaring that some files are never modified, only added or deleted, which would remove the need to hash them. This could apply to, e.g., large video or picture files, which are unlikely to be modified.

  3. Optimization: Hashes are currently computed on a single thread.

  4. Optimization: Cryptographically secure hashes are generally slow to calculate. Since the hash only has to detect changes, not resist deliberate collisions, faster non-cryptographic algorithms should be considered.

  5. Protocols: Neither FTPS nor SFTP is currently supported.

  6. Usage: Hashsync has no --exclude option (it syncs the whole directory, without any exceptions).

  7. Compatibility: Hashsync depends on OpenSSL (or another command-line tool, see -H, --hash) being installed. This isn’t a problem on most Unixes, but may be harder to set up on Windows. A later version may fall back to Python’s hashlib if OpenSSL isn’t available.

  8. Compatibility: If invoked from multiple machines, Hashsync must be run with the exact same parameters to work properly. In the future, the configuration (hash algorithm, exclusions, and so on) may be stored on the remote server.

  9. Design: Hashsync is intentionally unidirectional; it only syncs from local to remote.
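Limitations 2, 3 and 7 point in one direction: hash with Python's own `hashlib` instead of an external command, read files in chunks, and spread the work over several threads. The sketch below is one possible approach, not Hashsync's code; `digest_file`, `hash_tree`, `CHUNK_SIZE` and the SHA-1 default are assumptions (SHA-1 matches the 40-character digests in the cache example below). Threads can help here because CPython's `hashlib` releases the GIL while hashing large buffers.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1 << 20  # 1 MiB: large files are never loaded into memory at once


def digest_file(path, algorithm="sha1"):
    """Hash one file incrementally with hashlib (no external command)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root, algorithm="sha1", workers=4):
    """Return {relative_path: hex_digest} for every file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Use forward slashes so the database is the same on every OS
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            entries.append((rel, full))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in input order, so zip pairs them correctly
        digests = pool.map(lambda entry: digest_file(entry[1], algorithm), entries)
        return dict(zip([rel for rel, _ in entries], digests))


# Small demo on a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "blog"))
    for rel, data in [("index.html", b"hello"), ("blog/a.html", b"world")]:
        with open(os.path.join(tmp, *rel.split("/")), "wb") as f:
            f.write(data)
    db = hash_tree(tmp)

print(db)
```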

Cache file format

Super easy:

{relative path to file 1}\t{hash of file 1}\n
{relative path to file 2}\t{hash of file 2}\n
…
{relative path to file N}\t{hash of file N}\n

Example:

.htaccess	21f731da56d2f3a91a857fa52cbe8b16ad30ec8f
404.html	45ce8f0e7901fcb70209313e7a84bc31103073b6
about/index.html	48bb2305e7f999419fe4f46c1de6a2866fd7b845
affiches/2013-12-genre-psy.pdf	9829a0ad8a7354c3d440c244128a199c35d83814
atom.xml	51664974e2155c9673f664e0c0d9ed8a53297e18
blog/2010/courants-dair.html	52e725bf74e94a773013bad3c54d6e835138ca83
blog/2013/becher.html	24c90cb101a46e3d48c1aa3c78d44a2359fe8723
blog/2013/corrige-qcm.html	dafddb17ef9c0fee71dba00a38352c93b5847633
blog/2013/dont-ask-me-again.html	d68c1051de29acac971edc74f4017a667c8f62ef
blog/2013/fast-ftp-sync-for-jekyll.html	590c8d072fec06e12ec5512cdaeb20f6cad4afb0
blog/2013/pyrate.html	d23ee11b35c7e8da63a97e72073ff3b6dc727e46
blog/2013/qmail-raspberry-pi.html	5a12796c26ed92f3043d3b63752e9efe4747ef73
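Reading and writing this format takes a few lines. A sketch in Python 3, where `load_cache` and `dump_cache` are hypothetical names; sorting paths on output is a choice made here (it happens to reproduce the order of the example above). Paths must not contain newlines, but splitting on the last tab tolerates tabs in paths, since a hash never contains one.

```python
def dump_cache(db):
    """Serialize {relative_path: hash} as one 'path<TAB>hash' line per file."""
    return "".join("%s\t%s\n" % (path, digest) for path, digest in sorted(db.items()))


def load_cache(text):
    """Parse the tab-separated cache format back into a dictionary."""
    db = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip blank lines, including the one after the final \n
        # Split on the last tab: the hash itself never contains one
        path, digest = line.rsplit("\t", 1)
        db[path] = digest
    return db


sample = (
    ".htaccess\t21f731da56d2f3a91a857fa52cbe8b16ad30ec8f\n"
    "404.html\t45ce8f0e7901fcb70209313e7a84bc31103073b6\n"
)
db = load_cache(sample)
print(db)
```

With a real file, opening it with `encoding="utf-8"` (an assumption about the cache's encoding) and `newline=""` keeps the `\n` separators untranslated on Windows.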

Source code

{% gist 7975807 %}