# Hashsync: Fast FTP synchronization for static website generators

15 December 2013

Hashsync syncs a directory over FTP, using a database of file hashes rather than timestamps to decide which files to push. Most static site generators rebuild the complete site on every change, so every file gets a fresh timestamp even when its content is identical; Hashsync pushes only the files whose content has actually been modified. {: .lead}

## Installation

You need a Python interpreter (2.7 at least; 3.x is recommended, and required if you need Unicode filenames) and OpenSSL (or any other hash command, see Usage below).

    curl https://gist.github.com/thblt/7975807/raw/53cb65f2fd72ae4719869423f2c493adfd2c43a4/hashsync.py > /usr/local/bin/hashsync

To run on Python 2, change the first line to `#!/usr/bin/env python`. Keep in mind that Python 3.x is required if the names of the files you want to synchronize contain non-ASCII characters.

## Usage

The basic usage is very straightforward:

    hashsync /home/public_www/my_site ftp://login:password@ftp.example.com/www/

Some optional arguments are available. The most important is -d (--delete), which deletes files on the remote location that don’t exist locally.

Invoke with -h or --help to view all available options.
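To make the idea concrete, here is a minimal sketch of how a hash-based sync can decide what to upload and what to delete. It is not taken from the Hashsync source: the function name `plan_sync`, its arguments and the sample hashes are all made up for illustration, and it assumes the hash database simply records the state of the previous sync.

```python
def plan_sync(cached, current, delete=False):
    """Return (uploads, deletions) as sorted lists of relative paths.

    cached  -- {relative path: hash} recorded after the previous sync
    current -- {relative path: hash} computed from the local directory now
    delete  -- mimics -d/--delete: also list files that vanished locally
    """
    # A file is uploaded when it is new or when its hash has changed.
    uploads = sorted(p for p, h in current.items() if cached.get(p) != h)
    # Files known from the last sync but absent locally are only listed
    # for deletion when this was explicitly requested.
    deletions = sorted(p for p in cached if p not in current) if delete else []
    return uploads, deletions


# Hypothetical data: one unchanged file, one modified, one new, one removed.
cached = {"index.html": "aaa", "about.html": "bbb", "old.html": "ccc"}
current = {"index.html": "aaa", "about.html": "bb2", "new.html": "ddd"}
print(plan_sync(cached, current, delete=True))
# (['about.html', 'new.html'], ['old.html'])
```

Without `delete=True`, the second list is always empty and `old.html` would stay on the server.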

## Technical notes

### Limitations

All these limitations may be removed in future versions depending on needs and contributions.

1. Bug: Hashsync doesn’t create missing paths on the remote server. It assumes the whole folder structure is already present, and crashes if it is not.

2. Optimization: Calculating hashes may become extremely slow on large files. In a future version, Hashsync may let you declare that some files are never modified, only added or deleted, which removes the need to compute hashes for them. This could apply to, e.g., large video or picture files, which are quite unlikely to be modified.

3. Optimization: Hashes are currently computed on a single thread.

4. Optimization: Cryptographically secure hashes are generally slow to calculate. The use of faster, non-cryptographic algorithms should be considered.

5. Protocols: Neither FTPS nor SFTP is currently supported.

6. Usage: Hashsync has no --exclude option (it syncs the whole directory, without any exceptions).

7. Compatibility: Hashsync depends on OpenSSL (or another command-line tool, see -H, --hash) being installed. This isn’t a problem on most Unixes, but may not be so easy on Windows machines. A later version may fall back to Python’s hashlib if OpenSSL isn’t available.

8. Compatibility: If invoked from multiple machines, Hashsync must be run with the exact same parameters to work properly. In the future, the configuration (hash algorithm, exclusions, and so on) may be stored on the remote server.

9. Hashsync is uni-directional (by design): it only syncs from local to remote.
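Limitations 2, 4 and 7 all come down to how hashes get computed. As a rough idea of what a `hashlib` fallback could look like, here is a sketch. It is not code from Hashsync: the function name `file_hash`, the `sha1` default and the chunk size are assumptions chosen for illustration.

```python
import hashlib


def file_hash(path, algorithm="sha1", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks.

    Reading chunk by chunk keeps memory use bounded, which matters for
    the large video or picture files mentioned above.
    """
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With this shape, trying another algorithm is a matter of passing a different name, for instance `file_hash(path, "sha256")`. The names accepted by `hashlib.new` are listed in `hashlib.algorithms_available`.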

### Cache file format

Super easy:

    {relative path to file 1}\t{hash of file 1}\n
    {relative path to file 2}\t{hash of file 2}\n
    …
    {relative path to file N}\t{hash of file N}\n

Example:

    .htaccess	21f731da56d2f3a91a857fa52cbe8b16ad30ec8f
    404.html	45ce8f0e7901fcb70209313e7a84bc31103073b6
    blog/2013/qmail-raspberry-pi.html	5a12796c26ed92f3043d3b63752e9efe4747ef73
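A format this simple can be read and written in a few lines. The sketch below is only an illustration of the layout described above, not the code Hashsync uses. The names `parse_cache` and `format_cache` are hypothetical, and writing entries in sorted order is my own choice.

```python
def parse_cache(text):
    """Parse cache text into a {relative path: hash} dict."""
    entries = {}
    for line in text.split("\n"):
        if not line:
            continue  # skip the empty string left after the final \n
        # Split on the last tab, so a path containing a tab still parses.
        path, digest = line.rsplit("\t", 1)
        entries[path] = digest
    return entries


def format_cache(entries):
    """Serialize a {relative path: hash} dict, one entry per line."""
    return "".join(
        "{}\t{}\n".format(path, entries[path]) for path in sorted(entries)
    )


sample = (
    ".htaccess\t21f731da56d2f3a91a857fa52cbe8b16ad30ec8f\n"
    "404.html\t45ce8f0e7901fcb70209313e7a84bc31103073b6\n"
)
cache = parse_cache(sample)
print(cache["404.html"])
# 45ce8f0e7901fcb70209313e7a84bc31103073b6
```

Since `format_cache(parse_cache(sample))` gives back `sample` unchanged, the cache file can be loaded, updated as a plain dict, and written back after each sync.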