Self-Hosted E-Mail Service with Distributed QoS: A Proposal

Codenamed project Agamid
17 January 2015

This article is a draft.

This document discusses a possible way to implement secure e-mail self-hosting using a network of individual servers, each able to work as a backup for all the others. Requiring little trust in the other machines of the network, and relying heavily on asymmetric cryptography, it may be a way to increase the robustness and security of self-hosted e-mail.
{:.lead}

Introduction

Self-hosting is a fundamental principle of the internet: the network is acentric, and every connected computer is potentially both a client and a server. Yet although hosting one’s own services is widespread among geeks, e-mail is very often left to third-party providers.

  • To {:toc}

I can see some reasons for this situation:

  1. The relative complexity of setting up a mail server (compared to HTTP, FTP, or even XMPP services).

  2. The need for at least a decent uptime. One’s personal website can be down for a few days, or can be mirrored to a major hosting provider in case of problems; e-mail lacks a good approach to backup services. If your server goes down, you can have a secondary MX set to Microsoft, Google or whoever else can handle e-mail for a custom domain, but this is not an ideal solution. This uptime problem is aggravated by the usually poor quality of domestic network and electrical infrastructure: if you leave your home for a few days, your modem may crash, your router may freeze, or your cat or flatmate may accidentally unplug your server.

  3. Google has the best webmail software, but you can’t host it yourself. Everything else sucks.

The first point is addressed by point-and-click self-hosting solutions such as YunoHost, and Mailpile seems to be on its way to solving the webmail problem.

This article will sketch the design of a solution to the second issue (uptime) based on low-cost hosting on Raspberry Pis, low-end PCs or even professional-grade servers, with multiple hosts in different locations sharing the same basic setup, each serving its own mailbox(es), but each able to act as a backup for every other node. I will give a general discussion of the project, lay out the general protocol, and discuss the security issues it addresses and the specific problems it may introduce. Then I’ll briefly sketch a possible implementation of this solution, based on common free-software tools and some scripting.

I’m not aiming to design anything new or revolutionary; the solution should be as simple and robust as possible.

Notes on vocabulary

The words “should”, “may” and “must” are used according to their definition in RFC 2119.

The word “backup” means an alternative server for some use, except when used with the “data” prefix, where it designates a solution to ensure the persistence of data in storage by any means, including its replication in local or remote storage hardware and/or services.

General presentation

The network

We’re to build an interconnected network of mail servers (nodes), each serving as a backup for every other. This network has the following properties:

  1. Each node is associated with at least one person (usually, but not necessarily, a natural person) and at least one mail domain.

  2. No node can be expected to be reachable at any given moment, or to become reachable again once disconnected.

  3. Every person in the network trusts their own node for the storage of their personal information. That is, every person in the network sets up and maintains their own data backup solution if needed. The network isn’t responsible for individual backups.

  4. The need to trust any given node with another node’s data should be as low as possible (this follows from point 2).

This boils down to: the need to trust anybody besides yourself should be as low as possible, but some trust may be needed. Whom you trust is your own problem; see below.

Security and integrity

Such a network is made only of weak links: a node may get disconnected forever, with no prior announcement. Not all nodes can be considered safe, and which nodes are safe can be impossible to determine. A participant in the network, despite this network being based on personal affinity, may become rogue (unlikely) or introduce security problems by mistake (very likely). We must therefore devise a solution to avoid mail loss and interception as much as possible.

  1. Every node is the only primary MX for its domain. If the node is reachable, no other computer on the network will see its mail.

  2. If the main MX for a given domain is unreachable, which of the other nodes will finally receive its mail can’t be predicted. If this host receives lots of e-mail, it is very likely that every node will receive some of it. This leads to a problem of potential loss which will be discussed in the next subsection.

  3. No user participating in the network should be forced to use every other node as a secondary MX. Each user may pick only a subset of the available nodes (whitelist), or exclude some nodes (blacklist), and get a matching DNS record. The DNS server controlling the zone should be able to assist in the automatic generation of these records.

  4. When a node receives mail for another node (i.e., acts as a backup MX), it immediately encrypts the full mail (including headers) using that node’s public key.
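
As an illustration of point 4, here is a minimal sketch of the encryption step. It assumes OpenPGP keys handled by the `gpg` command-line tool, which this draft does not prescribe; the function names and the key identifier are hypothetical. The whole raw message, headers included, is piped through `gpg`, so only ciphertext is ever stored or forwarded.

```python
import subprocess
from typing import Callable, List


def gpg_encrypt_command(recipient_key: str) -> List[str]:
    """Build a gpg call that reads plaintext on stdin and writes ciphertext to stdout."""
    return ["gpg", "--batch", "--encrypt", "--recipient", recipient_key]


def encrypt_for_node(raw_mail: bytes, recipient_key: str,
                     run: Callable = subprocess.run) -> bytes:
    """Encrypt the full raw message (headers and body) for the absent node.

    `run` is injectable so the function can be exercised without gpg installed.
    """
    result = run(gpg_encrypt_command(recipient_key),
                 input=raw_mail, capture_output=True, check=True)
    return result.stdout
```

For this to work, the recipient’s public key has to be present (and trusted) in the backup node’s keyring beforehand, which is one more reason for the nodes to exchange manifests.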

This network should work according to two main principles:

  1. Every node is fully responsible for the storage, conservation and backup of its own e-mails. Every user is expected to trust their own setup for the storage of their data. Data backup is an individual problem.

  2. The full network, or at least a sufficient subset, should take care of the mail received as backup.

Point 2 implies that whenever a node receives mail for a recipient other than its own, it MUST attempt to copy it to other nodes of the network.

An example use case

Network setup

Alice, Bob, Carol and Dave have set up their network of mail servers. Each has a Raspberry Pi next to his or her modem, each with its own domain name and serving its own e-mail. They also share a DNS zone for their network, mesh.example.com.

| Person | Node                   | Domain    |
|--------|------------------------|-----------|
| Alice  | alice.mesh.example.com | alice.com |
| Bob    | bob.mesh.example.com   | bob.com   |
| Carol  | carol.mesh.example.com | carol.com |
| Dave   | dave.mesh.example.com  | dave.com  |

Every personal domain’s MX record looks like:

alice.com.    MX   1 alice.mesh.example.com.
alice.com.    MX   100 all.mesh.example.com.

alice.mesh.example.com is a CNAME to alice.com or whatever she wants (so her DNS record could of course just have been: alice.com MX 10 alice.com).

The DNS for all.mesh.example.com looks like:

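all.mesh.example.com. IN A [alice's ip]
; Bob doesn't have a static IP. A CNAME cannot share a name with other
; records, so his dyndns client has to keep this A record up to date.
all.mesh.example.com. IN A [bob's current ip]
all.mesh.example.com. IN A [carol's ip]
all.mesh.example.com. IN A [dave's ip]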
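
The whitelist and blacklist mentioned under “Security and integrity” could drive the generation of such records. The following is only a sketch under assumptions: instead of the shared `all.mesh.example.com` name, it emits one MX line per selected backup node, and the function name, its parameters and the priority values are hypothetical.

```python
from typing import Iterable, List, Optional


def backup_mx_records(domain: str, primary: str, nodes: Iterable[str],
                      whitelist: Optional[Iterable[str]] = None,
                      blacklist: Iterable[str] = (),
                      backup_priority: int = 100) -> List[str]:
    """Return zone-file MX lines: the primary first, then the chosen backups."""
    allowed = None if whitelist is None else set(whitelist)
    refused = set(blacklist)
    chosen = [
        node for node in nodes
        if node != primary                       # a node is never its own backup
        and (allowed is None or node in allowed)  # no whitelist means "everyone"
        and node not in refused
    ]
    lines = [f"{domain}. MX 1 {primary}."]
    lines += [f"{domain}. MX {backup_priority} {node}." for node in chosen]
    return lines


if __name__ == "__main__":
    mesh = [f"{name}.mesh.example.com" for name in ("alice", "bob", "carol", "dave")]
    # Alice accepts everybody but Dave as a backup MX.
    for line in backup_mx_records("alice.com", "alice.mesh.example.com", mesh,
                                  blacklist=["dave.mesh.example.com"]):
        print(line)
```

Listing the backups individually keeps the per-user choice visible in the zone itself, at the cost of one record per node.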

Working behavior

If someone sends an e-mail to Alice, her own server receives the mail, and does whatever it wants with it.

But suppose Bob’s modem has gone down and will not reboot, and Bob is on holiday. His mail gets delivered, at random, to any other server, say Carol’s. Carol’s server knows it should accept mail for bob.com. It accepts the delivery, then:

  1. Encrypts the mail with Bob’s public key.

  2. Randomly chooses a few other nodes (the number depends on configuration).

  3. Transfers Bob’s encrypted mail to these servers (using uucp over ssh or a similar technology), if they can be reached and agree to act as temporary storage (see below about accepting or rejecting forwarding).

  4. Every server holding a piece of Bob’s mail will regularly try to send it back to Bob, using uucp over ssh.
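
The four steps above can be sketched as follows. This is not a reference implementation: `encrypt`, `transfer` and `deliver` are hypothetical callables standing in for the public-key encryption and the uucp-over-ssh transport, and `copies` stands for the configurable number of nodes to pick.

```python
import random
from typing import Callable, Iterable, List, Optional, Tuple


def replicate_backup_mail(raw_mail: bytes, recipient: str, peers: Iterable[str],
                          encrypt: Callable[[bytes, str], bytes],
                          transfer: Callable[[str, bytes], bool],
                          copies: int = 2,
                          rng: Optional[random.Random] = None) -> List[str]:
    """Encrypt mail for an unreachable node and hand it to a few random peers."""
    rng = rng or random.Random()
    sealed = encrypt(raw_mail, recipient)            # step 1: encrypt first
    candidates = [p for p in peers if p != recipient]
    rng.shuffle(candidates)                          # step 2: random choice
    holders: List[str] = []
    for peer in candidates:
        if len(holders) >= copies:
            break
        if transfer(peer, sealed):                   # step 3: peer may refuse
            holders.append(peer)
    return holders


def redeliver(spool: List[Tuple[str, bytes]],
              deliver: Callable[[str, bytes], bool]) -> List[Tuple[str, bytes]]:
    """Step 4: retry every held message and keep those that still fail."""
    return [(recipient, blob) for recipient, blob in spool
            if not deliver(recipient, blob)]
```

A node would call `redeliver` periodically (from cron, for instance) and replace its spool with the returned list. Unreachable or unwilling peers are simply skipped, so fewer than `copies` holders may be returned.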

Specification Draft

General specification goals

  1. Be nice to low-end hardware.

  2. Cryptography matters; plain text communication is evil.

Special features

  • Mailboxes are stored using common Unix names, but when receiving mail, they’re identified using regular expressions. This allows for easy use of aliases and various customizations (e.g., Gmail-like tags: mbname+tag@example.com).

Node manifest format

YAML:

{% highlight yaml %}
node:
  host: "carol.com"
  mailboxes:
    contact: #personal
      # To immediately reject spam sent on obsolete addresses,
      # carol updates her node's manifest every month
      # with a nice little script.
      matches: '^(talktome|contact(0613|0713))(\+\w+)?'
{% endhighlight %}
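
To show how a node might use such a manifest, here is a small sketch of mailbox lookup by regular expression. The `MAILBOXES` mapping mirrors the `matches` entry above; the function name is hypothetical, and matching the whole local part (rather than just a prefix) is an assumption this draft does not settle.

```python
import re
from typing import Dict, Optional

# Mailbox name -> pattern, as it could be loaded from the manifest above.
MAILBOXES: Dict[str, str] = {
    "contact": r"^(talktome|contact(0613|0713))(\+\w+)?",
}


def find_mailbox(address: str,
                 mailboxes: Dict[str, str] = MAILBOXES) -> Optional[str]:
    """Return the mailbox whose pattern matches the address's local part."""
    local_part = address.rsplit("@", 1)[0]
    for name, pattern in mailboxes.items():
        # fullmatch: the pattern must cover the entire local part.
        if re.fullmatch(pattern, local_part):
            return name
    return None  # unknown recipient: the node can reject the mail


if __name__ == "__main__":
    for addr in ("contact0713+news@carol.com", "contact0513@carol.com"):
        print(addr, "->", find_mailbox(addr))
```

With the pattern above, an address using last year’s `contact0513` alias finds no mailbox, which is what lets Carol reject mail sent to obsolete addresses.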

Roadmap to implementation

General implementation goals

  1. The spec is not the implementation. There must be a clear specification, and the reference implementation will follow it.

  2. Leverage existing, standard technology as much as possible.

  3. Use free, open-source software. Release under a free, open-source license.

  4. Don’t try to come up with a perfect solution for everything. Users know what they want for themselves. Regular expressions for mailbox names are cool, and catch-alls are allowed (see Special features, above).

  5. Have it work out of the box with IMAP and POP3 services. Easy configuration is cool.

<img src="/media/agamid/AgamaSinaita.jpeg" />
<figcaption>
    <a href="https://en.wikipedia.org/wiki/File:AgamaSinaita01_ST_10_edit.jpg">Agama Sinaita by Ester Inbar  from Wikimedia Commons</a>.
</figcaption>

This draft is released under the terms of the GNU Free Documentation License, version 1.3.