Gradual migration of IP address/port between servers

I'm a strong proponent of self-hosting all your services, if not on your own hardware then at least on dedicated rented hardware. For IT nerds of my generation, this has been the norm since the early 1990s: if you wanted to run your own webserver/mailserver/... back then, the only way was to self-host.

So over the last 30 years, I've always been running a fleet of machines, some of them my own colocated hardware, and during the past ~18 years also some rented dedicated "root servers". They run a plethora of services for either my personal stuff (like this blog, or my personal email server), or any of the IT services of the open source projects I'm involved in (like osmocom) or the company I co-founded and run (sysmocom).

Every few years there's the need to migrate to new hardware, either to reduce power consumption / improve efficiency, to increase performance, or simply to retire aging hardware that may be dying soon.

When upgrading from one [hosted] server to another [hosted] server, there's always the question of how to manage the migration with minimal interruption to services. For very simple services like http/https, it can usually be done entirely within DNS: You reduce the TTL of the records, bring up the service on the new server (with a new IP), make the change in the DNS and once the TTL of the DNS record is expired in all caches, everybody will access the new server/IP.

However, there are services where the IP address must be retained. SMTP is a prime example of that. Given how spam filtering works, you certainly will not want to give up the years, if not decades, of good reputation of your IP address. As a result, you will want to keep the IP address while doing the migration.

If it's a physical machine in colocation or your home, you can of course do that all rather easily under your control. You can synchronize the various steps, from stopping the services on the old machine and rsync'ing the spool files over to the new one, to migrating the IP over to the new machine.

However, if it's a rented "root" server at a company like Hetzner or KVH, then you do not have full control over when exactly the IP address will be migrated over to the new server.

Also, if there are many different services on that same physical machine, running on a variety of different IPv4/IPv6 addresses and ports, it may be difficult to migrate all of them at once. It would be much more practical if individual services could be migrated step by step.

The poor man's approach would be to use port-forwarding / reverse-proxying. In this case, the client establishes a TCP connection to the old IP address on the old server, and a small port-forward proxy accepts that TCP connection, creates a second TCP connection to the new server, and bridges those two together. This approach only works for the most simplistic of services (like web servers), where

  • there are only inbound connections from remote clients (as outbound connections from the new server would originate from the new IP, not the old one), and

  • the source IP of the client doesn't matter. To the new server, the source IP addresses of all connections are suddenly masked: there's only one source IP (that of the old server) for all connections.
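To make the limitation concrete, here is a minimal sketch of such a port-forward proxy in Python. It is only an illustration, not something used in the setup described in this post; the function names (pipe, handle, serve) are made up for this example. Note how the second connection is opened by the proxy itself, which is exactly why the new server only ever sees the old server's IP as the source.

```python
import socket
import threading


def pipe(src, dst):
    """Copy bytes from src to dst until src reaches EOF, then half-close dst."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass


def handle(client, new_server_addr):
    """Bridge one accepted client connection to the new server."""
    try:
        # Second TCP connection: its source IP is the proxy's (old server's),
        # not the real client's. This is the masking described above.
        upstream = socket.create_connection(new_server_addr)
    except OSError:
        client.close()
        return
    back = threading.Thread(target=pipe, args=(upstream, client), daemon=True)
    back.start()
    pipe(client, upstream)
    back.join()
    client.close()
    upstream.close()


def serve(listen_addr, new_server_addr):
    """Listen on the old IP:port and forward every connection; returns the listener."""
    listener = socket.create_server(listen_addr)

    def accept_loop():
        while True:
            try:
                client, _peer = listener.accept()
            except OSError:
                return  # listener was closed
            threading.Thread(
                target=handle, args=(client, new_server_addr), daemon=True
            ).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

Calling something like serve(("192.0.2.10", 80), ("198.51.100.20", 80)) on the old server (both addresses are placeholders from the documentation ranges) would forward web traffic, but the new server's logs would show 192.0.2.10 as the client of every request.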

For more sophisticated services (like e-mail/SMTP, again), this is not an option. The SMTP client IP address matters for whitelists/blacklists/relay rules, etc. And of course there are also plenty of outbound SMTP connections which need to originate from the old IP, not the new IP.

So in bed last night [don't ask], I was brainstorming whether the goal of fully transparent migration of individual TCP+UDP/IP (IPv4+IPv6) services between an old and a new server could be achieved. In theory it's rather simple, but in practice the IP stack is not really designed for this, and we're breaking a lot of the assumptions and principles of IP networking.

After some playing around earlier today, I was actually able to create a working setup!

It fulfills the following goals / exhibits the following properties:

  • old and new server run concurrently for any amount of time

  • individual IP:port tuples can be migrated from old to new server, as services are migrated step by step

  • fully transparent to any remote peer: Old IP:port of server visible to client

  • fully transparent to the local service: Real client IP:port of client visible to server

  • no constraints on whether or not the old and new IPs are in the same subnet, link-layer, data centre, ...

  • use only stock features of the Linux kernel, no custom software, kernel patches, ...

  • no requirement for controlling a router in front of either old or new server

General Idea

The general idea is to receive and classify incoming packets on the old server, and then selectively tunnel some of them via a GRE tunnel from the old machine to the new machine, where they are decapsulated and passed to local processes on the new server. Any packets generated by the service on the new server (responses to clients or outbound connections to remote servers) will take the opposite route: They will be encapsulated on the new server, passed through that GRE tunnel back to the old server, from where they will be sent off to the internet.
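As a rough sketch of what the tunnel part could look like, the following Python snippet only generates the iproute2 commands for both tunnel endpoints; it does not execute anything. Everything specific in it is an assumption made for illustration: the interface name gre-mig, the public addresses (taken from the documentation ranges), and the 10.255.0.0/30 transfer network inside the tunnel. It covers only an IPv4 outer tunnel.

```python
def gre_tunnel_commands(local_ip, remote_ip, tunnel_addr, ifname="gre-mig"):
    """Return iproute2 commands creating one end of a point-to-point GRE tunnel.

    local_ip / remote_ip are the public (outer) addresses of the two servers,
    tunnel_addr is a hypothetical inner address assigned to the tunnel device.
    """
    return [
        f"ip tunnel add {ifname} mode gre local {local_ip} remote {remote_ip} ttl 255",
        f"ip link set {ifname} up",
        f"ip addr add {tunnel_addr} dev {ifname}",
    ]


# Placeholder addresses, not the real servers.
OLD_IP = "192.0.2.10"
NEW_IP = "198.51.100.20"

# Each side uses its own public IP as 'local' and the peer's as 'remote'.
old_side = gre_tunnel_commands(OLD_IP, NEW_IP, "10.255.0.1/30")
new_side = gre_tunnel_commands(NEW_IP, OLD_IP, "10.255.0.2/30")

if __name__ == "__main__":
    print("# on the old server")
    print("\n".join(old_side))
    print("# on the new server")
    print("\n".join(new_side))
```

The tunnel alone moves no traffic; which packets enter it is decided by the classification and routing steps discussed next.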

That sounds simple in theory, but it poses a number of challenges:

  • packets destined for a local IP address of the old server need to be re-routed/forwarded, not delivered to local sockets. This is easily done with fwmark, multiple routing tables and a rule, similar to many other policy routing setups.

  • FIXME
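For the first challenge in the list above (re-routing packets addressed to a local IP), one possible shape of the fwmark approach is sketched below. This is an untested illustration under several assumptions, not the configuration actually used: the mark value 0x64, table number 100, rule priorities 50/100 and the tunnel device name gre-mig are all made up, and only IPv4 is covered. The one non-obvious step is that the kernel's lookup of the "local" table normally sits at rule priority 0, so it has to be moved to a later priority before an fwmark rule can take precedence over local delivery. As before, the Python code only builds the command strings.

```python
def reroute_commands(old_ip, proto, port, mark=0x64, table=100, tunnel_if="gre-mig"):
    """Return commands (for the old server) that divert one IP:port into the tunnel.

    All numeric values and the device name are hypothetical defaults.
    """
    return [
        # The old server now has to forward packets, not just terminate them.
        "sysctl -w net.ipv4.ip_forward=1",
        # Re-add the 'local' table lookup at priority 100 first, then drop the
        # default priority-0 rule, so that rules below 100 run before it.
        "ip rule add pref 100 lookup local",
        "ip rule del pref 0",
        # Classify: mark inbound packets for the migrated IP:port tuple.
        f"iptables -t mangle -A PREROUTING -d {old_ip} -p {proto} "
        f"--dport {port} -j MARK --set-mark {mark:#x}",
        # Marked packets consult a separate table before the 'local' table ...
        f"ip rule add pref 50 fwmark {mark:#x} lookup {table}",
        # ... whose only route sends them into the tunnel.
        f"ip route add default dev {tunnel_if} table {table}",
    ]


if __name__ == "__main__":
    # Example: migrate SMTP (tcp/25) on a placeholder address.
    print("\n".join(reroute_commands("192.0.2.10", "tcp", 25)))
```

Migrating a further service then only means adding another marking rule for its IP:port tuple; unmarked traffic continues to be delivered locally on the old server.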