Partial resyncs and synchronous replication.

antirez 4596 days ago. 255824 views.

I'm currently working on partial resynchronization of Redis slaves, as I wrote in previous blog posts.

The idea is that we keep a backlog of the replication stream, up to a configured number of bytes (this will be on the order of a few megabytes by default).

If a slave loses the connection, it connects again, checks whether the master RUNID is the same, and asks to continue from a given offset. If this is possible, we continue, nothing is lost, and a full resynchronization is not needed. Otherwise, if the offset refers to data we no longer have in the backlog, we do a full resync.
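The decision described above can be sketched in a few lines of Python. This is only an illustrative model, not the Redis implementation: the class ReplicationBacklog, its methods, and the 8-byte capacity in the usage note are all hypothetical names and values chosen for the example.

```python
from typing import Optional


class ReplicationBacklog:
    """Toy model of a master's backlog (hypothetical, not Redis code)."""

    def __init__(self, run_id: str, capacity: int) -> None:
        self.run_id = run_id
        self.capacity = capacity
        self.buffer = bytearray()  # most recent bytes of the replication stream
        self.master_offset = 0     # total bytes produced since the master started

    def feed(self, data: bytes) -> None:
        """Append new stream data, dropping the oldest bytes beyond capacity."""
        self.buffer += data
        self.master_offset += len(data)
        excess = len(self.buffer) - self.capacity
        if excess > 0:
            del self.buffer[:excess]

    def first_offset(self) -> int:
        """Global offset of the oldest byte still held in the backlog."""
        return self.master_offset - len(self.buffer)

    def try_partial_resync(self, slave_run_id: str, slave_offset: int) -> Optional[bytes]:
        """Return the bytes the slave is missing, or None if a full resync is needed."""
        if slave_run_id != self.run_id:
            return None  # the slave was replicating from a different master run
        if not (self.first_offset() <= slave_offset <= self.master_offset):
            return None  # the requested data is no longer (or not yet) in the backlog
        return bytes(self.buffer[slave_offset - self.first_offset():])
```

For example, with a capacity of 8 bytes and 10 bytes fed, a slave that already processed 6 bytes gets the last 4 bytes back, while a slave that only processed 1 byte gets None and has to full resync.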

Now what's interesting about this is that, in order to make this possible, both the slave and the master keep track of a global replication offset, counted from the moment the master was started.

Now, if we provide a command that returns this offset, it is simple for a client to simulate synchronous replication in Redis: send the write, ask the master for its offset (think about MULTI/EXEC to do both in one transaction), and then ask the slave for its offset. Because Redis replication has very low latency, the client can simply do an optimistic "write, read-offset, read-offset-on-slave", and the offset read on the slave will likely already be high enough to continue (otherwise, we can read it again after a short pause).
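A minimal sketch of the optimistic "write, read-offset, read-offset-on-slave" loop follows. Since the offset command does not exist yet, the sketch does not call any Redis API: write_and_get_offset and read_slave_offset are hypothetical callables the caller would supply, and the retry count and pause are arbitrary assumptions.

```python
import time
from typing import Callable


def write_and_confirm(
    write_and_get_offset: Callable[[], int],
    read_slave_offset: Callable[[], int],
    retries: int = 5,
    pause: float = 0.01,
) -> bool:
    """Perform a write, then poll the slave until it has reached the master offset.

    write_and_get_offset: hypothetical callable that runs the write on the
        master and returns the master replication offset right after it.
    read_slave_offset: hypothetical callable that returns the offset the
        slave has processed so far.
    Returns True if the slave caught up within `retries` reads, else False.
    """
    target = write_and_get_offset()
    for attempt in range(retries):
        if read_slave_offset() >= target:
            return True
        if attempt < retries - 1:
            time.sleep(pause)  # give replication a moment before reading again
    return False
```

In the common case the first read on the slave should already succeed, so the pause only matters when the slave is lagging.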

This is already something that could be useful, but I wonder if we could build something even better starting from that: a way to send Redis a command that blocks until the current replication offset has been acknowledged by at least N connected slaves, and then returns +OK.
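To make the semantics of such a blocking command concrete, here is a rough sketch of the bookkeeping it would need. It is an assumption-laden model: AckTracker, acknowledge and wait_for are hypothetical names, and it uses threads and a condition variable for brevity, which is not how an event-driven server like Redis would implement it.

```python
import threading
from typing import Dict, Optional


class AckTracker:
    """Tracks the highest offset acknowledged by each slave (hypothetical model)."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._acked: Dict[str, int] = {}  # slave id -> highest acknowledged offset

    def acknowledge(self, slave_id: str, offset: int) -> None:
        """Record an acknowledgement from a slave and wake up any waiters."""
        with self._cond:
            if offset > self._acked.get(slave_id, 0):
                self._acked[slave_id] = offset
            self._cond.notify_all()

    def wait_for(self, offset: int, num_slaves: int,
                 timeout: Optional[float] = None) -> bool:
        """Block until at least num_slaves have acknowledged `offset`.

        Returns True when the condition is met, False if the timeout expires.
        """
        def enough() -> bool:
            reached = sum(1 for acked in self._acked.values() if acked >= offset)
            return reached >= num_slaves

        with self._cond:
            return self._cond.wait_for(enough, timeout=timeout)
```

The timeout is an addition of this sketch: without it, a write would block forever when fewer than N slaves are connected.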

I'm not promising that this will be available, as we need to understand how useful this is and how complex it would be, but from an initial analysis it could be trivial to implement in a fast and reliable way... and it sounds pretty good.

More news ASAP.