Designing Redis replication partial resync

antirez 4871 days ago. 248860 views.

In these busy days I had the idea to focus on a single, not huge, self-contained project for some time, one that would let me work focused for as many hours as possible and could provide a significant advantage to the Redis community.

It turns out the best bet was partial replication resync: a long-wanted feature that gives a slave the ability to resynchronize with a master without the need for a full resync (and an RDB dump creation on the master side) if the replication link was interrupted for a short time because of a timeout, a network issue, or a similar transient problem.

The design is different from the one in the Github feature request that I filed myself some time ago: now we have the REPLCONF command, which is just perfect for exchanging replication information between master and slave.

Btw these are the main ideas of the design I'm refining and implementing.

1) The master has a global (shared by all the slaves) backlog of user-configurable size. For instance you can allocate 100 MB of memory for this. As slaves are served in the replicationFeedSlaves() function, this backlog is updated to contain the latest 100 MB of data (or whatever the configured size of the backlog is).
2) There is also a global counter, the global replication offset, that simply starts at zero when a Redis instance is started and is updated inside replicationFeedSlaves(). This offset can identify a particular part of the outgoing stream from the master to the slaves at any given time.
3) If we have had no slaves at all for too long, the backlog buffer is destroyed and no longer updated, so we don't pay any performance/memory penalty if there are no slaves. Of course the buffer also starts unused when a new instance is started, and is initialized only when we get the first slave.
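
To make the three points concrete, here is a rough sketch in Python (not the C that Redis is written in). Every name in it is made up for illustration, the backlog is a plain trimmed byte array just to keep things short, and it assumes the counter only advances while the backlog exists, which is something the design above does not spell out.

```python
class MasterReplState:
    """Hypothetical model of points 1-3; not Redis internals."""

    def __init__(self, backlog_size, no_slaves_timeout):
        self.backlog_size = backlog_size            # user-configurable size (point 1)
        self.no_slaves_timeout = no_slaves_timeout  # assumed tunable, in seconds
        self.repl_offset = 0     # global counter, zero at startup (point 2)
        self.backlog = None      # unused until the first slave arrives (point 3)
        self.slaves = 0
        self.no_slaves_since = None

    def add_slave(self):
        self.slaves += 1
        self.no_slaves_since = None
        if self.backlog is None:
            self.backlog = bytearray()

    def remove_slave(self, now):
        self.slaves -= 1
        if self.slaves == 0:
            self.no_slaves_since = now

    def feed(self, data):
        """Stand-in for the bookkeeping done in replicationFeedSlaves()."""
        if self.backlog is None:
            return  # assumption: nothing is tracked without a backlog
        self.backlog += data
        extra = len(self.backlog) - self.backlog_size
        if extra > 0:
            del self.backlog[:extra]  # keep only the latest backlog_size bytes
        self.repl_offset += len(data)

    def cron(self, now):
        """Periodic check: drop the backlog after too long without slaves."""
        if (self.slaves == 0 and self.no_slaves_since is not None
                and now - self.no_slaves_since > self.no_slaves_timeout):
            self.backlog = None
```

With a 4 byte backlog and a single slave attached, feeding b"abcdef" leaves b"cdef" in the backlog and the offset at 6.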

Ok now, when a slave connects to a master, it uses the command:

REPLCONF get-stream-info

It gets two pieces of information this way:

1) The master run id.
2) The master replication offset of the first byte this slave is going to receive in the replication stream.

The slave will make sure to update this offset as it receives data, so every slave knows the offset of the data it is consuming, from the point of view of the master global offset.
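
The slave-side bookkeeping could look roughly like this. It is a Python sketch with hypothetical names; the only things taken from the design are the run id and the offset of the first byte the slave is going to receive.

```python
class SlaveStreamPosition:
    """Hypothetical slave-side view of the master's global offset."""

    def __init__(self, master_run_id, first_byte_offset):
        self.master_run_id = master_run_id
        # Master offset of the next byte we expect on the replication link.
        self.next_offset = first_byte_offset

    def on_data(self, data):
        # Every byte consumed from the stream advances our position.
        self.next_offset += len(data)

    def last_received_offset(self):
        # Offset of the latest byte actually received.
        return self.next_offset - 1
```

For example, a slave that is told its first byte is at offset 1000 and then consumes 25 bytes has received up to offset 1024 and expects offset 1025 next.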

The master backlog is implemented using a circular buffer, so no memory move or reallocation operations are needed: feeding it is just a copy of bytes.
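
A minimal sketch of such a circular backlog, in Python and with invented field names, just to show the idea: writes wrap around a fixed-size buffer, and the global offset tells us which part of the stream is still available.

```python
class ReplBacklog:
    """Hypothetical fixed-size circular backlog; not the actual Redis code."""

    def __init__(self, size):
        self.buf = bytearray(size)  # allocated once, never moved or resized
        self.size = size
        self.idx = 0                # next write position inside buf
        self.histlen = 0            # number of valid bytes currently stored
        self.master_offset = 0      # stream offset of the next byte to be written

    def feed(self, data):
        for b in data:
            self.buf[self.idx] = b
            self.idx = (self.idx + 1) % self.size  # wrap around at the end
        self.histlen = min(self.size, self.histlen + len(data))
        self.master_offset += len(data)

    def first_offset(self):
        # Stream offset of the oldest byte still in the backlog.
        return self.master_offset - self.histlen

    def read_from(self, offset):
        """Bytes from `offset` up to the end of the stream, or None if unavailable."""
        if offset < self.first_offset() or offset > self.master_offset:
            return None
        n = self.master_offset - offset
        start = (self.idx - n) % self.size
        return bytes(self.buf[(start + i) % self.size] for i in range(n))
```

Feeding b"hello " and then b"world" into an 8 byte backlog keeps only the last 8 bytes of the stream, so offsets 3 through 10 can be served and anything older cannot.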

Ok, this is the setup.

Now what happens after a short disconnection?

1) The slave gets disconnected and needs to reconnect to the master, but the client structure of the latest master connection is not discarded when the disconnection happens. It is saved to see if it's possible to reuse it later.
2) The slave reconnects and uses REPLCONF get-stream-info to get the master run id and replication offset.
3) If the master run id matches, we can try a partial resync.
4) We use REPLCONF set-partial-resync-offset <offset>, where <offset> is the offset of the latest byte we received with the previous replication link, plus one.
5) Now the master can reply with an error if there is not enough backlog to start the replication from the specified offset, or can reply +OK if it's possible.
6) If it is OK, a partial resync is initiated and the master will simply provide the slave with all the backlog the slave asked for, plus all the new updates.
7) The slave on the other side will reuse the saved master client structure, and will simply update the socket. The replication state is also marked as CONNECTED.
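
The two decisions in the steps above (the slave deciding whether to try a partial resync, the master deciding whether it can serve it) can be sketched like this. This is Python with hypothetical function names; only the REPLCONF option name comes from the design, and the exact boundary checks are my assumption.

```python
def slave_resync_command(saved_run_id, last_received_offset, master_run_id):
    """Steps 3-4: return the command to send, or None to fall back to a full resync."""
    if saved_run_id is None or saved_run_id != master_run_id:
        return None  # different master (or a restarted one): offsets are meaningless
    # Ask for the byte right after the latest one we received.
    return "REPLCONF set-partial-resync-offset %d" % (last_received_offset + 1)


def master_can_partial_resync(requested_offset, master_offset, backlog_len):
    """Step 5: True if the backlog still covers everything from requested_offset on.

    master_offset is the offset of the next byte the master will produce,
    backlog_len is the number of bytes currently held in the backlog.
    """
    first_available = master_offset - backlog_len
    return first_available <= requested_offset <= master_offset
```

So a slave that received up to byte 99 from the same master asks for offset 100, and a master at offset 150 holding 60 bytes of backlog (offsets 90 to 149) can accept that request, while a request for offset 80 would get the error reply.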

Everything fine!

Ok it's not *that* trivial, but you get the idea: we have a backlog, and every slave has information about what it's consuming.

This is going to enter Redis 2.8 of course.