I’m back home, after a non easy trip, since to travel from San Francisco to Sicily is kinda NP complete: there are no solutions involving less than three flights. However it was definitely worth it, because the Redis Conference 2015 was very good, SF was wonderful as usually and I was able to meet with many interesting people. Here I’ll limit myself to writing a short account of the conference, but the trip was also an incredible experience because I discovered old and new friends, that are not just smart programmers, but also people I could imagine being my friends here in Sicily. I never felt alone while I was 10k kilometers away from my home.
News posted by antirez
Today Redis is six years old. This is an incredible accomplishment for me, because in the past I switched to the next thing much faster. There are things that lasted six years in my past, but not like Redis, where after so much time, I still focus most of my everyday energies into. How did I stopped doing new things to focus into an unique effort, drastically monopolizing my professional life? It was a too big sacrifice to do, for an human being with a limited life span. Fortunately I simply never did this, I never stopped doing new things.
Redis speed could be one selling point for new users, so following the trend of comparative “advertising” it should be logical to have a few comparisons at Redis.io. However there are two problems with this. One is of goals: I don’t want to convince developers to adopt Redis, we just do our best in order to provide a suitable product, and we are happy if people can get work done with it, that’s where my marketing wishes end. There is more: it is almost always impossible to compare different systems in a fair way.
Today I was testing Redis latency using m3.medium EC2 instances. I was able to replicate the usual latency spikes during BGSAVE, when the process forks, and the child starts saving the dataset on disk. However something was not as expected. The spike did not happened because of disk I/O, nor during the fork() call itself. The test was performed with a 1GB of data in memory, with 150k writes per second originating from a different EC2 instance, targeting 5 million keys (evenly distributed). The pipeline was set to 4 commands. This translates to the following command line of redis-benchmark:
One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical servers, and most hypervisors, is fast even with big processes. However Xen is slow to fork, so with certain EC2 instance types (and other virtual servers providers as well), it is possible to have serious latency spikes every time the parent process forks in order to persist on disk. The Stripe graph is pretty clear in this regard.
Yesterday Stripe engineers wrote a detailed report of why they had an issue with Redis. This is very appreciated. In the Hacker News thread I explained that because now we have diskless replication (http://antirez.com/news/81) now persistence is no longer mandatory for people having a master-slaves replicas set. This changes the design constraints: now that we can have diskless replicas synchronization, it is worth it to better support the Stripe (ex?) use case of replicas set with persistence turned down, in a more safe way. This is a work in progress effort.
Almost a month ago a number of people interested in Redis development met in London for the first Redis developers meeting. We identified together a number of features that are urgent (and are now listed in a Github issue here: https://github.com/antirez/redis/issues/2045), and among the identified issues, there was one that was mentioned multiple times in the course of the day: diskless replication. The feature is not exactly a new idea, it was proposed several times, especially by EC2 users that know that sometimes it is not trivial for a master to provide good performances during slaves synchronization. However there are a number of use cases where you don’t want to touch disks, even running on physical servers, and especially when Redis is used as a cache. Redis replication was, in short, forcing users to use disk even when they don’t need or want disk durability.
Yesterday distributed systems expert Aphyr, posted a tweet about a Redis Sentinel issue experienced by an unknown company (that wishes to remain anonymous): “OH on Redis Sentinel "They kill -9'd the master, which caused a split brain..." “then the old master popped up with no data and replicated the lack of data to all the other nodes. Literally had to restore from backups." OMG we have some nasty bug I thought. However I tried to get more information from Kyle, and he replied that the users actually disabled disk persistence at all from the master process. Yep: the master was configured on purpose to restart with a wiped data set.
The first commit I can find in my git history about Redis Cluster is dated March 29 2011, but it is a “copy and commit” merge: the history of the cluster branch was destroyed since it was a total mess of work-in-progress commits, just to shape the initial idea of API and interactions with the rest of the system. Basically it is a roughly 4 years old project. This is about two thirds the whole history of the Redis project. Yet, it is only today, that I’m releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.
Queues are an incredibly useful tool in modern computing, they are often used in order to perform some possibly slow computation at a latter time in web applications. Basically queues allow to split a computation in two times, the time the computation is scheduled, and the time the computation is executed. A “producer”, will put a task to be executed into a queue, and a “consumer” or “worker” will get tasks from the queue to execute them. For example once a new user completes the registration process in a web application, the web application will add a new task to the queue in order to send an email with the activation link. The actual process of sending an email, that may require retrying if there are transient network failures or other errors, is up to the worker.