[This blog post is also experimentally available on Medium: https://medium.com/antirez/a-short-tale-of-a-read-overflow-b9210d339cff] When a long running process crashes, it is pretty uncool. More so if the process happens to take a lot of state in memory. This is why I love web programming frameworks that are able, without major performance overhead, to create a new interpreter and a new state for each page view, and deallocate every resource used at the end of the page generation. It is an inherently more reliable programming paradigm, where memory leaks, descriptor leaks, and even random crashes from time to time do not constitute a serious issue. However system software like Redis is at the other side of the spectrum, a side populated by things that should never crash.
I saw multiple users asking me what is happening with Streams, when they’ll be ready for production uses, and in general what’s the ETA and the plan of the feature. This post will attempt to clarify a bit what comes next. To start, in this moment Streams are my main priority: I want to finish this work that I believe is very useful in the Redis community and immediately start with the Redis Cluster improvements plans. Actually the work on Cluster has already started, with my colleague Fabio Nicotra that is porting redis-trib, the Cluster management tool, inside the old and good redis-cli. This step involves translating the code from Ruby to C. In the meantime, a few weeks ago I finished writing the Streams core, and I deleted the “streams” feature branch, merging everything into the “unstable” branch.
Four days ago a user posted a critical issue in the Redis Github repository. The problem was related to the new Redis 4.0 PSYNC2 replication protocol, and was very critical. PSYNC2 brings a number of good things to Redis replication, including the ability to resynchronize just exchanging the differences, and not the whole data set, after a failover, and even after a slave controlled restart. The problem was about this latter feature: with PSYNC2 the RDB file is augmented with replication information. After a slave is restarted, the replication metadata is loaded back, and the slave is able to perform a PSYNC attempt, trying to handshake with the master and receive the differences since the last disconnection.
Until a few months ago, for me streams were no more than an interesting and relatively straightforward concept in the context of messaging. After Kafka popularized the concept, I mostly investigated their usefulness in the case of Disque, a message queue that is now headed to be translated into a Redis 4.2 module. Later I decided that Disque was all about AP messaging, which is, fault tolerance and guarantees of delivery without much efforts from the client, so I decided that the concept of streams was not a good match in that case.
Today I read an interesting article about how the Wolfenstein 3D game implemented a fade effect using a Linear Feedback Shift Register. Every pixel of the screen is set red in a pseudo random way, till all the screen turns red (or other colors depending on the event happening in the game). The blog post describing the implementation is here and is a nice read: http://fabiensanglard.net/fizzlefade/index.php You may wonder why the original code used a LFSR or why I'm proposing a different approach, instead of the vanilla setPixel(rand(),rand()): doing this with a pseudo random generator, as noted in the blog post, is slow, but is also visually very unpleasant, since the more red pixels you have on the screen already, the less likely is that you hit a new yet-not-red pixel, so the final pixels take forever to turn red (I *bet* that many readers of this blog post tried it in the old times of the Spectum, C64, or later with QBASIC or GWBasic). In the final part of the blog post the author writes:
A 10x programmer is, in the mythology of programming, a programmer that can do ten times the work of another normal programmer, where for normal programmer we can imagine one good at doing its work, but without the magical abilities of the 10x programmer. Actually to better characterize the “normal programmer” it is better to say that it represents the one having the average programming output, among the programmers that are professionals in this discipline. The programming community is extremely polarized about the existence or not of such a beast: who says there is no such a thing as the 10x programmer, who says it actually does not just exist, but there are even 100x programmers if you know where to look for.
After 10 million of units sold, and practically an endless set of different applications and auxiliary devices, like sensors and displays, I think it’s deserved to say that the Raspberry Pi is not just a success, it also became one of the preferred platforms for programmers to experiment in the embedded space. Probably with things like the Pi zero, it is also becoming the platform in order to create hardware products, without incurring all the risks and costs of designing, building, and writing software for vertical devices.
It’s not yet stable but it’s soon to become, and comes with a long list of things that will make Redis more useful for we users: finally Redis 4.0 Release Candidate 1 is here, and is bold enough to call itself 4.0 instead of 3.4. For me semantic versioning is not a thing, what I like instead is try to communicate, using version numbers and jumps, what’s up with the new version, and in this specific case 4.0 means “this is the shit”. It’s just that Redis 4.0 has a lot of things that Redis should have had since ages, in a different world where one developer can, like Ken The Warrior, duplicate itself in ten copies and start to code. But it does not matter how hard I try to learn about new vim shortcuts, still the duplicate-me thing is not in my chords.
Redis is often used for caching, in a setup where a fixed maximum memory to use is specified. When new data arrives, we need to make space by removing old data. The efficiency of Redis as a cache is related to how good decisions it makes about what data to evict: deleting data that is going to be needed soon is a poor strategy, while deleting data that is unlikely to be requested again is a good one. In other terms every cache has an hits/misses ratio, which is, in qualitative terms, just the percentage of read queries that the cache is able to serve. Accesses to the keys of a cache are not distributed evenly among the data set in most workloads. Often a small percentage of keys get a very large percentage of all the accesses. Moreover the access pattern often changes over time, which means that as time passes certain keys that were very requested may no longer be accessed often, and conversely, keys that once were not popular may turn into the most accessed keys.
WARNING: Long pretty useless blog post. TLDR is that I wrote, just for fun, a text editor in less than 1000 lines of code that does not depend on ncurses and has support for syntax highlight and search feature. The code is here: http://github.com/antirez/kilo. Screencast here: https://asciinema.org/a/90r2i9bq8po03nazhqtsifksb For the sentimentalists, keep reading… A couple weeks ago there was this news about the Nano editor no longer being part of the GNU project. My first reaction was, wow people still really care about an old editor which is a clone of an editor originally part of a terminal based EMAIL CLIENT. Let’s say this again, “email client”. The notion of email client itself is gone at this point, everything changed. And yet I read, on Hacker News, a number of people writing how they were often saved by the availability of nano on random systems, doing system administrator tasks, for example. Nano is also how my son wrote his first program in C. It’s an acceptable experience that does not require past experience editing files.
I’m spending days trying to get a couple of APIs right. New APIs about modules, and a new Redis data type. I really mean it when I say *days*, just for the API. Writing drafts, starting the implementation shaping data structures and calls, and then restarting from scratch to iterate again in a better way, to improve the design and the user facing part. Why I do that, delaying features for weeks? Is it really so important? Programmers are engineers, maybe they should just adapt to whatever API is better to export for the system exporting it.
It was a matter of time but it eventually happened. In the Redis 1.0 release notes, 7 years ago, I mentioned that one of the interesting features for the future was “loadable modules”. I was really interested in such a feature back then, but over the years I became more and more skeptic about the idea of adding loadable modules in Redis. And probably for good reasons. Modules can be the most interesting feature of a system and the most problematic one at the same time: API incompatibilities between versions, low quality modules crashing the system, a lack
I’m aboard of a flight bringing me to San Francisco. Eventually I purchased the slowest internet connection of my life (well at least for a good reason), but for several hours I was without internet, as usually when I fly. I don’t mind staying disconnected for some time usually. It’s a good time to focus, write some code, or a blog post like this one. However when I’m disconnected, what makes the most difference is not Facebook or Twitter or Github, but the lack of text messages. At this point text messages are a fundamental thing in my life. They are also probably the main source of distraction. I use messages to talk with my family, even just to communicate between different floors. I use messages with friends to organize dinners and vacations. I even use messages with the plumber or the doctor.
It took more than expected, but finally we have it, Redis 3.2.0 stable is out with changes that may be useful to a big number of Redis users. At this point I covered the changes multiple time, but the big ones are: * The GEO API. Index whatever you want by latitude and longitude, and query by radius, with the same speed and easy of use of the other Redis data structures. Here you can find the API documentation: http://redis.io/commands/#geo. Thank you to Matt Stancliff for the initial implementation, that was reworked but is still at the core of the GEO API, and to the developers of ARDB for providing the geo indexing code that Matt used.
Today Redis is 7 years old, so to commemorate the event a bit I passed the latest couple of days doing a fun coding marathon to implement a new crazy command called BITFIELD. The essence of this command is not new, it was proposed in the past by me and others, but never in a serious way, the idea always looked a bit strange. We already have bit operations in Redis: certain users love it, it’s a good way to represent a lot of data in a compact way. However so far we handle each bit separately, setting, testing, getting bits, counting all the bits that are set in a range, and so forth.
Yesterday night I was re-reading Redlock analysis Martin Kleppmann wrote (http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html). At some point Martin wonders if there is some good way to generate monotonically increasing IDs with Redis. This apparently simple problem can be more complex than it looks at a first glance, considering that it must ensure that, in all the conditions, there is a safety property which is always guaranteed: the ID generated is always greater than all the past IDs generated, and the same ID cannot be generated multiple times. This must hold during network partitions and other failures. The system may just become unavailable if there are less than the majority of nodes that can be reached, but never provide the wrong answer (note: as we'll see this algorithm has another liveness issue that happens during high load of requests).
Martin Kleppmann, a distributed systems researcher, yesterday published an analysis of Redlock (http://redis.io/topics/distlock), that you can find here: http://martin.kleppmann.com/2016/02/08/how-to-do-distributed-locking.html Redlock is a client side distributed locking algorithm I designed to be used with Redis, but the algorithm orchestrates, client side, a set of nodes that implement a data store with certain capabilities, in order to create a multi-master fault tolerant, and hopefully safe, distributed lock with auto release capabilities.
Today I’m happy to announce that the first release candidate for Disque 1.0 is available. If you don't know what Disque is, the best starting point is to read the README in the Github project page at http://github.com/antirez/disque. Disque is a just piece of software, so it has a material value which can be zero or more, depending on its ability to make useful things for people using it. But for me there is an huge value that goes over what Disque, materially, is. It is the value of designing and doing something you care about. It’s the magic of programming: where there was nothing, now there is something that works, that other people may potentially analyze, run, use.
Two days ago Mike Malone published an interesting post on Medium about the V8 implementation of Math.random(), and how weak is the quality of the PRNG used: http://bit.ly/1SPDraN. The post was one of the top news on Hacker News today. It’s pretty clear and informative from the point of view of how Math.random() is broken and how should be fixed, so I’ve nothing to add to the matter itself. But since the author discovered the weakness of the PRNG in the context of generating large probably-non-colliding IDs, I want to share with you an alternative that I used multiple times in the past, which is fast and extremely reliable.
Today I was curious about plotting all the Redis commits we have on Git, which are 90% of all the Redis commits. There was just an initial period where I used SVN but switched very soon. Full size image here: http://antirez.com/misc/commitsvis.png Each commit is a rectangle. The height is the number of affected lines (a logarithmic scale is used). The gray labels show release tags. There are little surprises since the amount of commit remained pretty much the same over the time, however now that we no longer backport features back into 3.0 and future releases, the rate at which new patchlevel versions are released diminished.
Lua scripting is probably the most successful Redis feature, among the ones introduced when Redis was already pretty popular: no surprise that a few of the things users really want are about scripting. The following two features were suggested multiple times over the last two years, and many people tried to focus my attention into one or the other during the Redis developers meeting, a few weeks ago. 1. A proper debugger for Redis Lua scripts. 2. Replication, and storage on the AOF, of Lua scripts as a set of write commands materializing the *effects* of the script, instead of replicating the script itself as we normally do.
IMPORTANT EDIT: Redis 3.2 security improved by implementing protected mode. You can find the details about it here: https://www.reddit.com/r/redis/comments/3zv85m/new_security_feature_redis_protected_mode/ From time to time I get security reports about Redis. It’s good to get reports, but it’s odd that what I get is usually about things like Lua sandbox escaping, insecure temporary file creation, and similar issues, in a software which is designed (as we explain in our security page here http://redis.io/topics/security) to be totally insecure if exposed to the outside world.
I’m just back from the Redis Dev meeting 2015. We spent two incredible days talking about Redis internals in many different ways. However while I’m waiting to receive private notes from other attenders, in order to summarize in a blog post what happened and what were the most important ideas exposed during the meetings, I’m going to touch a different topic here. I took the non trivial decision to move the Redis mailing list, consisting of 6700 members, to Reddit. This looks like a crazy ideas probably in some way, and “to move” is probably not the right verb, since the ML will still exist. However it will only be used in order to receive announcements of new releases, critical informations like security related ones, and from time to time, links to very important discussions that are happening on Reddit.
If you know me, you know I’m not the kind of guy that considers competing products a bad thing. I actually love the users to have choices, so I rarely do anything like comparing Redis with other technologies. However it is also true that in order to pick the right solution users must be correctly informed. This post was triggered by reading a blog post published by Mike Perham, that you may know as the author of a popular library called Sidekiq, that happens to use Redis as backend. So I would not consider Mike a person which is “against” Redis at all. Yet in his blog post that you can find at the URL http://www.mikeperham.com/2015/09/24/storing-data-with-redis/ he states that, for caching, “you should probably use Memcached instead [of Redis]”. So Mike simply really believes Redis is not good for caching, and he arguments his thesis in this way:
Everybody knows Redis is single threaded. The best informed ones will tell you that, actually, Redis is *kinda* single threaded, since there are threads in order to perform certain slow operations on disk. So far threaded operations were so focused on I/O that our small library to perform asynchronous tasks on a different thread was called bio.c: Background I/O, basically. However some time ago I opened an issue where I promised a new Redis feature that many wanted, me included, called “lazy free”. The original issue is here: https://github.com/antirez/redis/issues/1748.
Yesterday Amplitude published an article about scaling analytics, in the context of using the Set data type. The blog post is here: https://amplitude.com/blog/2015/08/25/scaling-analytics-at-amplitude/ On Hacker News people asked why not using Redis instead: https://news.ycombinator.com/item?id=10118413 Amplitude developers have their set of reasons for not using Redis, and in general if you have a very specific problem and want to scale it in the best possible way, it makes sense to implement your vertical solution. I’m not adverse to reinventing the wheel, you want your very specific wheel sometimes, that a general purpose system may not be able to provide. Moreover creating your solution gives you control on what you did, boosts your creativity and your confidence in what you, as a developer can do, makes you able to debug whatever bug may arise in the future without external help.
I consider myself very lucky for contributing to the open source. For me OSS software is not just a license: it means transparency in the development process, choices that are only taken in order to improve software from the point of view of the users, documentation that attempts to cover everything, and simple, understandable systems. The Redis community had the privilege of finding in Pivotal, and VMware before, a company that thinks at open source in the same way as we, the community of developers, think of it.
Nor subjects, for what matters. Everybody will tell you to don't add a dot at the end of the first line of a commit message. I followed the advice for some time, but I'll stop today, because I don't believe commit messages are titles or subjects. They are synopsis of the meaning of the change operated by the commit, so they are small sentences. The sentence can be later augmented with more details in the next lines of the commit message, however many times there is *no* body, there is just the first line. How many emails or articles you see with just the subject or the title? Very little, I guess. So for me it is like:
I’m back from Paris, DotScale 2015 was a very interesting conference. Before leaving I was working on Sentinel in the context of the unstable branch: the work was mainly about connection sharing. In short, it is the ability of a few Sentinels to scale, monitoring many masters. Before to leave, and now that I’m back, I tried to “secure” a set of features that will be the basis for Redis 3.2. In the next weeks I’ll be focusing developing these features, so I thought it’s worth to share the list with you ASAP.
EDIT: In case you missed it, Disque source code is now available at http://github.com/antirez/disque It is a few months that I spend ~ 15-20% of my time, mostly hours stolen to nights and weekends, working to a new system. It’s a message broker and it’s called Disque. I’ve an implementation of 80% of what was in the original specification, but still I don’t feel like it’s ready to be released. Since I can’t ship, I’ll at least blog… so that’s the story of how it started and a few details about what it is.
I’m back home, after a non easy trip, since to travel from San Francisco to Sicily is kinda NP complete: there are no solutions involving less than three flights. However it was definitely worth it, because the Redis Conference 2015 was very good, SF was wonderful as usually and I was able to meet with many interesting people. Here I’ll limit myself to writing a short account of the conference, but the trip was also an incredible experience because I discovered old and new friends, that are not just smart programmers, but also people I could imagine being my friends here in Sicily. I never felt alone while I was 10k kilometers away from my home.
Today Redis is six years old. This is an incredible accomplishment for me, because in the past I switched to the next thing much faster. There are things that lasted six years in my past, but not like Redis, where after so much time, I still focus most of my everyday energies into. How did I stopped doing new things to focus into an unique effort, drastically monopolizing my professional life? It was a too big sacrifice to do, for an human being with a limited life span. Fortunately I simply never did this, I never stopped doing new things.
Redis speed could be one selling point for new users, so following the trend of comparative “advertising” it should be logical to have a few comparisons at Redis.io. However there are two problems with this. One is of goals: I don’t want to convince developers to adopt Redis, we just do our best in order to provide a suitable product, and we are happy if people can get work done with it, that’s where my marketing wishes end. There is more: it is almost always impossible to compare different systems in a fair way.
Today I was testing Redis latency using m3.medium EC2 instances. I was able to replicate the usual latency spikes during BGSAVE, when the process forks, and the child starts saving the dataset on disk. However something was not as expected. The spike did not happened because of disk I/O, nor during the fork() call itself. The test was performed with a 1GB of data in memory, with 150k writes per second originating from a different EC2 instance, targeting 5 million keys (evenly distributed). The pipeline was set to 4 commands. This translates to the following command line of redis-benchmark:
One interesting thing about the Stripe blog post about Redis is that they included latency graphs obtained during their tests. In order to persist on disk Redis requires to call the fork() system call. Usually forking using physical servers, and most hypervisors, is fast even with big processes. However Xen is slow to fork, so with certain EC2 instance types (and other virtual servers providers as well), it is possible to have serious latency spikes every time the parent process forks in order to persist on disk. The Stripe graph is pretty clear in this regard.
Yesterday Stripe engineers wrote a detailed report of why they had an issue with Redis. This is very appreciated. In the Hacker News thread I explained that because now we have diskless replication (http://antirez.com/news/81) now persistence is no longer mandatory for people having a master-slaves replicas set. This changes the design constraints: now that we can have diskless replicas synchronization, it is worth it to better support the Stripe (ex?) use case of replicas set with persistence turned down, in a more safe way. This is a work in progress effort.
Almost a month ago a number of people interested in Redis development met in London for the first Redis developers meeting. We identified together a number of features that are urgent (and are now listed in a Github issue here: https://github.com/antirez/redis/issues/2045), and among the identified issues, there was one that was mentioned multiple times in the course of the day: diskless replication. The feature is not exactly a new idea, it was proposed several times, especially by EC2 users that know that sometimes it is not trivial for a master to provide good performances during slaves synchronization. However there are a number of use cases where you don’t want to touch disks, even running on physical servers, and especially when Redis is used as a cache. Redis replication was, in short, forcing users to use disk even when they don’t need or want disk durability.
Yesterday distributed systems expert Aphyr, posted a tweet about a Redis Sentinel issue experienced by an unknown company (that wishes to remain anonymous): “OH on Redis Sentinel "They kill -9'd the master, which caused a split brain..." “then the old master popped up with no data and replicated the lack of data to all the other nodes. Literally had to restore from backups." OMG we have some nasty bug I thought. However I tried to get more information from Kyle, and he replied that the users actually disabled disk persistence at all from the master process. Yep: the master was configured on purpose to restart with a wiped data set.
The first commit I can find in my git history about Redis Cluster is dated March 29 2011, but it is a “copy and commit” merge: the history of the cluster branch was destroyed since it was a total mess of work-in-progress commits, just to shape the initial idea of API and interactions with the rest of the system. Basically it is a roughly 4 years old project. This is about two thirds the whole history of the Redis project. Yet, it is only today, that I’m releasing a Release Candidate, the first one, of Redis 3.0.0, which is the first version with Cluster support.
Queues are an incredibly useful tool in modern computing, they are often used in order to perform some possibly slow computation at a latter time in web applications. Basically queues allow to split a computation in two times, the time the computation is scheduled, and the time the computation is executed. A “producer”, will put a task to be executed into a queue, and a “consumer” or “worker” will get tasks from the queue to execute them. For example once a new user completes the registration process in a web application, the web application will add a new task to the queue in order to send an email with the activation link. The actual process of sending an email, that may require retrying if there are transient network failures or other errors, is up to the worker.
----------------- UPDATE: The algorithm is now described in the Redis documentation here => http://redis.io/topics/distlock. The article is left here in its older version, the updates will go into the Redis documentation instead. ----------------- Many people use Redis to implement distributed locks. Many believe that this is a great use case, and that Redis worked great to solve an otherwise hard to solve problem. Others believe that this is totally broken, unsafe, and wrong use case for Redis.
The strong reactions about the recent OpenSSL bug are understandable: it is not fun when suddenly all the internet needs to be patched. Moreover for me personally how trivial the bug is, is disturbing. I don’t want to point the finger to the OpenSSL developers, but you just usually think at those class of issues as a bit more subtle, in the case of a software like OpenSSL. Usually you fail to do sanity checks *correctly*, as opposed to this bug where there is a total *lack* of bound checks in the memcpy() call.
Generally speaking, I love randomized algorithms, but there is one I love particularly since even after you understand how it works, it still remains magical from a programmer point of view. It accomplishes something that is almost illogical given how little it asks for in terms of time or space. This algorithm is called HyperLogLog, and today it is introduced as a new data structure for Redis. Counting unique things === Usually counting unique things, for example the number of unique IPs that connected today to your web site, or the number of unique searches that your users performed, requires to remember all the unique elements encountered so far, in order to match the next element with the set of already seen elements, and increment a counter only if the new element was never seen before.
Yesterday and today I managed to spend some time with linenoise (http://github.com/antirez/linenoise), a minimal line-editing library designed to be a simple and small replacement for readline. I was trying to merge a few pull requests, to fix issues, and doing some refactoring at the same time. It was some kind of nirvana I was feeling: a complete control of small, self-contained, and useful code. There is something special in simple code. Here I’m not referring to simplicity to fight complexity or over engineering, but to simplicity per se, auto referential, without goals if not beauty, understandability and elegance.
The title of this blog post is an apparently trivial to answer question, however it is worth to consider a bit better what performance really means: it is easy to get confused between scalability and performance, and to decompose performance, in the specific case of database systems, in its different main components, may not be trivial. In this short blog post I’ll try to write down my current idea of what performance is in the context of database systems. A good starting point is probably the first slide I use lately in my talks about Redis. This first slide is indeed about performance, and says that performance is mainly three different things.
Today Redis is 5 years old, at least if we count starting from the initial HN announcement , that’s actually a good starting point. After all an open source project really exists as soon as it is public. I’m a bit shocked I worked for five years straight to the same thing. The opportunities for learning new things I had because of the directions where Redis pushed me, and the opportunities to learn new things that I missed because I had almost consistently no time for random hacking, are huge.
In this blog post I’m going to describe a very simple distributed algorithm that is useful in different programming scenarios. The algorithm is useful when you need to take some kind of information synchronized among a number of processes. The information can be everything as long as it is composed of a small number of bytes, and as long as it is idempotent, that is, the current value of the information does not depend on the previous value, and we can just replace an old value, with the new one.
Redis Cluster is finally on its road to reach the first stable release in a short timeframe as already discussed in the Redis google group . However despite a design never proposed for the implementation of Redis Cluster was analyzed and discussed at long in the past weeks (unfortunately creating some confusion: many people, including notable personalities of the NoSQL movement, confused the analyzed proposal with Redis Cluster implementation), no attempt was made to analyze or categorize Redis Cluster itself.
One of the steps to reach the goal of providing a "testable" Redis Cluster experience to users within a few weeks, is some serious testing that goes over the usual "I'm running 3 nodes in my macbook, it works". Finally this is possible, since Redis Cluster entered into the "refinements" stage, and most of the system design and implementation is in its final form already. In order to perform some testing I assembled an environment like that: * Hardware: 6 real computers: 2 macbook pro, 2 macbook air, 1 Linux desktop, 1 Linux tiny laptop called EEEpc running with a single core at 800Mhz.
So finally something really good happened from the Redis criticism thread. At the end of the work day I was reading about Redis as AP and merge operations on Twitter. At the same time I was having a private email exchange with Alexis Richardson (from RabbitMQ, and, my boss). Alexis at some point proposed that perhaps a way to improve safety was to asynchronously ACK the client about what commands actually were not received so that the client could retry. This seemed a lot of efforts in the client side, but somewhat totally opened my view on the matter.
A few days ago I tried to do an experiment by running some kind of “call for critiques” in the Redis mailing list: https://groups.google.com/forum/#!topic/redis-db/Oazt2k7Lzz4 The thread has reached 89 posts so far, probably one of the biggest threads in the history of the Redis google group. The main idea was that critiques are a mix of pointless attacks, and truth, so to extract the truth from critiques can be a good exercise, it means to have some seed idea for future improvements from the part of the population that is not using or is not happy with your system.
Redis unstable has a new command called "WAIT". Such a simple name, is indeed the incarnation of a simple feature consisting of less than 200 lines of code, but providing an interesting way to change the default behavior of Redis replication. The feature was extremely easy to implement because of previous work made. WAIT was basically a direct consequence of the new Redis replication design (that started with Redis 2.8). The feature itself is in a form that respects the design of Redis, so it is relatively different from other implementations of synchronous replication, both at API level, and from the point of view of the degree of consistency it is able to ensure.
Yesterday I lost all my blog data in a rather funny way. When I installed this new blog engine, that is basically a Lamer News slightly modified to serve as a blog, I spinned a Redis instance manually with persistence *disabled* just to see if it was working and test it a bit. I just started a screen instance, and run something like ./redis-server --port 10000. Since this is equivalent to an empty config file with just "port 10000" inside I was running no disk backed at all. Since Redis very rarely crashes, guess what, after more than one year it was still running inside the screen session, and I totally forgot that it was running like that, happily writing controversial posts in my blog. Yesterday my server was under attack. This caused an higher then normal load, and Linode rebooted the instance. As a result my blog was gone.
Today Joyent wrote a blog post in the company blog about an issue that started with this pull request in the libuv project: https://github.com/joyent/libuv/pull/1015#issuecomment-29538615 Basically the developer Ben Noordhuis rejected a pull request involving a change in the documentation to use gender-neutral form instead of “him”. Joyent replied with this incredible post: http://www.joyent.com/blog/the-power-of-a-pronoun. In the blog post you can read: “But while Isaac is a Joyent employee, Ben is not—and if he had been, he wouldn't be as of this morning: to reject a pull request that eliminates a gendered pronoun on the principle that pronouns should in fact be gendered would constitute a fireable offense for me and for Joyent.”
Redis API for data access is usually limited, but very direct and straightforward. It is limited because it only allows to access data in a natural way, that is, in a data structure obvious way. Sorted sets are easy to access by score ranges, while hashes by field name, and so forth. This API “way” has profound effects on what Redis is and how users organize data into it, because an API that is data-obvious means fast operations, less code and less bugs in the implementation, but especially forcing the application layer to make meaningful choices: the database as a system in which you are responsible of organizing data in a way that makes sense in your application, versus a database as a magical object where you put data inside, and then it will be able to fetch and organize data for you in any format.
This blog post describes the new algorithm used in Redis Cluster in order to propagate and update metadata, that is hopefully significantly safer than the previous algorithm used. The Redis Cluster specification was not yet updated, as I'm rewriting it from scratch, so this blog post serves as a first way to share the algorithm with the community. Let's start with the problem to solve. Redis Cluster uses a master - slave design in order to recover from nodes failures. The key space is partitioned across the different masters in the cluster, using a concept that we call "hash slots". Basically every key is hashed into a number between 0 and 16383. If a given key hashes to 15, it means it is in the hash slot number 15. These 16k hash slots are split among the different masters.
Paul Graham managed to put a very important question, the one of the English language as a requirement for IT workers, in the attention zone of news sites and software developers . It was a controversial matter as he referred to "foreign accents" and the internet is full of people that are just waiting to overreact, but this is the least interesting part of the question, so I'll skip that part. The important part is, no one talks about the "English problem" usually, and I always felt a bit alone in that side, like if it was a problem only affecting me, so in this blog post I want to share my experience about English.
Twilio just released a post mortem about an incident that caused issues with the billing system: http://www.twilio.com/blog/2013/07/billing-incident-post-mortem.html The problem was about a Redis server, since Twilio is using Redis to store the in-flight account balances, in a master-slaves setup, with multiple slaves in different data centers for obvious availability and data safety concerns. This is a short analysis of the incident, what Twilio can do and what Redis can do to avoid this kind of issues.
Yesterday night I returned back home after a short trip in San Francisco. Before memory fades out and while my feelings are crisp enough, I'm writing a short report of the trip. The point of view is that of a south European programmer exposed for a few days to what is probably the most active information technology ecosystem and economy of the world. Reaching San Francisco === If you want to reach San Francisco from Sicily, there are no direct flights helping you. My flight was a Lufthansa flight from Catania to Munich, and finally from Munich to San Francisco. This is a total of 15 hours flight, plus the stop in Munich waiting for the second flight.
Redis uses streamed asynchronous replication, that's one of the simplest forms of replication you can imagine: a continuos stream of writes is sent to the slaves, without waiting for the slaves to process the writes in any way before replying to the client. I always gave that almost for granted, as I always assumed Redis was not a good match for synchronous replication, that has an higher latency. However recently I tried to fix another issue with Redis replication, that is, timeouts are all up to the slave.
Terah is a planet far away, where networks never split. They have a single issue with their computer networks, from time to time, single hosts break in a way or the other. Sometimes is a broken power supply, other times a crashed disk, or a software issue completely blocking the system. The inhabitants of this strange planet use two database systems. One is imported from planet Earth via the Galactic Exchange Program, and is called EDB. The other is produced by engineers from Terah, and is called TDB. The databases are functionally equivalent, but they have different semantics when a network partition happens. While the database from Earth stops accepting writes as long as it is not connected with the majority of the other database nodes, the database from Terah works as long as the majority of the clients can reach at least a database node (incidentally, the author of this story released a similar software project called Sentinel, but this is just a coincidence).
In a great series of articles Kyle Kingsbury, aka @aphyr on Twitter, attacked a number of data stores:  http://aphyr.com/tags/jepsen Postgress, Redis Sentinel, MongoDB, and Riak are audited to find what happens during network partitions and how these systems can provide the claimed guarantees. Redis is attacked here: http://aphyr.com/posts/283-call-me-maybe-redis I said that Kyle "attacked" the systems on purpose, as I see a parallel with the world of computer security here, it is really a good idea to move this paradigm to the database world, to show failure modes of systems against the claims of vendors. Similarly to what happens in the security world the vendor may take the right steps to fix the system when possible, or simply the user base will be able to recognize that under certain circumstances something bad is going to happen with your data.
Lately I'm trying to push forward Redis 2.8 enough to reach the feature freeze and release it as a stable release as soon as possible. Redis 2.8 will not contain Redis Cluster, and its implementation of Redis Sentinel is the same as 2.6 and unstable branches, (Sentinel is taken mostly in sync in all the branches being fundamentally a different project using Redis just as framework). However there are many new interesting features in Redis 2.8 that are back ported from the unstable branch. Basically 2.8 it's our usual "in the middle" release, like 2.4 was: waiting for Redis 3.0 that will feature Redis Cluster (we have great progresses about it! See https://vimeo.com/63672368), we'll have a 2.8 release with everything that is ready to be released into the unstable branch. The goal is of course to put more things in the hands of users ASAP.
Questo post ha lo scopo di presentare alla comunita' italiana interessata ai temi della programmazione e delle startup un progetto nato attorno ad un paio di birre: "Hacking Italia", che trovate all'indirizzo http://hackingitalia.com Hacking Italia e' un sito di "social news", molto simile ad Hacker News, il celebre collettore di news per hacker di YCombinator. A che serve un sito italiano, e in italiano se c'e' gia' molto di piu' e di meglio nel panorama internazionale? A mettere assieme una massa critica di persone "giuste" in Italia.
Hello! As promised today I did some SSD testing. The setup: a Linux box with 24 GB of RAM, with two disks. A) A spinning disk. b) An SSD (Intel 320 series). The idea is, what happens if I set the SSD disk partition as a swap partition and fill Redis with a dataset larger than RAM? It is a lot of time I want to do this test, especially now that Redis focus is only on RAM and I abandoned the idea of targeting disk for a number of reasons. I already guessed that the SSD swap setup would perform in a bad way, but I was not expecting it was *so bad*.
One thing, more than everything else, keeps me focused while programming: never interrupt the flow. If you ever wrote some complex piece of code you know what happens after some time: your mental model of the software starts to be very complex with different ideas nested inside other ideas, like the structure of your program is, after all. So while you are writing this piece of code, you realize that because of it you need to do that other change. Something like "I'm freeing this object here, but it's connected to this two other objects and I need to do this and that in order to ensure consistent state".
After the "sexism gate" I started to use my Twitter account only for private stuff in order to protect the image of Redis and/from my freedom to say whatever I want. It did not worked actually since the reality is that people continue to address you with at-messages about Redis stuff. But the good outcome is that now I created a @redisfeed account that I use in order to provide a stream of information to Redis users that are not interested in my personal tweets not related to Redis. Anyway when I say some important thing regarding Redis with my personal account, I just retweet in the other side, so this is a good setup.
This is a very busy moment for Redis because the new year started in a very interesting way: 1) I finished the Partial Resynchronization patch (aka PSYNC) and merged it into the unstable and 2.8 branch. You can read more about it here: http://antirez.com/news/47 2) We finally have keyspace changes notifications: http://redis.io/topics/notifications Everything is already merged into our development branches, so the deal is closed, and Redis 2.8 will include both the features. I'm especially super excited about PSYNC, as this is a not-feature, simply you don't have to deal with it, the only change is that slaves work A LOT better. I love adding stuff that is transparent for users, just making the system better and more robust.
For a decade and half I contributed to open source regularly, and still it is relatively rare that I stop to think a bit more about what this means for me. Probably it is just because I like to write code, so this is how I use my time: writing code instead of thinking about what this means… however lately I'm starting to have a few recurring ideas about open source, its relationship with the IT industry, and my interpretation of what OSS is, for me, as a developer. First of all, open source for me is not a way to contribute to the free software movement, but to contribute to humanity. This means a lot of things, for instance I don't care about what people do with my code, nor if they'll release back their modifications. I simply want people to use my code in one way or the other.
Dear Redis users, in the final part of 2012 I repeated many time that the focus, for 2013, is all about Redis Cluster and Redis Sentinel. This is exactly what I'm going to do from the point of view of the big picture, however there are many smaller features that make a big difference from the point of view of the Redis user day to day operations. Such features can't be ignored as well. They are less shiny in a feature list, and they are not good to generate buzz and interest in new users, and sometimes boring to code, but they are very important from a practical point of view.
# Software defined radio is cool About one week ago I received my RTLSDR dongle, entering the already copious crew of software defined radio enthusiasts. It's really a lot of fun, for instance from my home that is at about 10 km from the Catania Airport I can listen the tower talking with the aircrafts in the 118.700 Mhz frequency with AM modulation, however because of lack of time I was not able to explore this further until the past Sunday. My Sunday goal was to use the RTLSDR to see if I was able to capture some ADS-B message from the aircrafts lading or leaving from the airport. Basically ADS-B is a security device that is installed in most aircrafts that is used for collision avoidance and other stuff like this. Every aircraft broadcasts informations about heading, speed, altitude and so forth.
Currently I'm working on Redis partial resynchronization of slaves as I wrote in previous blog posts. The idea is that we have a backlog of the replication stream, up to the specified amount of bytes (this will be in the order of a few megabytes by default). If a slave lost the connection, it connects again, see if the master RUNID is the same, and asks to continue from a given offset. If this is possible, we continue, nothing is lost, and a full resynchronization is not needed. Otherwise if the offset is about data we no longer have in the backlog, we full resync.
While a big number of users use large farms of Redis nodes, from the point of view of the project itself currently Redis is a mostly single-instance business. I've big plans about going distributed with the project, to the extent that I'm no longer evaluating any threaded version of Redis: for me from the point of view of Redis a core is like a computer, so that scaling multi core or on a cluster of computers is the same conceptually. Multiple instances is a share-nothing architecture. Everything makes sense AS LONG AS we have a *credible way* to shard :-)
Premise: a small rant about software reliability. === I'm very serious about software reliability, and this is not just a good thing. It is good in a sense, as I tend to work to ensure that the software I release is solid. At the same time I think I take this issue a bit too personally: I get upset if I receive a crash report that I can't investigate further for some reason, or that looks like almost impossible to me, or with an unhelpful stack trace. Guess what? This is a bad attitude because to deliver bugs free software is simply impossible. We are used to think in terms of labels: "stable", "production ready", "beta quality". I think that these labels are actually pretty misleading if not put in the right perspective.
This is one of this trivial changes in Redis that can make a big difference for users. Basically in the unstable branch I added some code that has the following effect, when running Redis on Linux systems:  19 Nov 12:00:55.019 * Background saving started by pid 391  19 Nov 12:01:00.663 * DB saved on disk  19 Nov 12:01:00.673 * RDB: 462 MB of memory used by copy-on-write As you can see now the amount of additional memory used by the saving child is reported (it is also reported for AOF rewrite operations).
Memory errors in computers are so common that you can register domain names similar to very famous domain names, but altered in one bit, and get a number of requests per hour: http://dinaburg.org/bitsquatting.html
On Twitter: @War3zRub1 "Hahaha it's silly how people use Redis when they need a reverse proxy" @C4ntc0de "ZOMG! Use a real message queue, Redis is not a queue!" @L4m3tr00l "My face when Redis BLABLABLA..." Meanwhile *at* Twitter: OP1: "Hey guys, there is a spike in the number of lame messages today, load is increasing..." OP2: "Yep noticed, it's the usual troll fiesta trowing shit at Redis, 59482 messages per second right now." OP1: "Ok, no prob, let's spawn two additional Redis nodes to serve their timelines as smooth as usually".
This post by Peter Bailis is a must read. "Safety and Liveness: Eventual consistency is not safe" .  http://www.bailis.org/blog/safety-and-liveness-eventual-consistency-is-not-safe/ An extreme TL;DR of this is. 1) In an eventually consistent system, when all the nodes will agree again after a partition? 2) In an eventually consistent system, HOW the nodes will agree about inconsistencies? 3) In immediately consistent systems, when I'm no longer able to write? When I'm no longer able to read?
There is a new DB option out there, I know it took a long time to be developed. While I don't know very well how it works I hope it will be an interesting player in the database landscape. My initial feeling is that it will compete closely with Riak and MongoDB (the system seems more similar to MongoDB itself, but if it can scale well multi-nodes people that don't need high write availability may pick an immediate consistent database such as RethinkDB instead of Riak for certain applications).
While I consider the Amazon Dynamo design, and its original paper, one of the most interesting things produced in the field of databases in recent times, in the Redis world eventual consistency was never particularly discussed. Redis Cluster for instance is a system biased towards consistency than availability. Redis Sentinel itself is an HA solution with the dogma of consistency and master slave setups. This bias for consistent systems over more available but eventual consistent systems has some good reasons indeed, that I can probably reduce to three main points:
You can follow the commits in the next days in the "psync" branch at github: https://github.com/antirez/redis/commits/psync
From an HN comment "(Geek note: In the late nineties I worked briefly with a D&D fanatic ops team lead. He threw a D100 when he came in every morning. Anything >90 he picked a random machine to failover 'politely'. If he threw a 100 he went to the machine room and switched something off or unplugged something. A human chaos monkey)."  http://news.ycombinator.com/item?id=4736220
I'm pretty surprised no one tried to write a wrapper for redis-rb or other clients implementing a Dynamo-style system on top of Redis primitives. Basically something like that: 1) You have a list of N Redis nodes. 2) On write, use consistent hashing and write the same thing to M nodes (M configurable). 3) On reads, read from M nodes and pick the most common reply to return to the client. For all the non-matching replies, use DUMP / RESTORE in Redis 2.6 to update the value of nodes that are in the minority.
I'm proud to be mentioned in this well-thought and non-bigot post: http://www.nerdess.net/waffling/why-it-awesome-be-girl-tech/ Also in perfect accordance with the hacking culture, the post is sort of an HOWTO for girls that want to be involved in IT.
In this busy days I had the idea to focus on a single, non-huge, self contained project for some time, that could allow me to work focused as much as hours as possible, and could provide a significant advantage to the Redis community. It turns out, the best bet was partial replication resync. An always wanted feature that consists in the ability to a slave to resynchronize to a master without the need of a full resync (and RDB dump creation on the master side) if the replication link was interrupted for a short time, because of a timeout, or a network issue, or similar transient issue.
I love Github issues, it is one of the awesome things at Github IMHO: as simple as possible but actually under the hood pretty full featured. However one of the things I love more is labels. It is a truly powerful thing to organize issues in a project-specific way. Unfortunately if an issue is a pull request, no labels can be attached. I wonder why. Also I would love the ability to merge against multiple branches instead of the taget one, directly from the web UI.
I assume you already read the AWS report about recent troubles. I think it is a very good argument you could use at work against design complexity and in favor of designing stuff that are at a complexity level where analysis of failure modes and prevention is actually possible.  https://aws.amazon.com/message/680342/
From a comment on Hacker News: (link: http://news.ycombinator.com/item?id=4705387) --- quoted comment --- Full disclosure: I work for an AWS competitor. While none of the specific AWS systemic failures may themselves be foreseeable, it is not true that issues of this nature cannot be anticipated: the architecture of their system (and in particular, their insistence on network storage for local data) allows for cascading failure modes in which single failures blossom to systemic ones. AWS is not the only entity to have made this mistake with respect to network storage in the cloud; I, too, was fooled.
Achievement unlocked: releasing a Redis version the same day your daughter was born ;-) But that was a bad issue as there was a bug preventing compilation on pretty old Linux systems that are still pretty widespread (RHLE5 & similar). Redis 2.6.1 fixes just that issue and is available as usually at http://redis.io as a tar.gz or at github/antirez/redis as a "2.6.1" tag.
I really trust both in the usefulness of Redis bit operations and the fact that our community in the future should have documentation about Redis Patterns. So an article from CopperEgg where a bit operations pattern is described is good for sure :) http://copperegg.com/redis-bit-operations-use-case-at-copperegg/
25 October 2012 01:06, she is 3350 grams of a funny little thing :-)
It's a more quite time now. Redis 2.6 released, the sexism issue almost forgotten. Time to relax, be wise, and focus on work. Right, but, that's not me. I've a few more things to say about what happened, and to reply to the many people that asked me why I felt "obligated" to stop using my Twitter account as before, with a mix of work, thoughts on technology, and personal stuff. I can change idea easily if it is the case, but this time it was not the case. As much as people that criticised me for my blog post may think that I've a problem, I also think they have huge limits. Oh well, different opinions, I don't like you, you don't like me, I don't freaking care after all. I don't think on the same line as most people alive if that's the matter.
h2s writes about Linus: "I love this guy's balanced approach to steering the kernel. Somebody asked whether a bunch of security-related patches would be getting into Linus' tree, and his response was great. Basically, he spent a few minutes explaining how security people tend to think that problems are either security problems or not worth thinking about. They see things in black and white and only care about increasing security at any cost. He said performance fanatics can be the same in their approach to improving performance, and he tries not to treat security or performance patches as being too massively different from any other types of patches such as ones for correctness.
I don't like people that are using recent EC2 problems to get an easy advantage / marketing. Stuff go down and cloud services are not magical, it is better to adjust the expectations. But there are other reasons why people IMHO should consider going bare metal. * EC2 (and similar services) are extremely costly. With 100 euros per month you can rent a beast of a dedicated server with 64 GB of RAM and fast RAID disks. * As you can see you are not down-time safe, and to be down together with a zillion of other sites may be a good excuse with your boss maybe, but does not change your uptime percentage, so it's a poor shield.
Redis 2.6 is finally out and I think that now that we reached this point we'll start to see how the advantages of a release that was already exploited in production by few, will turn into a big advantage for the rest of the community. Scripting, bitops, and all the big features are good additions but my feeling is that Redis 2.6 is especially significative as a step forward in the maturity of the Redis implementation. This does not mean that's bug free, it's new code and we'll likely discover bugs especially in the early days as with every new release that starts to be adopted more and more.
And there is Pieter Noordhuis on stage right now! http://redisconf.com/video/
You are there in the morning with your coffee in front of you, scanning pull requests and bug reports, then you see a conversation around a commit among a few guys that modified the code to make it better, than there is another one suggesting to improve it in another way. You click in the account names and you see this people with their transparent eyes and your trust in humanity is restored.
Takeaways: 1) Making videos is in some way harder than doing a talk live. 2) Screen Flow is awesome, but could be improved with more video editing capabilities, apparently you can't "cut" the video. 3) The problem is to upload files when they are big and you have normal ADSL connection :-) But it feels good to be able to send the video talks a few days in advance, so the conf organizers will be able to perform editing, filter audio if needed, whatever.
of the final recording of the videos I'll send to the Redis Conf. That was hard! The timing of the conf was not excellent for my attending, but producing the video was also less trivial than I thought, but finally I've the slides, an idea about what to say, and the ScreenFlow skills ;) Maybe after this experience I'll produce some video tutorial of Redis new features as I introduce it, in order to accelerate the adoption of new things in our community. Now back to work...