[This blog post is also experimentally available on Medium: https://medium.com/antirez/a-short-tale-of-a-read-overflow-b9210d339cff] When a long running process crashes, it is pretty uncool. More so if the process happens to take a lot of state in memory. This is why I love web programming frameworks that are able, without major performance overhead, to create a new interpreter and a new state for each page view, and deallocate every resource used at the end of the page generation. It is an inherently more reliable programming paradigm, where memory leaks, descriptor leaks, and even random crashes from time to time do not constitute a serious issue. However system software like Redis is at the other side of the spectrum, a side populated by things that should never crash.
News posted by antirez
I saw multiple users asking me what is happening with Streams, when they’ll be ready for production uses, and in general what’s the ETA and the plan of the feature. This post will attempt to clarify a bit what comes next. To start, in this moment Streams are my main priority: I want to finish this work that I believe is very useful in the Redis community and immediately start with the Redis Cluster improvements plans. Actually the work on Cluster has already started, with my colleague Fabio Nicotra that is porting redis-trib, the Cluster management tool, inside the old and good redis-cli. This step involves translating the code from Ruby to C. In the meantime, a few weeks ago I finished writing the Streams core, and I deleted the “streams” feature branch, merging everything into the “unstable” branch.
Four days ago a user posted a critical issue in the Redis Github repository. The problem was related to the new Redis 4.0 PSYNC2 replication protocol, and was very critical. PSYNC2 brings a number of good things to Redis replication, including the ability to resynchronize just exchanging the differences, and not the whole data set, after a failover, and even after a slave controlled restart. The problem was about this latter feature: with PSYNC2 the RDB file is augmented with replication information. After a slave is restarted, the replication metadata is loaded back, and the slave is able to perform a PSYNC attempt, trying to handshake with the master and receive the differences since the last disconnection.
Until a few months ago, for me streams were no more than an interesting and relatively straightforward concept in the context of messaging. After Kafka popularized the concept, I mostly investigated their usefulness in the case of Disque, a message queue that is now headed to be translated into a Redis 4.2 module. Later I decided that Disque was all about AP messaging, which is, fault tolerance and guarantees of delivery without much efforts from the client, so I decided that the concept of streams was not a good match in that case.
Today I read an interesting article about how the Wolfenstein 3D game implemented a fade effect using a Linear Feedback Shift Register. Every pixel of the screen is set red in a pseudo random way, till all the screen turns red (or other colors depending on the event happening in the game). The blog post describing the implementation is here and is a nice read: http://fabiensanglard.net/fizzlefade/index.php You may wonder why the original code used a LFSR or why I'm proposing a different approach, instead of the vanilla setPixel(rand(),rand()): doing this with a pseudo random generator, as noted in the blog post, is slow, but is also visually very unpleasant, since the more red pixels you have on the screen already, the less likely is that you hit a new yet-not-red pixel, so the final pixels take forever to turn red (I *bet* that many readers of this blog post tried it in the old times of the Spectum, C64, or later with QBASIC or GWBasic). In the final part of the blog post the author writes:
A 10x programmer is, in the mythology of programming, a programmer that can do ten times the work of another normal programmer, where for normal programmer we can imagine one good at doing its work, but without the magical abilities of the 10x programmer. Actually to better characterize the “normal programmer” it is better to say that it represents the one having the average programming output, among the programmers that are professionals in this discipline. The programming community is extremely polarized about the existence or not of such a beast: who says there is no such a thing as the 10x programmer, who says it actually does not just exist, but there are even 100x programmers if you know where to look for.
After 10 million of units sold, and practically an endless set of different applications and auxiliary devices, like sensors and displays, I think it’s deserved to say that the Raspberry Pi is not just a success, it also became one of the preferred platforms for programmers to experiment in the embedded space. Probably with things like the Pi zero, it is also becoming the platform in order to create hardware products, without incurring all the risks and costs of designing, building, and writing software for vertical devices.
It’s not yet stable but it’s soon to become, and comes with a long list of things that will make Redis more useful for we users: finally Redis 4.0 Release Candidate 1 is here, and is bold enough to call itself 4.0 instead of 3.4. For me semantic versioning is not a thing, what I like instead is try to communicate, using version numbers and jumps, what’s up with the new version, and in this specific case 4.0 means “this is the shit”. It’s just that Redis 4.0 has a lot of things that Redis should have had since ages, in a different world where one developer can, like Ken The Warrior, duplicate itself in ten copies and start to code. But it does not matter how hard I try to learn about new vim shortcuts, still the duplicate-me thing is not in my chords.
Redis is often used for caching, in a setup where a fixed maximum memory to use is specified. When new data arrives, we need to make space by removing old data. The efficiency of Redis as a cache is related to how good decisions it makes about what data to evict: deleting data that is going to be needed soon is a poor strategy, while deleting data that is unlikely to be requested again is a good one. In other terms every cache has an hits/misses ratio, which is, in qualitative terms, just the percentage of read queries that the cache is able to serve. Accesses to the keys of a cache are not distributed evenly among the data set in most workloads. Often a small percentage of keys get a very large percentage of all the accesses. Moreover the access pattern often changes over time, which means that as time passes certain keys that were very requested may no longer be accessed often, and conversely, keys that once were not popular may turn into the most accessed keys.
WARNING: Long pretty useless blog post. TLDR is that I wrote, just for fun, a text editor in less than 1000 lines of code that does not depend on ncurses and has support for syntax highlight and search feature. The code is here: http://github.com/antirez/kilo. Screencast here: https://asciinema.org/a/90r2i9bq8po03nazhqtsifksb For the sentimentalists, keep reading… A couple weeks ago there was this news about the Nano editor no longer being part of the GNU project. My first reaction was, wow people still really care about an old editor which is a clone of an editor originally part of a terminal based EMAIL CLIENT. Let’s say this again, “email client”. The notion of email client itself is gone at this point, everything changed. And yet I read, on Hacker News, a number of people writing how they were often saved by the availability of nano on random systems, doing system administrator tasks, for example. Nano is also how my son wrote his first program in C. It’s an acceptable experience that does not require past experience editing files.