A few days ago I started my day with my Twitter feed full of articles saying something like: “75% of Redis servers infected by malware”. The obvious misquote referred to a research by Incapsula where they found that 75% of the Redis instances left open on the internet, without any protection, on a public IP address, are infected .  https://www.incapsula.com/blog/report-75-of-open-redis-servers-are-infected.html Many folks don’t need any clarification about all this, because if you have some grip on computer security and how Redis works, you can contextualize all this without much efforts. However I’m writing this blog post for two reasons. The obvious one is that it can help the press and other users that are not much into security and/or Redis to understand what’s going on. The second is that the exposed Redis instances are a case study about safe defaults that should be interesting for the security circles.
News posted by antirez
[This blog post is also experimentally available on Medium: https://medium.com/antirez/a-short-tale-of-a-read-overflow-b9210d339cff] When a long running process crashes, it is pretty uncool. More so if the process happens to take a lot of state in memory. This is why I love web programming frameworks that are able, without major performance overhead, to create a new interpreter and a new state for each page view, and deallocate every resource used at the end of the page generation. It is an inherently more reliable programming paradigm, where memory leaks, descriptor leaks, and even random crashes from time to time do not constitute a serious issue. However system software like Redis is at the other side of the spectrum, a side populated by things that should never crash.
I saw multiple users asking me what is happening with Streams, when they’ll be ready for production uses, and in general what’s the ETA and the plan of the feature. This post will attempt to clarify a bit what comes next. To start, in this moment Streams are my main priority: I want to finish this work that I believe is very useful in the Redis community and immediately start with the Redis Cluster improvements plans. Actually the work on Cluster has already started, with my colleague Fabio Nicotra that is porting redis-trib, the Cluster management tool, inside the old and good redis-cli. This step involves translating the code from Ruby to C. In the meantime, a few weeks ago I finished writing the Streams core, and I deleted the “streams” feature branch, merging everything into the “unstable” branch.
Four days ago a user posted a critical issue in the Redis Github repository. The problem was related to the new Redis 4.0 PSYNC2 replication protocol, and was very critical. PSYNC2 brings a number of good things to Redis replication, including the ability to resynchronize just exchanging the differences, and not the whole data set, after a failover, and even after a slave controlled restart. The problem was about this latter feature: with PSYNC2 the RDB file is augmented with replication information. After a slave is restarted, the replication metadata is loaded back, and the slave is able to perform a PSYNC attempt, trying to handshake with the master and receive the differences since the last disconnection.
Until a few months ago, for me streams were no more than an interesting and relatively straightforward concept in the context of messaging. After Kafka popularized the concept, I mostly investigated their usefulness in the case of Disque, a message queue that is now headed to be translated into a Redis 4.2 module. Later I decided that Disque was all about AP messaging, which is, fault tolerance and guarantees of delivery without much efforts from the client, so I decided that the concept of streams was not a good match in that case.
Today I read an interesting article about how the Wolfenstein 3D game implemented a fade effect using a Linear Feedback Shift Register. Every pixel of the screen is set red in a pseudo random way, till all the screen turns red (or other colors depending on the event happening in the game). The blog post describing the implementation is here and is a nice read: http://fabiensanglard.net/fizzlefade/index.php You may wonder why the original code used a LFSR or why I'm proposing a different approach, instead of the vanilla setPixel(rand(),rand()): doing this with a pseudo random generator, as noted in the blog post, is slow, but is also visually very unpleasant, since the more red pixels you have on the screen already, the less likely is that you hit a new yet-not-red pixel, so the final pixels take forever to turn red (I *bet* that many readers of this blog post tried it in the old times of the Spectum, C64, or later with QBASIC or GWBasic). In the final part of the blog post the author writes:
A 10x programmer is, in the mythology of programming, a programmer that can do ten times the work of another normal programmer, where for normal programmer we can imagine one good at doing its work, but without the magical abilities of the 10x programmer. Actually to better characterize the “normal programmer” it is better to say that it represents the one having the average programming output, among the programmers that are professionals in this discipline. The programming community is extremely polarized about the existence or not of such a beast: who says there is no such a thing as the 10x programmer, who says it actually does not just exist, but there are even 100x programmers if you know where to look for.
After 10 million of units sold, and practically an endless set of different applications and auxiliary devices, like sensors and displays, I think it’s deserved to say that the Raspberry Pi is not just a success, it also became one of the preferred platforms for programmers to experiment in the embedded space. Probably with things like the Pi zero, it is also becoming the platform in order to create hardware products, without incurring all the risks and costs of designing, building, and writing software for vertical devices.
It’s not yet stable but it’s soon to become, and comes with a long list of things that will make Redis more useful for we users: finally Redis 4.0 Release Candidate 1 is here, and is bold enough to call itself 4.0 instead of 3.4. For me semantic versioning is not a thing, what I like instead is try to communicate, using version numbers and jumps, what’s up with the new version, and in this specific case 4.0 means “this is the shit”. It’s just that Redis 4.0 has a lot of things that Redis should have had since ages, in a different world where one developer can, like Ken The Warrior, duplicate itself in ten copies and start to code. But it does not matter how hard I try to learn about new vim shortcuts, still the duplicate-me thing is not in my chords.
Redis is often used for caching, in a setup where a fixed maximum memory to use is specified. When new data arrives, we need to make space by removing old data. The efficiency of Redis as a cache is related to how good decisions it makes about what data to evict: deleting data that is going to be needed soon is a poor strategy, while deleting data that is unlikely to be requested again is a good one. In other terms every cache has an hits/misses ratio, which is, in qualitative terms, just the percentage of read queries that the cache is able to serve. Accesses to the keys of a cache are not distributed evenly among the data set in most workloads. Often a small percentage of keys get a very large percentage of all the accesses. Moreover the access pattern often changes over time, which means that as time passes certain keys that were very requested may no longer be accessed often, and conversely, keys that once were not popular may turn into the most accessed keys.