Redis as an LRU cache

Friday, 15 October 10

I see Redis definitely more as a flexible tool that as a solution specialized to solve a specific problem: his mixed soul of cache, store, and messaging server shows this very well.

Redis as a cache used to work in two main ways: you could either set a time to live to cached entries. If you tune the TTL well enough, and you know how many new objects are created every second, you can avoid Redis using more than a given amount of RAM.

Another way to use Redis as a cache is the maxmemory directive, a feature that allows specifying a maximum amount of memory to use. When new data is added to the server, and the memory limit was already reached, the server will remove some old data deleting a volatile key, that is, a key with an EXPIRE (a timeout) set, even if the key is still far from expiring automatically.

The algorithm used is very simple, three random volatile keys are sampled, the key with the nearest expire time is removed from the dataset. If there are not volatile keys at all the server returns an error, as a write operation was requested, we are already at the memory limit specified by the user, and no volatile keys can be removed to make room for new data.

Redis as an LRU cache

The above solutions are far from being perfect. This are the main problems:

With the first approach of setting a relatively short expire in order to balance the creation and destruction of objects to avoid using too much memory, there is always the risk of using too little or too much memory. Why deleting objects when there is no need to? And why using more memory than needed when the creation rate of objects is over the expected value?
maxmemory was definitely better, but the purging algorithm of removing the key with the nearest TTL was not optimal with many applications where an LRU algorithm can work better. Most of the time you end setting the same TTL to all the keys, turing the algorithm into actually random eviction. Fortunately random eviction is not that bad (and with some access pattern it is actually better than LRU) but there was definitely room to improve things.
With both the solutions, setting an expire in every key is needed, and this costs memory in Redis. If we want to use Redis as an LRU cache, and in general as a cache, every data inside the instance is a good candidate for eviction, regardless of an expire set.

In Redis master (what will be named Redis 2.2 when will reach stability) there was already an LRU field in every object used for Virtual Memory. Why not using it for the cache mode as well? It was also the right time to fix the other problems mentioned here, so yesterday I started writing some code that is now already merged in the master branch at github.

What's new

So the new version of maxmemory is actually composed of three different configuration directives:

maxmemory bytes

This works as usually, specifying the max number of bytes to use. You can specify this as kbytes, gigabytes and so forth as usually, like maxmemory 2g.

maxmemory-policy policy

This new configuration option is used to specify the algorithm (policy) to use when we need to reclaim memory. There are five different algorithms now:

volatile-lru remove a key among the ones with an expire set, trying to remove keys not recently used.
volatile-ttl remove a key among the ones with an expire set, trying to remove keys with short remaining time to live.
volatile-random remove a random key among the ones with an expire set.
allkeys-lru like volatile-lru, but will remove every kind of key, both normal keys or keys with an expire set.
allkeys-random like volatile-random, but will remove every kind of keys, both normal keys and keys with an expire set.

maxmemory-samples number_of_samples

The last config option is used to tune the algorithms precision. In order to save memory Redis just adds a 22 bits field to every object for LRU. When we need to remove a key we sample N keys, and remove the one that was idle for longer time. For default three keys are sampled, that is a reasonable approximation of LRU in the long run, but you can get more precision at the cost of some more CPU time changing the number of keys to sample.

Currently I'm still testing this code carefully, but for sure even if it's little code, it's a major step forward in making Redis a valuable caching solution.

Important note: all the new features are compatible with the CONFIG command, so you can set and get this parameters at run time using CONFIG SET and CONFIG GET.

Appendix: how to remember the Redis port number

Today on Twitter I saw a tweet related to the ability to remember the Redis port number. There is a trick, the Redis port number, 6379, is MERZ at the phone keyboard.

Is it a coincidence that it sounds not random enough? Actually not ;) I selected 6379 because of MERZ, and not the other way around.

Everything started with Alessia Merz, an Italian Showgirl (make sure to check some (not safe for work) photo as well).

I and my friends are used to create our own slang, that is evolving since... 20 or 25 years. Well one adjective that we use consistently since 10 years is "merz", but the meaning of the word changed so much in the course of the time.

Initially it started because we were really delighted by the stupidity of the sentences that the showgirl was able to state in the italian TV. So we started using "MERZ" when something was... stupid. "Hey, that's merz!". And so forth. But then with some time the meaning shifted in something stupid as pointless, but with very technical value, or with an impressive amount of skills and patience and work involved, but still... stupid.

For instance creating a 3D map of your hometown by sampling the points with a GPS and a broken car going around for the whole night, or analyzing tons of lottery data searching for biases, perfectly knowing that we'll never spend a single penny in a lottery ticket anyway, and so forth. "Merz" basically means... hack value, but is also referred to people not just things, people that act in a funny way just for hack value, or to be fun, and so forth.

So when I had to pick a port number for Redis I had no troubles, whatever number MERZ was at the phone, it was the Redis port number.

210773 views^*

Posted at 12:34:34 | permalink | 7 comments | print

Do you like this article?
Subscribe to the RSS feed of this blog or use the newsletter service in order to receive a notification every time there is something of new to read here.

Note: you'll not see this box again if you are a usual reader.

Comments

teleo writes:

15 Oct 10, 13:45:13

Hi Salvatore,

Is the default eviction policy volatile-ttl?

antirez writes:

15 Oct 10, 13:46:55

@teleo: no, voletile-ttl is indeed the old behavior, but the new default is instead volatile-lru that is I guess what most people will want to start.

obs writes:

15 Oct 10, 14:14:58

I like the allkeys-lru option :)

Brian Bulkowski writes:

15 Oct 10, 18:13:26

Citrusleaf, the clustered key-value database, has a very similar expiration feature, but it has two thresholds. The first threshold is when to activate the 'eviction' feature, which starts removing the oldest volatile keys, and the second stops writes if the maximum amount of memory is reached. Having a mark-and-sweep style of eviction gives much more fidelity than "sample three", and having stop-writes gives protection in all use scenarios against run-away writes beyond the capacity of memory.

Check out the wiki:
http://citrusleaf.net/moin/server/moin.cgi

antirez writes:

15 Oct 10, 18:59:08

@Brian: the whole point of using a "sample three" algorithm is to save memory. I think this is much more valuable than precision, especially since this randomized algorithms are rarely well understood. An example: sampling with just three objects will expire 666 objects out of a dataset of 999 with an error rate of only 14% compared to the *perfect* LRU algorithm.

And in the 14% of the remaining there are hardly elements that are in the range of very used elements. So the memory gain will pay for the precision without doubts.

Not sure why you use two thresholds, I think that using one is better as you fix the amount of memory Redis should use, and it will respect your willing deleting old data as you put more new data. Memcached-style.

Kore writes:

16 Oct 10, 01:45:51

The allkeys-lru policy sounds like it could be quite handy in my deployment. I've been hoping for the ability to update expiry times to simulate LRU behaviour, but this is even better.

Appreciate all your work Salvatore, keep it up!

Newbie writes:

01 Dec 10, 22:13:22

I'd like to set TTL on keys so that they stay in the cache no longer than X amount of time. Then I'd like to set the policy to volitile-LRU.

Am I correct in believing that even if a key has been very recently used when its TTL is up, it will be purged? Even if the cache is nowhere near maxbytes?

This way, stale data always gets purged and otherwise older data gets purged if we run out of memory.

comments closed

antirez weblog