Redis weekly update #1 - Hashes and... many more!

Friday, 12 March 10
This is the first article of a series called, with little imagination, Redis Weekly Updates. The idea is to provide some insight into the Redis development process: what's new in Redis master this week? What are the new directions? Who are the people involved in the development process?

I'll try to keep these articles as informative as possible and, when possible, not too long, or at least split into paragraphs for easy scanning, so that this can become a weekly appointment for everybody following Redis development. So... let's start!

Hashes

One of the biggest changes of the last week is without doubt the support for the hash data type. Redis already supported four data types: Strings, Lists, Sets, and Sorted Sets. Hashes were missing, so there was no simple way to have a key holding an object composed of different fields. The only two alternatives were:
  • Using one key for every field of an object, like obj:1:name, obj:1:surname.
  • Using some form of encoding, like JSON, and storing the encoded object in a single key.
Both are suboptimal: the first consumes too much memory, while the latter does not allow accessing single fields. Setting a field is not even an atomic operation: you must GET the old value, decode the object, modify the field, encode it again, and SET the value back. Not cool.
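To make the second problem concrete, here is a sketch of what updating a single field looks like with the JSON approach (the key name and payload are just hypothetical):

    redis> GET obj:1
    "{\"name\":\"foo\",\"surname\":\"bar\"}"
    (the client decodes the JSON, changes one field, encodes it again)
    redis> SET obj:1 "{\"name\":\"foo\",\"surname\":\"baz\"}"
    OK

Any write to obj:1 performed by another client between the GET and the SET is silently lost.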

Hashes solve this problem in the right way: they are even less memory intensive than JSON-encoded objects, fields can be accessed independently, and I plan to support a few atomic operations against fields, for instance INCRBY.

Even though the hash internals are almost finished, the command set is currently limited: only two commands are supported:
  • HSET key field value
  • HGET key field
There is no HDEL, nor a way to retrieve the full list of fields, values, or both. But it's a matter of days; I'm working on it.
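Here is a minimal session with the two commands as they exist today (a sketch; HSET replies with 1 when the field is newly created):

    redis> HSET obj:1 name foo
    (integer) 1
    redis> HGET obj:1 name
    "foo"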

An interesting implementation detail of hashes (and the root of their interesting property of being so memory efficient) is that they use a compact representation of string => string maps for hashes with fewer than N fields (configurable), where every field is not bigger than a given number of bytes (configurable). I called this simple data structure a zipmap (Zipped Map). It is an O(N) compact data structure: nothing is free in computer science, but with few fields, and given the small constant factors, this is not a problem at all. Once these limits are reached, the hash is automatically converted into a real hash table. This way we can support both small hashes (objects encoded as hashes) in a space-efficient way and bigger hashes (used as sub-dictionaries) in a time-efficient way.
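To give an idea of the layout, here is a toy sketch of the zipmap concept in C. This is not the actual Redis implementation, and the encoding details here (one-byte lengths, a 0xFF terminator) are simplifications; it only illustrates how a flat array of length-prefixed strings gives O(N) lookups with almost no per-field overhead:

    /* Toy zipmap: <1-byte keylen><key><1-byte vallen><val> ... <0xFF> */
    #include <stdio.h>
    #include <string.h>

    static const unsigned char *toy_zipmap_get(const unsigned char *zm,
                                               const char *key,
                                               unsigned char *vlen) {
        size_t klen = strlen(key);
        while (*zm != 0xFF) {
            unsigned char eklen = *zm++;          /* key length */
            const unsigned char *ekey = zm;       /* key bytes */
            zm += eklen;
            unsigned char evlen = *zm++;          /* value length */
            if (eklen == klen && memcmp(ekey, key, klen) == 0) {
                *vlen = evlen;
                return zm;                        /* value bytes */
            }
            zm += evlen;                          /* skip the value */
        }
        return NULL;                              /* field not found */
    }

    int main(void) {
        /* "name" => "antirez", "lang" => "C" as one flat byte array. */
        const unsigned char zm[] = {
            4,'n','a','m','e', 7,'a','n','t','i','r','e','z',
            4,'l','a','n','g', 1,'C',
            0xFF
        };
        unsigned char vlen;
        const unsigned char *val = toy_zipmap_get(zm, "name", &vlen);
        if (val) printf("name => %.*s\n", (int)vlen, val);
        return 0;
    }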

How much space can hashes save? For objects with five fields, storing them as a hash takes just a bit more than 20% of the memory required to store every field as a separate key. So it's a pretty big win, and the more fields, the more memory they'll save.

Redis-cli interactive mode and DISCARD in MULTI/EXEC

Thanks to Michel Martens and Damian Janowski, redis-cli now supports interactive mode (which is now the default): a REPL interface where you can type one command after the other at a prompt, over a persistent connection (so it is possible to test things like MULTI/EXEC).

Damian also implemented DISCARD, so Redis transactions can now be "aborted". This new command has implications for higher-level interfaces: for instance, it's now possible to support cool things like aborting a transaction implemented as a Ruby block when an exception is raised. This change will probably be merged into Ezra Zygmuntowicz's Redis Ruby library (probably the most used Redis library, from a great coder and long-term contributor to Redis as well, but if you are reading this I bet you already know about Ezra!).
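For example, a transaction can now be aborted interactively like this (a sketch, assuming the key foo does not already exist):

    redis> MULTI
    OK
    redis> SET foo bar
    QUEUED
    redis> DISCARD
    OK
    redis> GET foo
    (nil)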

Just in case you are not aware of it, these two guys are also the authors of Ohm (Object-hash mapping) :)

Sorted Sets advanced operations

Pieter Noordhuis was able to understand the Redis code base, turn the Redis skip list implementation into an augmented one supporting operations like ZRANK (given an element, find its position in the sorted set in O(log(N))), and build commands like ZREMRANGEBYINDEX, a more powerful LTRIM-like command for sorted sets, all in just a couple of days, showing what a talented programmer he is.

If this is not enough, check out the work he did lately: sorted set union and intersection, with different ways to combine the scores in the resulting sorted set (like sum, average, min, max). This is already superb, but it's not just a matter of coding skills. Instead of simply sending pull requests, Pieter spent a lot of time on the Redis IRC channel designing these commands together with me, so that we could evaluate different solutions to different problems. In the process Pieter showed he is also a great designer. It was a great collaboration. If Redis were a company, I would hire him in no time.
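For instance, here is a minimal sketch of ZRANK at work (ranks are zero-based, so the element with the second lowest score has rank 1):

    redis> ZADD scores 10 alice
    (integer) 1
    redis> ZADD scores 20 bob
    (integer) 1
    redis> ZADD scores 30 carol
    (integer) 1
    redis> ZRANK scores bob
    (integer) 1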

Try Redis: in your browser

Alex McHale simply amazed me. While chatting on the IRC channel he brought up the idea of helping Redis in some way, so I expressed my interest in having a simple way to try Redis in your browser, similar to the famous Try Ruby, which the MongoDB guys also replicated with Try Mongo.

A few days later we had Try Redis working! With many touches of class, like command history via the up arrow, and the use of namespaces so that all the users share the same Redis instance without noticing it at all :) (I think the namespace code originally comes from another great coder, Chris Wanstrath of GitHub, aka defunkt, but Alex improved the support and Chris is merging the changes.)

Try Redis was a big success right after its release :) and is now the first entry in the Redis documentation. Thanks Alex!

The replication bug

Well, another very important issue this week was a hard-to-spot bug in the Redis replication code :)

The bug was found by Jeremy Zawodny of Craigslist (they are running a fairly large Redis farm!). Not only did Jeremy find the bug and remain extremely supportive while I was trying to fix it, he also ended up fixing it himself :)

The story and the cause of this bug are interesting IMHO, so I'm going to briefly describe what happened. Jeremy filed a bug report about replication failing: slaves were unable to parse the .rdb file received from the master.

My best guess was that this was related to broken .rdb file generation, and both Pieter and I spent many hours trying to find a sorted set that would corrupt the generated .rdb file. What was really strange was that the file was badly corrupted, yet there were no crashes or anything like that on the masters. If it was a memory corruption bug, there were good chances of the master failing in random ways as well. Instead everything was fine.

Jeremy investigated more and finally found and fixed the bug: the .rdb file was being corrupted while it was transferred from master to slave. In fact, at Craigslist they run a number of slaves on every host sharing the same working directory, and this is the code Redis was using to generate the temp file used to save the dump:
snprintf(tmpfile,256,"temp-%d.%ld.rdb",(int)time(NULL),(long int)random());
This looks random enough, but it's actually pretty lame, for a few reasons:
  • In Redis there is no seeding of the PRNG via srand(), because we want deterministic behavior, otherwise fixing bugs is hard. As a result, every process sees the same random() sequence.
  • It's not so unlikely that two slaves will sync with the master at the same time, especially in big environments where scripts handle replication.
  • Why didn't I use the pid, which is far more unique? Two processes can't have the same pid at the same time, so the right thing was to use pid+time and, as Jeremy suggested, to use O_EXCL, so that even if there is a collision we are safe.
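To see the first point in practice, here is a minimal standalone program (not Redis code) that mimics the old naming scheme. Run it twice within the same second and it prints exactly the same file name, because the unseeded PRNG always starts from the same sequence:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    int main(void) {
        /* Without srandom(), random() always returns the same
         * sequence, so two processes started in the same second
         * compute identical "temp" file names. */
        printf("temp-%d.%ld.rdb\n", (int)time(NULL), (long int)random());
        return 0;
    }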
This is how the new code looks:

    while(maxtries--) {
        snprintf(tmpfile,256,
            "temp-%d.%ld.rdb",(int)time(NULL),(long int)getpid());
        /* O_EXCL: open() fails if the file already exists, so even
         * a name collision can't clobber another slave's dump. */
        dfd = open(tmpfile,O_CREAT|O_WRONLY|O_EXCL,0644);
        if (dfd != -1) break;
        sleep(1);
    }
This should be much better ;) I'm going to release 1.2.6 today, or this weekend at the latest, with this fix.

There is something good about open source

I'm so happy about these stories, involving five people willing to share their work for free, to push their knowledge forward. These are a few of the best traits of humanity itself. There is really something good about open source, and I'm honored to collaborate with such great guys. Thanks.

Comments

lpgauth writes:
12 Mar 10, 15:08:13
Hash support ftw! I was using JSON, this is going to simplify my life :)
Marcus writes:
12 Mar 10, 15:20:54
I love the hash support! I was using PHP serialization, but hashes would be a lot better!
Nate writes:
13 Mar 10, 00:05:55
I would love to try this out for a project, but unfortunately Redis doesn't run on Windows now. I wish the server supported POSIX and Win32.
Mordy writes:
13 Mar 10, 22:08:23
I love the weekly update. Please continue with it.
litaocheng writes:
14 Mar 10, 00:10:59
great work! I love the weekly update!
dude writes:
14 Mar 10, 11:06:33
Oh man I quite literally laughed out loud at Nate's comment Re: Windows support. Ha! Made my morning.
15 Mar 10, 02:42:13
So far loving Redis and this weekly update. It's very informative and definitely a win for the project. Please keep doing it!
Michael writes:
16 Mar 10, 07:43:46
You say that when a hash get big it is "automatically converted into a real hash table." Does this mean it becomes part of the "main" key/value store? i.e.

HSET key field value

gets converted into something like

SET key+field value

? Or does each hash have its own completely separate datastructure?
antirez writes:
16 Mar 10, 10:49:33
@Michael: this conversion is not exposed to the user at all. The Hash type is always perceived as a key holding an aggregate type composed of fields associated with values. The zipmap/hashtable thing is just the internal representation for this.
Michael writes:
16 Mar 10, 11:41:04
Oh, I didn't think it would be exposed, I was wondering more about the implementation, and any performance implications.

If I know I'm going to end up with a hash of thousands of keys, is it much more efficient to manage this myself, or is it better to leave it to Redis?

i.e. is there a difference between

HSET foo field_1 xx
HSET foo field_2 xx
...
HSET foo field_9999 xx

and

SET foo_field_1 xx
SET foo_field_2 xx
...
SET foo_field_9999 xx

if there are very many items in the "inner" hash?

(My guess is that the main differences are that "whole of hash" operations, such as retrieving the list of keys and deleting the hash, are easier/possible with the HSET style, but that this might come at a small performance cost?)
Ajay writes:
16 Mar 10, 19:05:06
Just out of curiosity: why not use mkstemp() to generate the temporary filename?