<antirez>

antirez 1612 days ago. 151519 views.
Terah is a planet far away, where networks never split. They have a single issue with their computer networks: from time to time, single hosts break in one way or another. Sometimes it is a broken power supply, other times a crashed disk, or a software issue completely blocking the system.

The inhabitants of this strange planet use two database systems. One is imported from planet Earth via the Galactic Exchange Program, and is called EDB. The other is produced by engineers from Terah, and is called TDB. The databases are functionally equivalent, but they have different semantics when a network partition happens. While the database from Earth stops accepting writes as long as it is not connected with the majority of the other database nodes, the database from Terah works as long as the majority of the clients can reach at least one database node (incidentally, the author of this story released a similar software project called Sentinel, but this is just a coincidence).

Terah users have setups like the following, with three database nodes and three web servers running clients that send queries to the database nodes ("D" denotes a database node, "C" a client).

              D  D  D

              C  C  C

EDB is designed to avoid problems on partitions like this:

              D1 \ D  D
                 /
              C1 \ C  C

C1 writing to D1 may result in lost writes if D1 happened to be the master.

However, on Terah net splits are not an issue: they invented a solution for all network partitions back in Galactic Year 712! Still, their technology is currently not able to prevent single hosts from failing.

There is a sysop from Terah, Daniel Fucbitz, who always complains about EDB. He does not understand why on the Earth… oops, on the Terah I mean, his company keeps importing EDB, which causes a lot of trouble. He reasons like this: "If a single node of my network fails, I'm safe with both EDB and TDB, but what happens if one night I'm not lucky and two hosts fail at the same time?".

Actually, with EDB, if two of the six nodes fail during the same night, and these nodes happen to be two "D" nodes, the system will stop working. The probability of this happening is (3/6)*(2/5), that is... 20%!

On the other hand, TDB will never stop working as long as only two nodes fail.
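
The 20% figure can be double-checked in two ways. What follows is only a minimal sketch, assuming that every pair of failed hosts is equally likely; the host names are just labels for the three "D" and three "C" nodes of the diagram above.

```python
from fractions import Fraction
from itertools import combinations

# Assumed layout from the story: three database nodes and three clients.
HOSTS = ["D1", "D2", "D3", "C1", "C2", "C3"]

# Sequential view: the first failed host is a "D" (3 out of 6),
# then the second failed host is a "D" as well (2 out of the remaining 5).
sequential = Fraction(3, 6) * Fraction(2, 5)

# Counting view: enumerate every unordered pair of failed hosts
# and keep the pairs made only of database nodes.
pairs = list(combinations(HOSTS, 2))
both_db = [p for p in pairs if all(h.startswith("D") for h in p)]
counted = Fraction(len(both_db), len(pairs))

print(sequential, counted)  # 1/5 1/5
```

Both views give 1/5: 3 of the 15 possible pairs are made of two "D" nodes.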

And what about three nodes failing at the same time? With EDB this will bring the system down with a probability of 50% (at least two "D" nodes down) + 5% (all clients down), for a total probability of 55%.

TDB, instead, would stop working with a probability of just 5% (all three DB nodes down), plus 15% (master plus two clients down, no promotion possible), plus 5% (all clients down), for a total of 25%.
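
These percentages can be reproduced by brute force over all the possible sets of failed hosts. The following is a rough sketch under stated assumptions, not code from either database: `edb_down`, `tdb_down` and `p_down` are hypothetical helpers that encode the rules as told in this story, D1 is assumed to be the master, and every set of failed hosts is assumed to be equally likely.

```python
from fractions import Fraction
from itertools import combinations

DBS = {"D1", "D2", "D3"}
CLIENTS = {"C1", "C2", "C3"}
MASTER = "D1"  # assumption: D1 is the current master


def edb_down(failed):
    """EDB, as modelled here, needs a majority of DB nodes and at least one client."""
    dbs_up = len(DBS - failed)
    clients_up = len(CLIENTS - failed)
    return dbs_up < 2 or clients_up == 0


def tdb_down(failed):
    """TDB, as modelled here, needs one DB node and one client; if the master
    is gone, it also needs a majority of the clients to promote a new one."""
    dbs_up = len(DBS - failed)
    clients_up = len(CLIENTS - failed)
    if dbs_up == 0 or clients_up == 0:
        return True
    return MASTER in failed and clients_up < 2


def p_down(is_down, k):
    """Probability that the system is down when exactly k of the 6 hosts fail."""
    cases = [set(c) for c in combinations(sorted(DBS | CLIENTS), k)]
    return Fraction(sum(is_down(c) for c in cases), len(cases))


for k in (2, 3):
    print(k, p_down(edb_down, k), p_down(tdb_down, k))
```

With this model the two-failure case gives 1/5 for EDB and 0 for TDB, and the three-failure case gives 11/20 (55%) for EDB and 1/4 (25%) for TDB, out of 20 possible sets of three failed hosts.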

Daniel Fucbitz sometimes looks out of his office window, waiting for the third sun to rise, thinking that, yes, on planet Earth it is nice to resist partitions, but it really does not come for free at all.