Some math about the Engineyard contest
Friday, 17 July 09
I did some math, and I think this is the probability of two random SHA1 strings having the hamming dinstance of 'D':
But... with an hamming distance of 2 it is required that 2 bits are not equal and 158 are equal, in any kind of order. How many ways there are to arrange 160 items, two black and 158 white? 160!/(158!*2!), so I actually have to multiply the probability of a single arrangement for the number of possible arrangements.
Ok, what are the practical implications of this? That with an optimized C program you can compute around 3.25 million sha1 hashes per second per core (this is what I got with my implementation). So if you have a server farm with 250 computers that happen to be quad cores, you need the following amount of hours to get a given hamming distance:
People having just ten quadcore boxes can expect to reach an HD of 35 in 12 hours.
I could love to know if my math is correct, I'm not particularly strong on probability theory... but it looks correct to me.
(1/2^160)*(160!/((160-d)!*d!))This is why. For example imagine we want to compute the probability of an hamming distance of 2. I want the first bit to be not equal, that occurs with a probability of .5, the second bit to be not equal too, again .5, the others to be equal, that is .5 each. This are independent events, so we need to multiply them together, basically for any given pattern of equal/not equal we want there is a probability of .5^160 for a specific arrangement of equal/not-equal of the bits.
But... with an hamming distance of 2 it is required that 2 bits are not equal and 158 are equal, in any kind of order. How many ways there are to arrange 160 items, two black and 158 white? 160!/(158!*2!), so I actually have to multiply the probability of a single arrangement for the number of possible arrangements.
Ok, what are the practical implications of this? That with an optimized C program you can compute around 3.25 million sha1 hashes per second per core (this is what I got with my implementation). So if you have a server farm with 250 computers that happen to be quad cores, you need the following amount of hours to get a given hamming distance:
6 hours for an h.d. of 33 26 hours for an h.d. of 32 108 hours for an h.d. of 31it looks like that university clusters can bring us a solution with an HD of 32 or 31 with a bit of luck.
People having just ten quadcore boxes can expect to reach an HD of 35 in 12 hours.
I could love to know if my math is correct, I'm not particularly strong on probability theory... but it looks correct to me.
post read 8181 times* (average 21.6 visits/day)
Posted at 14:54:37 permalink | 9 comments | print | post it | View blog reactionsComments
17 Jul 09, 18:07:45
Looks like you're off to a good start :) Good luck, and thanks for participating!
17 Jul 09, 18:43:04
@notaddicted: I'm not sure we are calculating the same thing, my goal was to compute the probability of two random 160 bit strings to have an hamming distance of *exactly* N.
17 Jul 09, 18:43:16
17 Jul 09, 18:46:32
@Hatem: "2^106 work, rather than the previously expected 2^160 work". The attack is only theoretical (un?)fortunately.
18 Jul 09, 02:00:42
My demo code has managed to hit a hamming distance of 35 in six different 12 hour runs. I'm finding it very hard to get any closer though.
23 Jul 09, 14:36:19
Could you say more about how you get 3.25 million hashes/second per core? I've only gotten up to about half that, using Polar SSL's SHA1 code. My questions would be: what hash code did you use (link?), what hardware you're running on, and whether you had any particular optimizations particular to the contest situation.

So given 160 random bits, what are the chances of there being atlead N bits set to zero? (0.5^N)*(ways to choose N from 160).
(0.5^N)*(160!/(N!*(160-N)!)).
define d = 160 - N
0.5^(160-d)*(160!/((160-d)!*d!))
So I think that you are miscalculating by a bit, that is why & how.