Some math about the Engineyard contest

Friday, 17 July 09
I did some math, and I think this is the probability of two random SHA1 strings having the hamming dinstance of 'D':
(1/2^160)*(160!/((160-d)!*d!))
This is why. For example imagine we want to compute the probability of an hamming distance of 2. I want the first bit to be not equal, that occurs with a probability of .5, the second bit to be not equal too, again .5, the others to be equal, that is .5 each. This are independent events, so we need to multiply them together, basically for any given pattern of equal/not equal we want there is a probability of .5^160 for a specific arrangement of equal/not-equal of the bits.

But... with an hamming distance of 2 it is required that 2 bits are not equal and 158 are equal, in any kind of order. How many ways there are to arrange 160 items, two black and 158 white? 160!/(158!*2!), so I actually have to multiply the probability of a single arrangement for the number of possible arrangements.

Ok, what are the practical implications of this? That with an optimized C program you can compute around 3.25 million sha1 hashes per second per core (this is what I got with my implementation). So if you have a server farm with 250 computers that happen to be quad cores, you need the following amount of hours to get a given hamming distance:
6 hours for an h.d. of 33
26 hours for an h.d. of 32
108 hours for an h.d. of 31
it looks like that university clusters can bring us a solution with an HD of 32 or 31 with a bit of luck.

People having just ten quadcore boxes can expect to reach an HD of 35 in 12 hours.

I could love to know if my math is correct, I'm not particularly strong on probability theory... but it looks correct to me.
post read 8181 times* (average 21.6 visits/day)
Posted at 14:54:37 permalink | 9 comments | print | post it | View blog reactions

Comments

notaddicted writes:
17 Jul 09, 15:54:32
Assume the xor of your test hash with their desired hash is just 160 random bits. Furthermore, choose a certain number N, and assert: they must be white!! What is the probability that this is true: (0.5 for each bit)^(number of bits you chose).

So given 160 random bits, what are the chances of there being atlead N bits set to zero? (0.5^N)*(ways to choose N from 160).

(0.5^N)*(160!/(N!*(160-N)!)).

define d = 160 - N

0.5^(160-d)*(160!/((160-d)!*d!))

So I think that you are miscalculating by a bit, that is why & how.
chupacabra writes:
17 Jul 09, 17:12:15
Huh?
Leah Silber writes:
17 Jul 09, 18:07:45
Looks like you're off to a good start :) Good luck, and thanks for participating!
antirez writes:
17 Jul 09, 18:43:04
@notaddicted: I'm not sure we are calculating the same thing, my goal was to compute the probability of two random 160 bit strings to have an hamming distance of *exactly* N.
Hatem Nassrat writes:
17 Jul 09, 18:43:16
There are some shortcuts that can shorten that time:

http://eprint.iacr.org/2004/304
antirez writes:
17 Jul 09, 18:46:32
@Hatem: "2^106 work, rather than the previously expected 2^160 work". The attack is only theoretical (un?)fortunately.
antirez writes:
17 Jul 09, 18:46:43
@Leah: thanks
James Harton writes:
18 Jul 09, 02:00:42
My demo code has managed to hit a hamming distance of 35 in six different 12 hour runs. I'm finding it very hard to get any closer though.
Joel Klein writes:
23 Jul 09, 14:36:19
Could you say more about how you get 3.25 million hashes/second per core? I've only gotten up to about half that, using Polar SSL's SHA1 code. My questions would be: what hash code did you use (link?), what hardware you're running on, and whether you had any particular optimizations particular to the contest situation.
Send a comment

name
your website(optional)
comment body
Write limone here if you are human -> citrix antispam citrix antispam