Month: April 2008

Microsoft builds tool to steal data off computers

From the “what could possibly go wrong” department, Microsoft just announced that they’ve developed a simple one-button tool to break into a computer and suck down an entire hard drive’s contents onto a thumb drive:

COFEE, a preconfigured, automated tool fits on a USB thumb drive. Prior to COFEE the equivalent work would require a computer forensics expert to enter 150 complex commands manually through a process that could take three to four hours. With COFEE, you simply plug into a running computer to extract the data with the click of one button –completing the work in about 20 minutes.

It’s basically a whole bunch of existing password guessers and other cracking software into a single one-touch device — and since it works on the live computer it can bypass encrypted disks like Vista’s BitLocker so long as the user is still logged in.

Apparently Microsoft isn’t concerned that they’re building tools that can turn any two-bit felon into a highly-skilled data thief, or that they’re providing products that exploit their very own security holes. After all, they’re only supplying these devices to law enforcement — so what could possibly go wrong?

Calculating the Birthday Paradox

For the last couple years I’ve been working on a program that generates a large number of essentially random ID strings (it’s actually a replicated document storage system that uses the hash of a file’s content as its ID, but the details don’t matter). Since IDs are independently generated there will always be some chance that two different files will just happen to have the same ID assigned — so how long do I need to make my ID string before that probability is small enough that I can sleep at night?

This is essentially the Birthday Paradox, just with bigger numbers and in a different form. For those who haven’t heard of it, the canonical form of the Birthday Paradox asks what the probability is that, out of a random group of 23 people, at least two in share the same birthday. (The “paradoxical” part is that the answer is just over 50%, much higher than most people’s intuition would suggest.) My question just turns that around and asks “how many random N-bit IDs have to be generated before there is a one in a million chance of any two of them being identical?”

Rejiggering the formulas given in Wikipedia, here’s the approximation:

n ≅ (-2 · S · ln(1 – P))1/2

where:

  • n is the number of entities required to reach the given probability
  • P is the probability desired
  • S is the size of the set of all possible entities

For example, the number of people you would need for a 50% chance that at least two of them have the same birthday is (-2 · 365 · ln(1/2))1/2, or between 22 and 23 people. As a more practical example, you would only need to generate 77,163 PGP keys before having a 50% chance of a collision between their 8-character short-form fingerprints.

As for my one-in-a-million chance, you’d need to randomly generate roughly 2(N – 19)/2 N-bit strings before having a one-in-a-million chance of a collision, which means I would need to randomly generate around 270 of my 160-bit ID strings before there would be a one-in-a-million chance of having a collision. I think I can sleep at night.