Look hard enough, and you’ll always find two identical fingerprints

Today’s LATimes reports that Brandon Mayfield just won his $2 million lawsuit against the FBI for his wrongful detention in 2004. Mayfield is the Oregon lawyer whom the FBI pinched in connection with the 2004 Madrid train bombings because a partial fingerprint found in Madrid was a “close enough” match to his own. One quote from the article:

Michael Cherry, president of Cherry Biometrics, an identification-technology company, said misidentification problems could grow worse as the U.S. and other governments add more fingerprints to their databases.

The problem is emphasized in the March report from the Office of the Inspector General on the case, which reads much like a Risks Digest post and has a lot of take-home lessons. The initial problem was that the FBI cast an extremely wide net by running the fingerprints found in Madrid through the Integrated Automated Fingerprint Identification System (IAFIS), a database that contains the fingerprints of more than 47 million people who have either been arrested or submitted fingerprints for background checks. With so many people in the database, the system always spits out a number of (innocent) near-matches, which the FBI then reviews by hand. The trouble in this case was that (a) Mayfield’s fingerprints were especially close, and (b) the FBI examiner got stuck in a pattern of circular reasoning: once he found many points of similarity between the prints, he began to “find” additional features that weren’t really in the lifted print but were suggested by features in Mayfield’s own prints.

People tend to forget that even extremely rare events are almost guaranteed to happen if you check often enough. For example, even if there were only a one-in-a-billion chance of an innocent person being an extremely close match for a given fingerprint, checking that print against all 47 million people in the database gives about a 5% chance of at least one false positive (1 − (1 − 10⁻⁹)^47,000,000 ≈ 4.6%). Double the size of the database, and that rises to almost 10%. This kind of problem is inevitable when searching for extremely rare events, and applies even more broadly to fuzzy-matching systems like the TSA’s no-fly list and Total Information Awareness (in all its newly renamed forms), which try to identify terrorists from their credit card purchases, where they’ve traveled, or how they spell their names.
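The scaling argument above is easy to check numerically. This sketch uses the hypothetical one-in-a-billion per-person match rate from the paragraph and the 47-million-person database size from the article; both numbers are illustrative, not measured false-match rates for IAFIS:

```python
# Probability of at least one false fingerprint match when a single
# lifted print is compared against every person in a large database.
p_match = 1e-9          # hypothetical chance one innocent person matches
db_size = 47_000_000    # roughly the number of people in IAFIS

def p_false_positive(n, p=p_match):
    """Chance of at least one false match across n independent comparisons."""
    return 1 - (1 - p) ** n

print(f"47M-person database: {p_false_positive(db_size):.1%}")      # ~4.6%
print(f"doubled database:    {p_false_positive(2 * db_size):.1%}")  # ~9.0%
```

Note that doubling the database nearly doubles the false-positive rate: for small per-comparison probabilities, 1 − (1 − p)ⁿ is approximately n × p, so the risk grows almost linearly with database size.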