Month: December 2006

Swivel

Swivel looks like it might be interesting. They’re billing their service as “YouTube for Data,” where you can upload your data sets and then graph or compare them against other sets. In its best form I can imagine something like this supporting open-source-style research, especially if they support ways to explain and present your data (that, or a good API for bloggers to link in data). In its worst form I could see any sensible analysis of the data sets getting buried under a pile of meaningless correlation statistics.

Description via TechCrunch (via Datamining Blog):

Swivel Co-founders Dmitry Dimov and Brian Mulloy start off by describing their company as “YouTube for Data.” That’s a good start for someone trying to understand it, because the site allows users to upload data – any data – and display it to other users visually. The number of page views your website generates. Or a stock price over time. Weather data. Commodity prices. The number of Bald Eagles in Washington state. Whatever. Uploaded data can be rated, commented and bookmarked by other users, helping to sort the interesting (and accurate) wheat from the chaff. And graphs of data can be embedded into websites. So it is in fact a bit like a YouTube for Data.

But then the real fun begins. You and other users can then compare that data to other data sets to find possible correlation (or lack thereof). Compare gas prices to presidential approval ratings or UFO sightings to iPod sales. Track your page views against weather reports in Silicon Valley. See if something interesting occurs.

Earliest sunset of the year

A bit of trivia: even though the Winter Solstice isn’t for another couple of weeks, tomorrow will be the earliest sunset of the year (about 4:55 PM in San Francisco). That’s because although the days will keep getting shorter until December 22nd, sunrise will be getting later even faster.

(Calculated over at Express Tech’s Sunrise and Sunset Calculator, which is the only one I could find that includes seconds.)

Getting attention for your research

Seth Finkelstein over at Infothought comments on the media attention being given to Psiphon:

I’m all for this project, but the activism lesson I draw from its prominent coverage is NOT necessarily a happy one. There’s been activists working on this sort of stuff for years and years. The critical variable here is not technology, since those reporters wouldn’t be able to tell a Tor from a FreeNet. What matters is *ATTENTION*. The backing from the various organizational sponsors is the reason for the widespread publicity.

Seth beats this drum pretty regularly (usually with lament), but it echoes what Bill Buxton phrased as a battle cry at CSCW, namely that making an impact in the world isn’t about having brand-new ideas, it’s about understanding which ideas are ripe for exploitation and then having the ability to marshal the right resources to get them into the world. Buxton feels that the research community in general isn’t putting enough effort into that last bit, and believes in the overall philosophy so much that he’s essentially become a full-time evangelist and public speaker rather than doing his own research.

Look hard enough, and you’ll always find two identical fingerprints

Today’s LA Times reports that Brandon Mayfield just won his $2 million lawsuit against the FBI for his wrongful detention in 2004. Mayfield is the Oregon lawyer whom the FBI pinched in connection with the 2004 Madrid train bombings because a partial fingerprint found in Madrid was a “close enough” match to his own. One quote from the article:

Michael Cherry, president of Cherry Biometrics, an identification-technology company, said misidentification problems could grow worse as the U.S. and other governments add more fingerprints to their databases.

The problem is emphasized in the March report from the Office of the Inspector General on the case, which reads much like a Risks Digest post and has a lot of take-home lessons. The initial problem was that the FBI threw an extremely wide net by running the fingerprints found in Madrid through the Integrated Automated Fingerprint Identification System (IAFIS), a database that contains the fingerprints of more than 47 million people who have either been arrested or submitted fingerprints for background checks. With so many people in the database, the system always spits out a number of (innocent) near-matches, which FBI examiners then review by hand. The trouble was that in this case (a) Mayfield’s fingerprints were especially close, and (b) the FBI examiner got stuck in a pattern of circular reasoning: once he found many points of similarity between the prints, he began to “find” additional features that weren’t really in the lifted print but were suggested by features in Mayfield’s own prints.

People tend to forget that even extremely rare events are almost guaranteed to happen if you check often enough. For example, even if there was only a one in a billion chance of an innocent person being an extremely close match for a given fingerprint, that leaves about a 5% chance for each fingerprint checked of getting such a false positive. If we were to double the size of the database, that would rise to almost 10%. This kind of problem is inevitable when looking for extremely rare events, and applies even more broadly to fuzzy-matching systems like the TSA’s no-fly list and Total Information Awareness (in all its newly renamed forms), which try to identify terrorists from their credit card purchases, where they’ve traveled or how they spell their name.
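The arithmetic behind those 5% and 10% figures can be sketched in a few lines. The one-in-a-billion per-person rate and the assumption that comparisons are independent are illustrative, not from the Inspector General’s report:

```python
# Chance of at least one false match when searching a large database,
# assuming each of n comparisons independently matches with probability p.
# p = 1e-9 (one in a billion) is an assumed rate for illustration;
# n = 47 million is the IAFIS figure cited above.

def prob_false_positive(p: float, n: int) -> float:
    """Probability that at least one of n independent comparisons matches."""
    return 1.0 - (1.0 - p) ** n

p = 1e-9          # assumed per-person chance of an extremely close match
n = 47_000_000    # people in IAFIS

print(f"{prob_false_positive(p, n):.1%}")      # about 5% per search
print(f"{prob_false_positive(p, 2 * n):.1%}")  # about 9% if the database doubles
```

The same formula is why fuzzy-matching screens like the no-fly list degrade as they grow: the per-person error rate stays fixed while the number of comparisons climbs.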