December 2005

Annotated blog corpus to be released at WWE 2006

Intelliseek will be releasing a big corpus of spidered and annotated blog posts to attendees at the 3rd Annual Workshop on the Weblogging Ecosystem (held in conjunction with the WWW 2006 Conference in Edinburgh, Scotland):

The data release comprises a complete set of weblog posts for three weeks in July 2005 (on the order of 10M posts from 1M weblogs). This data set has been selected as it spans a period of time during which an event of global significance occurred, namely the London bombings.

The data set includes the full content of the posts plus mark-up. The marked-up fields include: date of posting, time of posting, author name, title of the post, weblog url, permalink, tags/categories, and outlinks classified by type – details may be found here.

Sounds like a great resource for researchers. I’m also amused (in a dark sort of way) by the datashare individual agreement they require people to sign. Essentially they admit there’s no way they can get copyright clearance from the million or so bloggers whose posts they’ve collected, so they just ask everyone to agree to remove any post if someone complains, not to use the results for commercial purposes, and not to use the data past the workshop.

Big Brother Down Under

From the Sydney Morning Herald:

Jane, from Coogee, was surprised to find three police on her bus asking to inspect mobile phones. Each took a phone at random and scrolled through messages for five or ten minutes. Everyone obeyed. “The people were perfectly friendly about it,” she said. “I thought it was a bit weird and a breach of privacy. But I didn’t say anything. Nobody did.”

No, it’s not about terrorism, it’s about potential racial violence, but it’s still that nasty abuse-of-rights-in-the-name-of-safety-from-unknown-boogeymen vibe. Of course, such flagrant violations of our rights without a court order could never happen in the US. In the US, we’d never even know they’d read our text messages without a court order until we read about it in the New York Times.

(Thanks to Omri for the link.)

Bowing to pressure, retailers agree to take Lord’s name in vain

Docbug Exclusive: Faced with a potential boycott from right-wing Christian groups, retailers Target and Lowe’s have agreed to reinstate their long-standing policy of using Christ’s name for cheap commercial gain. The companies were targeted by the American Family Association because they used the word “holiday” instead of “Christmas” in their advertisements and storewide decorations.

Conservative pundits were quick to call the move a victory for those who recognize Christ as an inherent part of the end-of-year buying season. Spokesmen for both companies say they intended no disrespect, and that they plan to institute policies to ensure that religion will be more prominently exploited in the future.

(Update 12/15/05: fixed typo)

No need to outrun the bear…

From Bruce Schneier’s Crypto-gram:

Two years ago, if someone asked me about protecting against identity theft, I would tell them to shred their trash and be careful giving information over the Internet. Today, that advice is obsolete. Criminals are not stealing identity information in ones and twos; they’re stealing identity information in blocks of hundreds of thousands and even millions.

On the plus side, he says so many identities are being stolen now that the thieves don’t have time to use more than a small percentage of them for fraud…

GMail “Web Clips” are still context blind…

GMail has added Web Clips to the top of its page, showing items from RSS and Atom feeds plus “relevant sponsored links” above your messages. Unfortunately, it looks like only the sponsored links are actually relevant (which I take to mean “related to the message you’re reading”). Clips from your own RSS feeds are still just random.

Hopefully they’re busy working on fixing that. I still think automatically annotating email (and blog entries) with related entries from a largish set of favorite RSS feeds is a seriously useful application waiting to be exploited. Honestly, I’ve been expecting it to be just around the corner for about three years now, and I’m not sure why I’m still waiting. (I know, I know… if I really wanted it done I’d sit down and write it myself…)

Let’s hear it for flexibility…

Two interesting technologies have just been announced in the flexible-computing arena. First (via engadget) is NEC’s announcement of its Organic Radical Battery, a 300-micron-thick flexible battery with an energy density of about 1 mWh/cm² and a recharge time of just 30 seconds. Then throw in Plastic Logic’s announcement of a 10″-diagonal SVGA E-Ink display (4-level greyscale) that’s both flexible and less than 0.4 mm thick. (Thanks to Kurt for the links!)

(Photos: NEC’s ORB battery; Plastic Logic’s E-Ink-based display.)