Month: April 2006

Webaroo

Remember my continuing rant about how it’s time to just cache the entire Web and keep it local? A start-up named Webaroo has a similar idea. They’re offering free software (Windows only) that caches “webpacks” of pages covering particular interest areas and updates those caches whenever you re-synch. Their current plan is the usual “pay for it all through advertising” model.

I’ve not tried it yet and don’t know how easy it is to personalize webpacks or how well they handle things like accessing pages that require sign-in, but it definitely looks like a good start. (And if they do the job well, I could easily see them winding up being purchased by one of the big players in search.)

(Thanks to Aileen for the link!)

Caltech cannon upgrades to better tech school

[Image: the Caltech cannon at MIT (mit-caltech-canon-hack.jpg)]

Almost exactly 20 years ago, students from Harvey Mudd College pulled one over on their rival Caltech by relocating a Spanish-American War cannon from Caltech’s Fleming House to their own campus. Now the cannon has a new home: MIT hackers posing as the Howe & Ser Moving Company have relocated the cannon to the Massachusetts Institute of Technology campus in Cambridge, MA. The cannon now also sports a giant gold-plated Brass Rat, the MIT class ring. A plaque dedicating the cannon notes that “In honor of its previous owners, the cannon points towards Pasadena, CA.”

Google Related Links

It took them longer than I expected, but it looks like Google has finally come out with a Related Links feature that lets people add automatically updated links to related searches, news or web pages to their sites. Think Google AdWords, only with search results instead of pay-for-placement advertisements. The text box is simple to add to any webpage (it took me all of 30 seconds) and gets updated with whatever info is current when the page is viewed — essentially adding dynamic related content even if your page remains static.

I’m pleased to see this concept finally becoming mainstream, especially since the Margin Notes system I developed back in 1999 was a pioneer in the area. One issue that was tricky to figure out was what scope to use as the automatic search term. You can see the problem if you’re viewing this post in an aggregator like LiveJournal or even on my own main page — the search results are probably related at least in part to other posts on the page and not just this one. It’s a hard problem in the general case, but they should be able to get it to work most of the time. (I’ve not yet looked at their code to see what they’re actually doing — more after I check it out.) [Update 12:45pm: actually, LiveJournal strips out all JavaScript so it won’t show up anyway… click here to see the actual post if you’re in LJ and have no clue what I’m talking about.]

One thing I didn’t have to worry about with Margin Notes was how to keep the system from being gamed by spammers and Google-juice stealers, though I did have to worry about relevance to individual readers. Something I’d like to see is a similar system that uses my own RSS subscriptions as the core source of info, plus perhaps one level of linkage out (e.g. take my blog-roll & RSS subscriptions plus the blog-roll and RSS subscriptions associated with each of those sites). That would give me some amount of personalization as well as make it harder to game the system.
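As a rough sketch of that idea, assuming the blog-rolls and subscription lists have already been fetched and can be represented as a simple lookup table (the site names here are made up for illustration):

```javascript
// Sketch: build a personalized pool of "related content" sources by
// taking my own subscriptions plus one level of linkage out -- i.e.
// the subscriptions of each site I subscribe to. The table below is
// a stand-in for real fetched blog-roll/RSS data.
var subscriptions = {
  "me":    ["siteA", "siteB"],
  "siteA": ["siteC", "siteB"],
  "siteB": ["siteD"]
};

function expandOneLevel(me, subs) {
  var sources = {};
  (subs[me] || []).forEach(function (site) {
    sources[site] = true;                  // my direct subscriptions
    (subs[site] || []).forEach(function (linked) {
      sources[linked] = true;              // one hop further out
    });
  });
  return Object.keys(sources).sort();
}

console.log(expandOneLevel("me", subscriptions));
// ["siteA", "siteB", "siteC", "siteD"]
```

Restricting related links to this expanded-but-personal pool is what would make the system both more relevant to me and harder for spammers to game, since they’d have to get into (or near) my own subscription graph first.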

(via Google Blogoscoped)

Update 12:30pm: I took a quick peek at the JavaScript, and as you probably could guess all it does is send a specially formatted request to http://www.googlesyndication.com/relcontent/content?… that includes the page’s URL. Sure enough, one second after I loaded the page in my browser I saw another retrieval from Google. Makes sense — you don’t want to have to deal with parsing the source document in the JavaScript itself — though it does mean the feature probably won’t work at all on pages that are behind a firewall. (That’s probably all for the best, as otherwise it’d be all too easy to slip up and start broadcasting supposedly-secure information out to Google.)
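Based on that observation, the script’s job can be caricatured in a few lines. To be clear, this is a sketch of the behavior I saw, not Google’s actual code, and the `url` parameter name is my assumption — I only know the request includes the page’s URL:

```javascript
// Sketch of what the Related Links script appears to do: it doesn't
// parse the page itself, it just tells Google which URL it's running
// on, and Google fetches and analyzes that page server-side. (Which
// is exactly why it can't work for pages behind a firewall.)
function buildRelatedContentRequest(endpoint, pageUrl) {
  // "url" is a guessed parameter name for illustration only.
  return endpoint + "?url=" + encodeURIComponent(pageUrl);
}

var req = buildRelatedContentRequest(
  "http://www.googlesyndication.com/relcontent/content",
  "http://www.example.com/blog/2006/04/some-post.html"
);
console.log(req);
```

The design choice makes sense: keeping the client dumb means the JavaScript stays tiny and Google can improve the analysis server-side without anyone re-pasting the snippet.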

That said, I’m surprised at how lousy the results are. It looks like they’re relying on their cached copy when available, which for a blog post is, of course, almost guaranteed not to be related to the current post. As for the post-specific page, I’m getting lots of related links about blogs in general, which makes me suspect they’re doing a bad job of distinguishing actual page content from my page’s window dressing, JavaScript and navigation bars. That really surprises me (if in fact that’s where the trouble lies) since that’s a problem they clearly know how to solve for their normal indexing.
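For what it’s worth, the kind of content-versus-boilerplate separation I mean can be caricatured in a few lines. This is a deliberately crude sketch — real indexers use far more sophisticated heuristics than regexes — but it shows the basic idea of throwing away scripts and navigation before picking search terms:

```javascript
// Toy content extractor: strip inline scripts and nav-ish markup,
// then keep whatever text remains. Nested divs will fool the crude
// nav regex; this is illustration, not production code.
function extractMainText(html) {
  return html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")  // drop inline JavaScript
    .replace(/<div[^>]*class="(?:nav|sidebar)[^"]*"[\s\S]*?<\/div>/gi, " ") // drop nav/sidebar divs
    .replace(/<[^>]+>/g, " ")                     // strip remaining tags
    .replace(/\s+/g, " ")
    .trim();
}

console.log(extractMainText(
  '<div class="nav">Home | About</div><p>Hello world</p><script>var x = 1;</script>'
));
// "Hello world"
```

If Google fed only the output of something like this (rather than the whole raw page) into its related-search selection, posts like this one presumably wouldn’t keep matching generic “blogs” results.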

Statement of fair use in documentary films

My brother is working on a documentary called Reality Made Over, about Fox’s plastic-surgery reality TV show “The Swan”. Of course, since his subject matter is television, there are lots of questions about what he needs permission to use and what counts as fair use under copyright law. Talking to him about it reminded me of the recent Documentary Filmmakers’ Statement of Best Practices in Fair Use that was put out by several associations of video and filmmakers, in consultation with the Center for Social Media at American University.

From their introduction:

This Statement of Best Practices in Fair Use makes clear what documentary filmmakers currently regard as reasonable application of the copyright “fair use” doctrine. Fair use expresses the core value of free expression within copyright law. The statement clarifies this crucial legal doctrine, to help filmmakers use it with confidence. Fair use is shaped, in part, by the practice of the professional communities that employ it. The statement is informed both by experience and ethical principles. It also draws on analogy: documentary filmmakers should have the same kind of access to copyrighted materials that is enjoyed by cultural and historical critics who work in print media and by news broadcasters.