Google Related Links

It took them longer than I expected, but it looks like Google has finally come out with a Related Links feature that lets people add automatically updated links to related searches, news, or web pages to their sites. Think Google AdWords, only with search results instead of pay-for-placement advertisements. The text box is simple to add to any webpage (it took me all of 30 seconds) and gets updated with whatever info is current when the page is viewed — essentially adding dynamic related content even if your page remains static.
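Just to sketch the general pattern (this is not Google's actual snippet — the endpoint, element id, and parameter name here are all made up), the embed boils down to a placeholder element plus a script that gets fetched at view time:

```javascript
// A generic sketch of a self-updating "related links" embed -- NOT
// Google's actual code; the endpoint and parameter names are hypothetical.

// Build the URL for the widget script. Kept as a pure function so the
// interesting part is easy to see in isolation.
function widgetScriptUrl(endpoint, containerId) {
  return endpoint + "?el=" + encodeURIComponent(containerId);
}

// Pull the widget script into the page. Because the script is fetched
// each time the page is viewed, the links it renders stay current even
// though the page itself is static.
function embedRelatedLinks(containerId, endpoint) {
  var script = document.createElement("script");
  script.src = widgetScriptUrl(endpoint, containerId);
  document.body.appendChild(script);
}
```

The point is simply that all the real work happens at view time, not at publish time.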

I’m pleased to see this concept finally becoming mainstream, especially since the Margin Notes system I developed back in 1999 was a pioneer in the area. One issue that was tricky to figure out was what scope to use for the automatic search term. You can see the problem if you’re viewing this post in an aggregator like LiveJournal or even on my own main page — the search results are probably related at least in part to other posts on the page and not just this one. It’s a hard problem in the general case, but they should be able to get it to work most of the time. (I’ve not yet looked at their code to see what they’re actually doing — more after I check it out.) [Update 12:45pm: actually, LiveJournal strips out all JavaScript so it won’t show up anyway… click here to see the actual post if you’re in LJ and have no clue what I’m talking about.]

One thing I didn’t have to worry about with Margin Notes was how to keep the system from being gamed by spammers and Google-juice stealers, though I did have to worry about relevance to individual readers. Something I’d like to see is a similar system that uses my own RSS subscriptions as the core source of info, plus perhaps one level of linkage out (e.g. take my blog-roll & RSS subscriptions plus the blog-roll and RSS subscriptions associated with each of those sites). That would give me some amount of personalization as well as make it harder to game the system.
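For anyone who wants the shape of what I mean, here's a rough sketch. The `fetchLinkedFeeds` helper is hypothetical — in practice it would parse the blog-roll or OPML subscription list that a given site publishes:

```javascript
// Rough sketch of the personalized source pool described above: my own
// subscriptions, plus one level of linkage out. "fetchLinkedFeeds" is a
// hypothetical helper that returns the blog-roll / RSS subscriptions
// associated with a given site.
function expandSources(mySubscriptions, fetchLinkedFeeds) {
  const sources = new Set(mySubscriptions);
  for (const feed of mySubscriptions) {
    for (const linked of fetchLinkedFeeds(feed)) {
      sources.add(linked); // one hop out from my own list
    }
  }
  return sources; // the pool a related-links search would draw from
}
```

Restricting related-link results to a pool like this would both personalize them and make them much harder to game, since a spammer would first have to get into my blog-roll or the blog-roll of someone I already read.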

(via Google Blogoscoped)

Update 12:30pm: I took a quick peek at the JavaScript, and as you probably could guess all it does is send a specially formatted request to http://www.googlesyndication.com/relcontent/content?… that includes the page’s URL. Sure enough, one second after I loaded the page in my browser I saw another retrieval from Google in my server logs. Makes sense — you don’t want to have to deal with parsing the source document in the JavaScript itself — though it does mean the feature probably won’t work at all on pages that are behind a firewall. (That’s probably all for the best, as otherwise it’d be all too easy to slip up and start broadcasting supposedly-secure information out to Google.)
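In other words, the script is doing something like this in spirit. The `url` query-parameter name is my guess — only the host and path appeared in the request I actually saw, so treat the exact format as an assumption:

```javascript
// Sketch of what the script appears to do, based on watching the request
// go out. The query-parameter name ("url") is an assumption on my part;
// only the host and path below are from the request I observed.
function relatedContentRequest(pageUrl) {
  return "http://www.googlesyndication.com/relcontent/content?url="
    + encodeURIComponent(pageUrl);
}
// The request carries only the page's URL -- Google's servers then fetch
// and analyze the page themselves, which is why a page behind a firewall
// would be invisible to the feature.
```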

That said, I’m surprised at how lousy the results are. It looks like they’re relying on their cached copy when available, which for a blog post is, of course, almost guaranteed not to be related to the current post. As for the post-specific page, I’m getting lots of related links about blogs in general, which makes me suspect they’re doing a bad job of distinguishing actual page content from my page’s window dressing, JavaScript, and navigation bars. That really surprises me (if in fact that’s where the trouble lies), since that’s a problem they clearly know how to solve for their normal indexing.