Theoretical limitation on search engines?

Lately I’ve noticed a rise in the number of Google search results that just lead to a bunch of ads plus some automatically-generated content copied from other web pages, rather than pages with the original content I’m looking for. This is the latest step in an ongoing arms race between the search engines (and their users) and so-called search engine optimization companies that try to funnel searchers through to their customers’ ad-laden sites rather than letting them go direct to the site they want. The SEOs are essentially using Google’s own infrastructure against it, creating Google-hosted blogs, generated using content from (I’m guessing) the results of Google searches, all sprinkled with links to pages containing nothing but Google-supplied ads.

Google’s trying to stop folks from gaming the system like this, but I expect there’s some kind of fundamental limit to what can be done to stop it. You could probably even describe it as a theorem:

For any automatically-indexed search engine of sufficient size, it is possible to construct a document that has a high page rank for a given query even though the constructed document adds no useful information beyond that which would have been returned without it.

A corollary would be:

The more complete a search engine is in terms of documents indexed, the lower the relevance of its search results will be in terms of the ratio of documents with original content vs. documents that simply copy information from other pages.

If this does, in fact, wind up being a fundamental theorem for search engines, I have a humble suggestion for what we should name it: Göögel’s Incompleteness Theorem.

Digital Storytelling Festival in SF, Oct 7-9

The Digital Storytelling Festival takes place in San Francisco in a couple of weeks (October 7-9):

The Digital Storytelling Festival was founded in 1995 as an annual gathering where professionals and enthusiasts who use technology to communicate and share stories gather to examine creative works and new concepts being used in areas of education, community building, business, personal and legacy storytelling, new media and entertainment.

The Digital Storytelling Festival is an intimate gathering that inspires its audience with new knowledge, ideas and a better understanding of how the traditional form of storytelling is changing through the use of technology.

The Festival aspires to promote and evolve the art and practice of Digital Storytelling and encourages community by the sharing of ideas and meaningful dialogue among all its participants.

Registration is $350 ($200 student). The event is sponsored by KQED Public Radio & TV and the KQED Digital Storytelling Initiative.

Thoughts on Kurzweil’s Law

I heard Ray Kurzweil speak last night at the Long Now seminar. A friend who also attended says it was essentially the exact same talk he’d heard him give five years ago (ironic considering how fast things are supposed to be changing nowadays), but this was my first time hearing him in person. I must say it’s rare that a talk makes me alternate between thinking “Well, that’s completely bogus!” and “OK, that makes sense…” so many times.

Where I think he’s got it right:

  • People are inherently bad at extrapolating exponential trends, and we are currently experiencing technological exponential growth. This is especially true in the information and communication technologies, namely information processing, sensing and pattern-recognition, and human-to-human communications.

  • Reading between the lines of his talk, information technologies are bootstrapping technologies: once you have them, they make inventing the next stage easier, faster and cheaper.

  • The combination of biotech, new biological sensors and the ability to simulate complex processes is going to seriously challenge how we currently think of ourselves as individuals and even what it means to be human.

Where I think he’s got it wrong:

  • As I mentioned a few days ago, I think some of his exponential curves are the result of our natural tendency to gloss over things that happened in the past and focus on recent developments. (A less generous assessment would say he just did it to make his curve work out, but this isn’t limited to Ray’s charts; in fact, he showed the same graph with points plotted from other lists of momentous inventions drawn from various encyclopedias.) This is not to say there aren’t several exponential growth curves in play at the moment, but I don’t think this is a trend that has been going on for hundreds of thousands of years.

  • It’s an old saw that people overestimate what will be possible in five years and underestimate what will be possible in 20 years. I think his predictions of ubiquitous augmented reality, computers distributed throughout one’s clothing, and head-up display contact lenses (or direct-to-retina/optic-nerve displays) will all happen at some point, but not in the next five years.

  • Ray talks about the creation of artificial intelligences as if some day in the near future we’ll invent HAL and start talking to it. Ever since Alan Turing described the Turing Test, people have described artificial intelligences in terms of ability to generate and understand language, ability to make human-like decisions, ability to show and understand emotion — in other words, the ability to relate to humans. I see no reason to think the first AIs will think or communicate like us at all, nor do I think they will exist at human scale.

    In fact, I would say several species of human-made hyper-intelligences already walk among us: we call them corporations, nation-states, philosophical or political movements, and civilizations. Their neurons are the people, documents and cognitive artifacts that make up the whole. Their synapses are the communication and social networks that run between these individuals. The specific structure of the intelligence is set by its laws, traditions and culture.

    The dual of the idea that groups of people, documents and cognitive artifacts can be a single intelligence is the idea that my own human intelligence, as an individual, is actually made up of more than just what I can think when I’m lying naked and alone. As Edwin Hutchins points out in Cognition in the Wild, human intelligence is not just the product of what’s inside our skull but stems from the combination of our brains, our culture, and tools such as the paper we write on and the skill of writing itself. I expect by the time a machine with no human in the loop has passed the Turing Test, the continuing augmentation of humans will have long-since forced us to recognize that the test wasn’t all that good a criterion for intelligence in the first place.

  • Even though our knowledge and our information technologies are improving exponentially in many fields, there are some parts of human knowledge that are not growing at this incredible rate. Notably, our understanding of existential questions about the purpose of life, what we as humans value, and the meaning of free will has not kept pace with technology, even though in many cases new technology and new understandings about the world have pulled the rug out from under our previous answers. These questions will become especially important as we start fundamentally modifying our biology and finally unravel the mysteries of the mind itself.

The Singularity is near now?

Kevin Drum over at the Washington Monthly has a nice extrapolation based on Ray Kurzweil’s new book (see his chart for added effect):

With that said, however, it turns out that I do have a bone to pick with Kurzweil over one of the trend charts that litter his book. Basically, he argues that the pace of change has been accelerating over time, so that major inventions are being created ever faster as time goes by. 10,000 years ago it took several thousand years between major inventions (agriculture –> wheel), while a century ago it took only a few decades (telephone –> radio).

Fine. But his cleverly constructed chart cheats: it stops about 30 years ago. So I decided to extend it. My version of his chart extends to last month (see pink shaded area), and it indicates that major, paradigm-busting inventions should be spaced about a week apart these days.

So what gives? Seems to me that the Singularity should be right on our doorstep, not 40 years away. And while 40 years may not seem like all that much in the great scheme of things, it means a lot if you’re 46 years old. Which I am.

So what happened?

If I had to guess without having read the book yet, I’d say what the chart really shows is the gloss of history: the longer ago something was, the less important we take it to be and the more we lump it together with everything else from that period. For example, the last four entries on Ray’s chart are the Industrial Revolution; the telephone, electricity, and radio (as one event); and the computer and the personal computer (as two events). Why did he decide to label these as four paradigm-busting inventions rather than seven, or as one? Conversely, why are writing and the wheel lumped into the same invention, or printing and the experimental method? Depending on what you call a single “event,” the spacing between those events could show accelerating change, constant change, or stability punctuated by short periods of rapid change (the last one being my own personal belief).

Could the one true constant be the belief that our generation is experiencing more change than any other?
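
To make the "gloss of history" guess concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the timeline is synthetic (one event every 10 years, i.e. a perfectly constant rate of change), and the lumping rule (keep one event per logarithmic bucket of age) is my own assumption about how a chart-maker's memory might coarsen with distance, not anything taken from Ray's book or Kevin's chart.

```python
import math


def remembered_events(ages, bins_per_decade=2):
    """Keep one event (the most recent) per logarithmic bucket of age.

    `ages` are "years ago" values. The bucketing rule is a hypothetical
    stand-in for lumping older events together more coarsely.
    """
    kept = {}
    for age in sorted(ages):
        bucket = math.floor(math.log10(age) * bins_per_decade)
        kept.setdefault(bucket, age)  # first seen in sorted order = most recent
    return sorted(kept.values(), reverse=True)  # oldest first


def gaps(ages_oldest_first):
    """Years between consecutive remembered events, oldest gap first."""
    pairs = zip(ages_oldest_first, ages_oldest_first[1:])
    return [older - newer for older, newer in pairs]


# Synthetic history: one event every 10 years for 10,000 years (constant rate).
all_ages = list(range(10, 10001, 10))
remembered = remembered_events(all_ages)
remembered_gaps = gaps(remembered)

print("events that actually happened:", len(all_ages))
print("events that made the chart:   ", len(remembered))
print("gaps between charted events:  ", remembered_gaps)
```

The underlying rate never changes, yet the gaps between the events that survive the lumping shrink steadily toward the present, which is the same shape as an "accelerating change" chart. That doesn't prove the real chart is an artifact, only that this kind of lumping alone is enough to produce one.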

Rain falls, film at 11.

I love living in a place where a little rain makes front-page news:

Moisture and unstable air spinning off a tropical storm along the coast of Mexico brought a rare burst of thunder, lightning and rain — even some hail and power outages — to the Bay Area on Tuesday afternoon.

From 1 p.m. to about 3 p.m., thunder boomed as the brunt of the storm passed over San Jose, Fremont, Palo Alto and San Francisco, while sporadic rainfall wet roadways and cooled down the region.

It was the first rain since June 17, when 0.03 inches fell in San Jose. And the storm marked the first recorded rainfall on Sept. 20 in San Jose at least back to 1948, according to National Weather Service records.

Wikipedia the (physical) World

Semapedia is a project to annotate physical locations with 2D barcodes that link to Wikipedia articles. With the Semacode software running on your PDA/cellphone, you scan a barcode and it’ll take you to the linked-to article. There’ve been a lot of attempts at this sort of physical annotation of the world, WorldBoard being one of the earlier ones I remember.

I like the concept in theory, but I’m always disappointed by the quality and variability of the links. Do I really want a link about privacy just because I see a no-trespassing sign, or about the Hofburg Imperial Palace just because I’m standing there? Perhaps, if I’m in the mood for ironic social commentary or I’m a tourist with an interest in architecture, but most people won’t be the right audience for any given link. One man’s art is another man’s graffiti, and the world-annotation systems I’ve seen are currently little more than virtual spray paint.

The variability is the real key. If 90% of the tags I come across link to something interesting to me, I’ll probably follow every one I see. If only 50% link to something interesting, I might look at the human-readable title printed on the tag and then decide whether I think it likely that the article will be well-written and interest me. If 90% of the tags wind up being useless, I won’t even bother reading the title — and then it won’t matter that there are 10% that I would have enjoyed if I had bothered to look.
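
The three cases above can be read as a back-of-the-envelope expected-payoff comparison. The sketch below is a toy model, not a measurement: the benefit and cost numbers are invented, the function name is hypothetical, and it assumes the printed title tells me perfectly whether I'd like the article (which real titles don't).

```python
def best_strategy(p_interesting, benefit=1.0, follow_cost=0.3, title_cost=0.1):
    """Pick the behavior with the highest expected payoff per tag.

    All parameters are made-up illustration values:
      benefit      payoff from reading an article I find interesting
      follow_cost  effort of scanning the tag and loading the article
      title_cost   effort of stopping to read the printed title
    """
    payoffs = {
        # Never look at tags at all.
        "ignore": 0.0,
        # Scan everything; only a fraction p pays off.
        "follow every tag": p_interesting * benefit - follow_cost,
        # Always pay to read the title, then follow only the good ones
        # (assumes the title is a perfect predictor).
        "read title first": -title_cost + p_interesting * (benefit - follow_cost),
    }
    return max(payoffs, key=payoffs.get)


for hit_rate in (0.9, 0.5, 0.1):
    print(f"{hit_rate:.0%} interesting -> {best_strategy(hit_rate)}")
```

With these particular numbers the 90%, 50% and 10% hit rates land on "follow every tag", "read title first" and "ignore" respectively. Different costs would move the thresholds, but the point stands: once the hit rate is low enough, even glancing at the title isn't worth it, and the good 10% is lost along with the rest.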

I’m not totally pessimistic about this sort of technology though. With the right combination of filtering (to make tags I don’t care about completely invisible), subtlety (to make the tags I might care about still be unobtrusive in case I don’t want to be bothered) and community support (to ensure relevance to me and to bond me to my community regardless of the link quality), I could see something like this finally taking off.

(Thanks to Eugen Leitl on the Wearables mailing list for the link!)

Bay Area event: two Ray Kurzweil talks

Ray Kurzweil will be speaking about his new book, The Singularity Is Near: When Humans Transcend Biology, at two events in the San Francisco Bay Area next week:

  • Thursday, Sept. 22nd at the SDForum at SAP in Palo Alto (registration at 6pm, lecture 7-8:30, $25 for non-members, $35 at the door).

  • Friday, Sept. 23rd at the Long Now Seminar at the Herbst Theater in San Francisco. Doors open at 7, lecture at 7:30 with $10 suggested (but not required) donation.

(JD Lasica has a review of Ray’s book over at New Media Musings.)

Fat Pings and Atom Streams

There’s interesting work going on among Brad Fitzpatrick at LiveJournal, Bob Wyman of PubSub and other folks at Six Apart (who make the Movable Type blogging software) on publishing continuous streams of blogging content, so that large aggregators (like PubSub, Technorati or Google) can get continuous updates from large sources of blog posts like LiveJournal, Six Apart or Blogger.

See Brad’s post on LJ for the initial proposal, these threads on integrating it with the Atom protocol and the Six Apart Update Stream for developments.

Or, if you feel like playing yourself, type this into a command prompt to see a continuous stream of “No one understands me. Should I dye my hair pink or blue?”:

telnet updates.sixapart.com 8081
GET /atom-stream.xml HTTP/1.0 

(Extra points for being the first one to plug it into a screensaver 🙂)
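
If you'd rather script it than type into telnet, the same request can be sent from a few lines of Python. This is a minimal sketch that only mirrors the telnet session above: the host, port and path are the ones from the commands, the function names are my own, and I'm assuming the server keeps the connection open and just keeps writing, so there's no Atom parsing or reconnect logic here.

```python
import socket


def build_request(path="/atom-stream.xml"):
    """Return the raw HTTP/1.0 request that the telnet session types by hand."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def stream_lines(host="updates.sixapart.com", port=8081, path="/atom-stream.xml"):
    """Yield text lines from the stream for as long as the server keeps sending.

    Host and port are the values from the telnet example; whether the
    endpoint is reachable at any given time is not something this checks.
    """
    with socket.create_connection((host, port)) as conn:
        conn.sendall(build_request(path))
        # Wrap the socket in a text-mode file so we can iterate line by line.
        with conn.makefile("r", encoding="utf-8", errors="replace") as reader:
            for line in reader:
                yield line.rstrip("\r\n")
```

To watch the stream scroll by, loop over it with `for line in stream_lines(): print(line)` and hit Ctrl-C when you've had enough. The first lines will be the HTTP response headers, followed by the feed content.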

Google blog search

I’ve been wondering when Google would get around to this. A few days ago they announced Google Blog Search, which indexes blog entries based on RSS or Atom feeds.

Google’s playing catch-up to smaller services like Technorati, but seems to have scooped Yahoo! and MSN, both of which have been rumored to be coming out with an RSS-feed search “any day now” for months (Yahoo! even briefly revealed a test page before they realized it wasn’t being firewalled properly).

One feature Google gets right is that every page includes a link to subscribe to an RSS or Atom feed on that query, essentially turning any search phrase into an aggregator. Technorati has something similar with their watchlists, but you have to create an account and go through their page to create a new standing query. Google just creates the contents on the fly, which is a big win in terms of ease of use, since you’re most likely to want a standing query right after you’ve run the same search as a regular one.

(Discovered via Political Animal, of all places, but there are also announcements at Six Apart and John Battelle’s Searchblog.)
