When do I get the web in my pocket?

Some time ago I asked how much longer before I can have the Web in my pocket. Let’s try a quick back-of-the-envelope calculation:

A paper from January 2005 calculates the publicly indexable Web (the part easily accessible to search engine web-crawlers) as being around 11.5 billion pages. Estimates of average webpage size seem to be all over the map, but let’s figure around 100 KB per page, for a total of a little over a petabyte (roughly one million gigabytes) for today’s indexed web. (I’m assuming text and images, but ignoring other media.)
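
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the size estimate. The page count and the 100 KB average are the figures from the paragraph above; the decimal units (1 GB = 10^6 KB) and the variable names are my own assumptions for illustration.

```python
# Back-of-the-envelope size of the indexed web, using the figures quoted above.
PAGES = 11.5e9        # publicly indexable pages (January 2005 estimate)
AVG_PAGE_KB = 100     # assumed average page size, text and images only

KB_PER_GB = 1e6       # assumption: decimal units, 1 GB = 10^6 KB
GB_PER_PB = 1e6       # assumption: decimal units, 1 PB = 10^6 GB

total_gb = PAGES * AVG_PAGE_KB / KB_PER_GB
total_pb = total_gb / GB_PER_PB

print(f"{total_gb:,.0f} GB, or about {total_pb:.2f} PB")
```

Changing `AVG_PAGE_KB` shows how sensitive the total is to the page-size guess: at 10 KB per page the whole thing drops to about a tenth of a petabyte.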

Disk these days is going for less than 50 cents per gigabyte, so a million gigabytes, enough to store your own personal Google (and then some), costs around $500,000. With compression you can probably cut that in half. The price of disk is also falling by a factor of two every 12 months, and getting from $500,000 down to $1000 is a 500-fold drop, or about nine halvings. So assuming no major jumps or snags in the disk-price curve, in a little less than a decade we can expect to hold the equivalent of today’s indexed web for less than $1000.
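
The timeline is just a logarithm, sketched below in Python. The 50 cents per gigabyte, the one-year halving time, the 2x compression guess, and the $1000 target come from the paragraph above; the function name `years_until_affordable` and the assumption that prices fall smoothly rather than in steps are mine.

```python
import math

def years_until_affordable(cost_now, target, halving_years=1.0):
    """Years until cost_now falls to target if the price halves every halving_years."""
    if cost_now <= target:
        return 0.0
    return halving_years * math.log2(cost_now / target)

DISK_GB = 1_000_000      # roughly a petabyte, in gigabytes
PRICE_PER_GB = 0.50      # dollars per gigabyte, upper bound quoted above
TARGET = 1000            # dollars

cost_today = DISK_GB * PRICE_PER_GB   # 500,000 dollars
cost_compressed = cost_today / 2      # assumes compression halves the disk needed

print(f"uncompressed: {years_until_affordable(cost_today, TARGET):.1f} years")
print(f"compressed:   {years_until_affordable(cost_compressed, TARGET):.1f} years")
```

With these inputs the uncompressed case comes out at about nine years, and compression buys exactly one more halving, so about eight. A slower price curve stretches things out proportionally: pass `halving_years=1.5` and both numbers grow by half.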

Now of course, in that time the web will continue to grow, so we may no longer be satisfied with our measly petabyte-on-the-desk, but I figure the amount of human-generated Web content has a much slower growth rate than our disk-space curve. The number of web sites actually shrank between 2001 and 2002, and though it now seems to be growing again, there’s only so much content that human beings can create in a day.

The real question I have is whether in a decade anyone will see having access to the whole web as being all that interesting. I could easily see the majority of people losing interest in the surface web in favor of personal deep-web niches. The only reason I want the whole web in my pocket is that it’s too hard for me to filter out in advance the 99.99% of the web that’ll never be of interest to me. The closer we get to that kind of pruning, the less disk we need and the higher-quality the experience will be.

Update 8/2/05: a different back-of-the-envelope estimate suggests that a compressed-HTML cache (no images) would fit on less than $1000 worth of disk within 3 years…