{"id":408,"date":"2005-08-02T08:12:07","date_gmt":"2005-08-02T08:12:07","guid":{"rendered":"https:\/\/www.docbug.com\/blog\/archives\/408"},"modified":"2005-08-02T08:12:07","modified_gmt":"2005-08-02T08:12:07","slug":"when-do-i-get-the-web-in-my-pocket","status":"publish","type":"post","link":"https:\/\/www.docbug.com\/blog\/archives\/408","title":{"rendered":"When do I get the web in my pocket?"},"content":{"rendered":"<p>Some time ago I <a href=\"http:\/\/docbug.com\/blog\/archives\/000116.html\">asked<\/a> how much longer it would be before I could have <a href=\"http:\/\/citeseer.ist.psu.edu\/rd\/42065565%2C322941%2C1%2C0.25%2CDownload\/http:\/\/citeseer.ist.psu.edu\/cache\/papers\/cs\/15727\/http:zSzzSzwww.research.microsoft.comzSzresearchzSzdbzSzdebullzSz98junezSzwebbase.pdf\/brin98what.pdf\">the Web in my pocket<\/a>. Let&#8217;s try a quick back-of-the-envelope calculation:<\/p>\n<p>A <a href=\"http:\/\/www.cs.uiowa.edu\/~asignori\/web-size\/\">paper<\/a> from January 2005 estimates the publicly indexable Web (the part easily accessible to search-engine crawlers) at around 11.5 billion pages. Estimates of average page size are all over the map, but if we figure around 100 KB per page, that comes to about 1.15 petabytes. Call it a petabyte (one million Gig) for today&#8217;s indexed web. (I&#8217;m counting text and images, but ignoring other media.)<\/p>\n<p>Disk these days is going for less than <a href=\"http:\/\/www.pricegrabber.com\/search_attrib.php\/page_id=11\">50 cents per Gig<\/a>, so enough disk to store your own personal Google (and then some) costs around $500,000. With compression you can probably cut that in half. 
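<\/p>\n<p>Here is a minimal Python sketch that makes the arithmetic explicit. It multiplies out the rough inputs above (11.5 billion pages, about 100 KB per page, 50 cents per Gig, compression halving the size) in decimal units. It also counts yearly price halvings until the total reaches $1000 or less, taking the factor-of-two-per-year rate from the disk-price estimate cited in this post at face value. The constant and function names are illustrative only, and every input is an assumption carried over from the estimates in this post.<\/p>\n
```python
# Back-of-the-envelope estimate; all inputs are the post's rough assumptions.
PAGES = 11.5e9            # publicly indexable pages (January 2005 estimate)
BYTES_PER_PAGE = 100e3    # ~100 KB per page, text plus images, decimal units
DOLLARS_PER_GB = 0.50     # disk price, roughly 50 cents per Gig
COMPRESSION = 0.5         # guess: compression cuts storage in half
TARGET_DOLLARS = 1000.0   # price point we want to reach


def total_gigabytes(pages, bytes_per_page):
    # Decimal gigabytes (1 GB = 10**9 bytes), so 10**6 GB = 1 PB.
    return pages * bytes_per_page / 1e9


def years_until_affordable(cost, target, years_per_halving=1.0):
    # Halve the price once per period until it is at or below the target,
    # assuming a smooth exponential decline with no jumps or snags.
    years = 0.0
    while cost > target:
        cost /= 2
        years += years_per_halving
    return years


if __name__ == '__main__':
    gigabytes = total_gigabytes(PAGES, BYTES_PER_PAGE)
    raw_cost = gigabytes * DOLLARS_PER_GB
    compressed_cost = raw_cost * COMPRESSION
    print(f'{gigabytes / 1e6:.2f} PB, ${raw_cost:,.0f} of disk')
    print(f'{years_until_affordable(raw_cost, TARGET_DOLLARS):.0f} years to reach $1000 or less')
    print(f'{years_until_affordable(compressed_cost, TARGET_DOLLARS):.0f} years with compression')
```
\n<p>With the unrounded page count, the sketch gives about 1.15 PB, or roughly $575,000 of disk ($287,500 compressed). That takes ten yearly halvings to get under $1000, or nine with compression. Rounding down to one petabyte and $500,000, as the estimate above does, gives nine halvings uncompressed.<\/p>\n<p>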
The price of disk is also falling by a <a href=\"http:\/\/www.ebu.ch\/trev_294-editorial.html\">factor of two every 12 months<\/a>. Since $500,000 halved nine times is about $977, then assuming no major jumps or snags in the disk-price curve, in a little less than a decade we can expect to hold the equivalent of today&#8217;s indexed web for less than $1000 (a year sooner with compression).<\/p>\n<p>Of course, the web will keep growing in that time, so we may no longer be satisfied with our measly petabyte-on-the-desk. But I figure the amount of human-generated Web content grows much more slowly than the amount of disk a dollar buys. The number of web sites <a href=\"http:\/\/www.dlib.org\/dlib\/april03\/lavoie\/04lavoie.html\">actually shrank<\/a> between 2001 and 2002, and though it now seems to be growing again, there&#8217;s only so much content that human beings can create in a day. The real question I have is whether, a decade from now, anyone will find access to the whole web all that interesting; I could easily see most people losing interest in the surface web in favor of personal deep-web niches. 
The only reason I want the whole web in my pocket is because it&#8217;s too hard for me to filter out in advance the 99.99% of the web that&#8217;ll never be of interest to me \u2014 the closer we get to that kind of pruning, the less disk we need and the higher-quality the experience will be.<\/p>\n<p class=\"update\"><b>Update 8\/2\/05:<\/b> doing a <a href=\"http:\/\/docbug.com\/blog\/archives\/000409.html\">different back-of-the-envelope estimate<\/a> leads to being able to store a compressed-HTML cache (no images) on less than $1000 worth of disk within <i>3<\/i> years&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[5],"tags":[],"class_list":["post-408","post","type-post","status-publish","format-standard","hentry","category-media-technology"],"_links":{"self":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/408","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/comments?post=408"}],"version-history":[{"count":0,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/408\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/media?parent=408"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/categories?post=408"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/tags?post=408"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}