{"id":409,"date":"2005-08-02T21:44:36","date_gmt":"2005-08-02T21:44:36","guid":{"rendered":"https:\/\/www.docbug.com\/blog\/archives\/409"},"modified":"2005-08-02T21:44:36","modified_gmt":"2005-08-02T21:44:36","slug":"what-about-a-google-cache-on-my-desk","status":"publish","type":"post","link":"https:\/\/www.docbug.com\/blog\/archives\/409","title":{"rendered":"What about a Google cache on my desk?"},"content":{"rendered":"<p><a href=\"http:\/\/docbug.com\/blog\/archives\/000408.html\">Yesterday I said<\/a> that within a decade disk space should be cheap enough to put the entire visible web on your desk for under $1000. I think that&#8217;s actually a pretty conservative estimate, since it assumes a 100 KB average page size, up to an order of magnitude higher than some estimates.<\/p>\n<p>Here&#8217;s another back-of-the-envelope calculation: let&#8217;s say we wanted the equivalent of Google&#8217;s webcache on your desktop (that is, all the HTML but no images). Another way to calculate it starts with the fact that the 2003 update to Berkeley&#8217;s <a href=\"http:\/\/www.sims.berkeley.edu\/research\/projects\/how-much-info-2003\/internet.htm\">How Much Info?<\/a> study estimated that in 2002 the web was only 167 Terabytes total, with only 30 TB as HTML (69 TB when you include images). Assuming 75% compression, that&#8217;s just around 8 TB. That same year a <a href=\"http:\/\/www.dlib.org\/dlib\/april03\/lavoie\/04lavoie.html\">2002 OCLC study<\/a> calculated that the total number of web pages was only increasing by about 5% per year (with the number of sites actually shrinking, but the number of pages per site growing). That rate had been decreasing ever since the explosion in the mid &#8217;90s, but let&#8217;s assume growth became a steady 5% and will stay at that rate for the next few years. 
(There are a lot of assumptions going on here, but the nice thing about these kinds of curves is that even if my numbers are off by a factor of two somewhere, so long as disk prices keep halving every year the crossover point only shifts by one year.)<\/p>\n<p>Now we&#8217;ve got two trends, and just need to find the intersection point for the price we want:<\/p>\n<table cellpadding=10 border=\"1\">\n<tr>\n<th>Year<\/th>\n<th>Price of 1 TB disk<\/th>\n<th>Size of public web<br \/>(compressed HTML only,<br \/>assumes 5% growth\/year)<\/th>\n<th>Cost to store<\/th>\n<\/tr>\n<tr>\n<td>2002<\/td>\n<td><\/td>\n<td>8 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2003<\/td>\n<td><\/td>\n<td>8.4 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2004<\/td>\n<td><\/td>\n<td>8.8 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2005<\/td>\n<td>$500<\/td>\n<td>9.25 TB<\/td>\n<td>$4,625<\/td>\n<\/tr>\n<tr>\n<td>2006<\/td>\n<td>$250<\/td>\n<td>9.7 TB<\/td>\n<td>$2,425<\/td>\n<\/tr>\n<tr>\n<td>2007<\/td>\n<td>$125<\/td>\n<td>10.2 TB<\/td>\n<td>$1,275<\/td>\n<\/tr>\n<tr>\n<td>2008<\/td>\n<td>$62.50<\/td>\n<td>10.7 TB<\/td>\n<td>$670<\/td>\n<\/tr>\n<tr>\n<td>2009<\/td>\n<td>$31.25<\/td>\n<td>11.25 TB<\/td>\n<td>$350<\/td>\n<\/tr>\n<tr>\n<td>2010<\/td>\n<td>$15.63<\/td>\n<td>11.8 TB<\/td>\n<td>$185<\/td>\n<\/tr>\n<\/table>\n<p>So given a few assumptions, we&#8217;ll be able to cache all the raw text on the public web for under $1000 (disk cost) <i>within 3 years!<\/i><\/p>\n","protected":false},"excerpt":{"rendered":"<p><a href=\"http:\/\/docbug.com\/blog\/archives\/000408.html\">Yesterday I said<\/a> that within a decade disk space should be cheap enough to put the entire visible web on your desk for under $1000. 
I think that&#8217;s actually a pretty conservative estimate, since it assumes a 100 KB average page size, up to an order of magnitude higher than some estimates.<\/p>\n<p>Here&#8217;s another back-of-the-envelope calculation: let&#8217;s say we wanted the equivalent of Google&#8217;s webcache on your desktop (that is, all the HTML but no images). Another way to calculate it starts with the fact that the 2003 update to Berkeley&#8217;s <a href=\"http:\/\/www.sims.berkeley.edu\/research\/projects\/how-much-info-2003\/internet.htm\">How Much Info?<\/a> study estimated that in 2002 the web was only 167 Terabytes total, with only 30 TB as HTML (69 TB when you include images). Assuming 75% compression, that&#8217;s just around 8 TB. That same year a <a href=\"http:\/\/www.dlib.org\/dlib\/april03\/lavoie\/04lavoie.html\">2002 OCLC study<\/a> calculated that the total number of web pages was only increasing by about 5% per year (with the number of sites actually shrinking, but the number of pages per site growing). That rate had been decreasing ever since the explosion in the mid &#8217;90s, but let&#8217;s assume growth became a steady 5% and will stay at that rate for the next few years. 
(There are a lot of assumptions going on here, but the nice thing about these kinds of curves is that even if my numbers are off by a factor of two somewhere, so long as disk prices keep halving every year the crossover point only shifts by one year.)<\/p>\n<p>Now we&#8217;ve got two trends, and just need to find the intersection point for the price we want:<\/p>\n<table cellpadding=10 border=\"1\">\n<tr>\n<th>Year<\/th>\n<th>Price of 1 TB disk<\/th>\n<th>Size of public web<br \/>(compressed HTML only,<br \/>assumes 5% growth\/year)<\/th>\n<th>Cost to store<\/th>\n<\/tr>\n<tr>\n<td>2002<\/td>\n<td><\/td>\n<td>8 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2003<\/td>\n<td><\/td>\n<td>8.4 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2004<\/td>\n<td><\/td>\n<td>8.8 TB<\/td>\n<td><\/td>\n<\/tr>\n<tr>\n<td>2005<\/td>\n<td>$500<\/td>\n<td>9.25 TB<\/td>\n<td>$4,625<\/td>\n<\/tr>\n<tr>\n<td>2006<\/td>\n<td>$250<\/td>\n<td>9.7 TB<\/td>\n<td>$2,425<\/td>\n<\/tr>\n<tr>\n<td>2007<\/td>\n<td>$125<\/td>\n<td>10.2 TB<\/td>\n<td>$1,275<\/td>\n<\/tr>\n<tr>\n<td>2008<\/td>\n<td>$62.50<\/td>\n<td>10.7 TB<\/td>\n<td>$670<\/td>\n<\/tr>\n<tr>\n<td>2009<\/td>\n<td>$31.25<\/td>\n<td>11.25 TB<\/td>\n<td>$350<\/td>\n<\/tr>\n<tr>\n<td>2010<\/td>\n<td>$15.63<\/td>\n<td>11.8 TB<\/td>\n<td>$185<\/td>\n<\/tr>\n<\/table>\n<p>So given a few assumptions, we&#8217;ll be able to cache all the raw text on the public web for under $1000 (disk cost) <i>within 3 
years!<\/i><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[5],"tags":[],"class_list":["post-409","post","type-post","status-publish","format-standard","hentry","category-media-technology"],"_links":{"self":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/409","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/comments?post=409"}],"version-history":[{"count":0,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/409\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/media?parent=409"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/categories?post=409"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/tags?post=409"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}