{"id":507,"date":"2005-12-31T07:58:18","date_gmt":"2005-12-31T07:58:18","guid":{"rendered":"https:\/\/www.docbug.com\/blog\/archives\/507"},"modified":"2005-12-31T07:58:18","modified_gmt":"2005-12-31T07:58:18","slug":"annotated-blog-corpus-to-be-released-at-wwe-2006","status":"publish","type":"post","link":"https:\/\/www.docbug.com\/blog\/archives\/507","title":{"rendered":"Annotated blog corpus to be released at WWE 2006"},"content":{"rendered":"<p><a href=\"http:\/\/intelliseek.com\/\">Intelliseek<\/a> will be releasing a big corpus of spidered and annotated blog posts to attendees at the <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/\">3rd Annual Workshop on the Weblogging Ecosystem<\/a> (held in conjunction with the <a href=\"http:\/\/www2006.org\/\">WWW 2006 Conference<\/a> in Edinburgh, Scotland):<\/p>\n<blockquote>\n<p>The data release comprises a complete set of weblog posts for three weeks in July 2005 (on the order of 10M posts from 1M weblogs). This data set has been selected as it spans a period of time during which an event of global significance occurred, namely the London bombings.<\/p>\n<p>The data set includes the full content of the posts plus mark-up. The marked-up fields include: date of posting, time of posting, author name, title of the post, weblog url, permalink, tags\/categories, and outlinks classified by type &#8211; details may be found <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/datashare-instructions.txt\">here<\/a>.<\/p>\n<\/blockquote>\n<p>Sounds like a great resource for researchers. 
I&#8217;m also amused (in a dark sort of way) by the <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/datashare-agreement.pdf\">datashare individual agreement<\/a> they require people to sign: essentially they admit that there&#8217;s no way they can get copyright clearance from all of the million or so bloggers they&#8217;ve collected, so they just ask everyone to agree to remove any posts if anyone complains, not use the results for commercial purposes, and not use the data past the workshop.<\/p>\n","protected":false},"excerpt":{"rendered":"<p><a href=\"http:\/\/intelliseek.com\/\">Intelliseek<\/a> will be releasing a big corpus of spidered and annotated blog posts to attendees at the <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/\">3rd Annual Workshop on the Weblogging Ecosystem<\/a> (held in conjunction with the <a href=\"http:\/\/www2006.org\/\">WWW 2006 Conference<\/a> in Edinburgh, Scotland):<\/p>\n<blockquote>\n<p>The data release comprises a complete set of weblog posts for three weeks in July 2005 (on the order of 10M posts from 1M weblogs). This data set has been selected as it spans a period of time during which an event of global significance occurred, namely the London bombings.<\/p>\n<p>The data set includes the full content of the posts plus mark-up. The marked-up fields include: date of posting, time of posting, author name, title of the post, weblog url, permalink, tags\/categories, and outlinks classified by type &#8211; details may be found <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/datashare-instructions.txt\">here<\/a>.<\/p>\n<\/blockquote>\n<p>Sounds like a great resource for researchers. 
I&#8217;m also amused (in a dark sort of way) by the <a href=\"http:\/\/www.blogpulse.com\/www2006-workshop\/datashare-agreement.pdf\">datashare individual agreement<\/a> they require people to sign: essentially they admit that there&#8217;s no way they can get copyright clearance from all of the million or so bloggers they&#8217;ve collected, so they just ask everyone to agree to remove any posts if anyone complains, not use the results for commercial purposes, and not use the data past the workshop.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"default","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[5],"tags":[],"class_list":["post-507","post","type-post","status-publish","format-standard","hentry","category-media-technology"],"_links":{"self":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/507","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/comments?post=507"}],"version-history":[{"count":0,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/507\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/media?parent=507"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/categories?post=507"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/tags?post=507"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}