{"id":1094,"date":"2023-09-25T13:24:16","date_gmt":"2023-09-25T20:24:16","guid":{"rendered":"https:\/\/www.docbug.com\/blog\/?p=1094"},"modified":"2023-09-25T13:24:16","modified_gmt":"2023-09-25T20:24:16","slug":"laundering-copyright-at-scale","status":"publish","type":"post","link":"https:\/\/www.docbug.com\/blog\/archives\/1094","title":{"rendered":"Laundering Copyright At Scale"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">The Authors Guild just added their own <a href=\"https:\/\/www.reuters.com\/legal\/john-grisham-other-top-us-authors-sue-openai-over-copyrights-2023-09-20\/\">class-action lawsuit<\/a> against OpenAI, claiming that using their copyrighted works to train ChatGPT violated their respective copyrights. This is essentially the same argument made in two <a href=\"https:\/\/www.theverge.com\/2023\/7\/9\/23788741\/sarah-silverman-openai-meta-chatgpt-llama-copyright-infringement-chatbots-artificial-intelligence-ai\">other<\/a> <a href=\"https:\/\/www.reuters.com\/legal\/lawsuit-says-openai-violated-us-authors-copyrights-train-ai-chatbot-2023-06-29\/\">lawsuits<\/a> filed a few months ago and in the <a href=\"https:\/\/stablediffusionlitigation.com\/pdf\/00201\/1-1-stable-diffusion-complaint.pdf\">class<\/a><a href=\"https:\/\/www.docbug.com\/blog\/archives\/1041\">-action lawsuit<\/a> filed by artists against Stability AI. As I <a href=\"https:\/\/www.docbug.com\/blog\/archives\/1041\">said<\/a> with the Stable Diffusion case, case law <a href=\"https:\/\/www.uspto.gov\/sites\/default\/files\/documents\/OpenAI_RFC-84-FR-58141.pdf\">suggests<\/a>\u00a0that training an AI\u00a0<em>is<\/em>\u00a0fair use, though it\u2019s far from\u00a0<a href=\"https:\/\/www.theverge.com\/23444685\/generative-ai-copyright-infringement-legal-fair-use-training-data\">settled<\/a>. 
Either way, I&#8217;m sure the big players are busy training &#8220;clean&#8221; models using only public domain and licensed content (particularly content they already &#8220;own&#8221;), both as a hedge and because uncertainty about fair use will naturally tamp down any competitors who don&#8217;t have the resources to make their own clean versions. There&#8217;s already <a href=\"https:\/\/www.theverge.com\/2023\/9\/25\/23884679\/getty-ai-generative-image-platform-launch\">word<\/a> that Getty Images is partnering with Nvidia to create its own generative AI system trained only on its own library, and I&#8217;m sure they aren&#8217;t the only ones.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">But don&#8217;t expect clean training data to make artists and authors any happier, because this whole debate isn&#8217;t really about how these models were trained \u2014 it&#8217;s about what they can do. Copyright law protects a fixed expression of an idea \u2014 the words on a page, placement of ink in a drawing, even composition of a photograph \u2014 but not the idea itself. That&#8217;s by design, because art inherently builds on what came before it, &#8220;stealing&#8221; the best ideas and remixing them into something new. If copyright were extended too broadly, we might never have seen another detective story after Edgar Allan Poe&#8217;s <em>The Murders in the Rue Morgue<\/em>, or another Pointillist painting after Georges Seurat&#8217;s\u00a0<em>A Sunday Afternoon on the Island of La Grande Jatte<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In general, artists are comfortable with this kind of &#8220;stealing&#8221; so long as it pushes the art in new directions. 
As T. S. Eliot said, \u201cImmature poets imitate; mature poets steal; bad poets deface what they take, and good poets make it into something better, or at least something different.\u201d Austin Kleon, author of <em>Steal Like An Artist<\/em>, <a href=\"https:\/\/www.youtube.com\/watch?v=oww7oB9rjgw\">put it more succinctly<\/a>: &#8220;Imitation is not flattery. Transformation is flattery.&#8221; Copyright law tries to capture this distinction between good stealing and, well, just plain stealing by requiring that there be &#8220;substantial similarity&#8221; to a copied work for there to be infringement, and by carving out fair-use exceptions for reasonable sampling in transformative works. Factual works have to be especially similar to be infringing, to the point where it&#8217;s perfectly legal (and as long as credit is given, perfectly acceptable) for newspapers to <a href=\"https:\/\/www.google.com\/search?q=site%3Abostonglobe.com+%22new+york+times+reports%22\">rewrite their competitors&#8217; reporting<\/a> in their own voice. The similarity threshold for what counts as infringement for art and fiction isn&#8217;t quite as high, but it&#8217;s still legal to <a href=\"https:\/\/www.thelegalartist.com\/blog\/you-cant-copyright-style\">copy an artist&#8217;s style<\/a> and general form as long as there are enough differences overall.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The hope, presumably, is that this similarity threshold is a way to allow good copying and outlaw bad copying without forcing judges to decide on the artistic merit of the changes that were made. But what about works that don&#8217;t really add anything useful to a prior work but still tweak it <em>just enough<\/em> to avoid copyright infringement? 
Take much of what comes out of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Content_farm\">content farms<\/a> like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Leaf_Group\">Demand Media<\/a> (eHow.com, answers.com), which is essentially regurgitated content from blogs, Reddit and Wikipedia with just enough rewriting to pass copyright muster, or at least to pass the filters that Google uses to deprioritize such low-value-added content in search results.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In theory, content mills could add value above and beyond the original, but the business model only prioritizes quantity and high search-engine rank (preferably higher than whoever you copied from). In the early 2010s these sites relied mostly on severely underpaid contractors to churn out blog posts for pennies per word, but nowadays more and more of this work is being <a href=\"https:\/\/gizmodo.com\/content-farms-ai-chatbots-plagiarize-news-nyt-1850770474#:~:text=Online%20content%20farms%20are%20using,report%20from%20misinformation%20monitor%20NewsGuard.\">handed over completely to generative AI.<\/a> For example, take <a href=\"https:\/\/contentatscale.ai\/\">Content At Scale<\/a>, who advertise a service that uses generative AI to write a search-engine-optimized blog post or article based solely on the set of keywords you want to rank for in web searches. Or they can write articles based on your competitor&#8217;s content: <em>&#8220;Have a competitor that\u2019s crushing it with their content marketing? Or have awesome thought leaders or content sites in your niche? 
\u2026 Take any existing article, and have a freshly written article created that uses the source URL as context for the all new article.&#8221;<\/em> They can also go straight from podcast or YouTube video to blog post, and just in case you missed what this was really about, they advertise that one of their advantages over existing content mill services (besides price) is that they automatically integrate scans to make sure their posts aren&#8217;t flagged as plagiarism or AI-written content.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Rewriting someone else&#8217;s material either to avoid copyright infringement or to avoid its detection is being called <a href=\"https:\/\/en.wiktionary.org\/wiki\/copyright_laundering\"><em>copyright laundering<\/em><\/a>, by analogy to money laundering. But unlike money laundering, it&#8217;s perfectly legal as long as you change enough to pass the substantial similarity threshold. And it&#8217;s not just news articles and blog posts that are being generated anymore. Just last week Amazon <a href=\"https:\/\/arstechnica.com\/information-technology\/2023\/09\/ai-generated-books-force-amazon-to-cap-ebook-publications-to-3-per-day\/\">announced<\/a> that they were limiting the number of books an author could self-publish on Kindle to three books <em>per day<\/em> because of AI-generated content.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">No wonder authors are pissed!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The Authors Guild just added their own class-action lawsuit against OpenAI, claiming that using their copyrighted works to train ChatGPT violated their respective copyrights. This is essentially the same argument made in two other lawsuits filed a few months ago and in the class-action lawsuit filed by artists against Stability AI. 
As I said with [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"inline_featured_image":false,"site-sidebar-layout":"default","site-content-layout":"","ast-site-content-layout":"default","site-content-style":"default","site-sidebar-style":"default","ast-global-header-display":"","ast-banner-title-visibility":"","ast-main-header-display":"","ast-hfb-above-header-display":"","ast-hfb-below-header-display":"","ast-hfb-mobile-header-display":"","site-post-title":"","ast-breadcrumbs-content":"","ast-featured-img":"","footer-sml-layout":"","ast-disable-related-posts":"","theme-transparent-header-meta":"","adv-header-id-meta":"","stick-header-meta":"","header-above-stick-meta":"","header-main-stick-meta":"","header-below-stick-meta":"","astra-migrate-meta-layouts":"set","ast-page-background-enabled":"default","ast-page-background-meta":{"desktop":{"background-color":"var(--ast-global-color-4)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"ast-content-background-meta":{"desktop":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"tablet":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""},"mobile":{"background-color":"var(--ast-global-color-5)","background-image":"","background-repeat":"repeat","background-position":"center 
center","background-size":"auto","background-attachment":"scroll","background-type":"","background-media":"","overlay-type":"","overlay-color":"","overlay-opacity":"","overlay-gradient":""}},"footnotes":""},"categories":[4],"tags":[],"class_list":["post-1094","post","type-post","status-publish","format-standard","hentry","category-intellectual-property"],"_links":{"self":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1094","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/comments?post=1094"}],"version-history":[{"count":5,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1094\/revisions"}],"predecessor-version":[{"id":1099,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/posts\/1094\/revisions\/1099"}],"wp:attachment":[{"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/media?parent=1094"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/categories?post=1094"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.docbug.com\/blog\/wp-json\/wp\/v2\/tags?post=1094"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}