The copyright battles over AI art start in earnest

Well, the long-anticipated copyright battles over AI-generated content have finally started. Last week a group of artists announced they are suing Stability AI, Midjourney and DeviantArt for using their artwork (and that of literally millions of other artists) to train their machine learning systems, claiming that doing so violates their copyrights. And yesterday Getty Images announced its own lawsuit against Stability AI, arguing that “Stability AI unlawfully copied and processed millions of images protected by copyright and the associated metadata owned or represented by Getty Images” without a license. The artists’ class-action complaint makes several claims, but the most important ones are:

  1. Training an AI is not fair use: The complaint argues that using an image to train an AI is not fair use, even when the work was published on the web and no copy of it is ever published. This appears to be Getty’s main argument as well, although in their case I expect they simply want to pressure Stability into paying them for a license to make the case go away.
  2. Stable Diffusion is itself an infringing work: The complaint claims that Stable Diffusion actually contains all 5 billion training images in compressed form, and is thus a derivative work in its own right. In their words, a system like SD “is a collage tool, only capable of producing images that are remixed and reassembled from the copyrighted work of others.”

Case law has at least suggested that training an AI is fair use, but it’s far from settled. As for the second point, there’s no question that Stable Diffusion can generate images that, if published, would infringe on someone’s copyright and/or trademark (e.g. try entering “Batman” into Stable Diffusion’s generator). What the court will have to decide is whether the tool itself is more like a box of colored pencils (capable of creating infringing works but not infringing in their own right), or more like a bunch of superhero stencils you can mix and match.

In the end I’m not sure this will be much more than a speed bump for this technology. The technology has been proven well enough for companies to invest the resources to make a “clean” training set from public domain and licensable content, and from there individual companies and studios will train their own models to produce their own proprietary “style”.