WOLFcon 2024 - Understanding and Using AI Workflows with FOLIO

23 September 2024


Creator Attribution and Copyright

The training of Large Language Models (LLMs) requires massive amounts of text and other media that are commonly available on the open web. This content includes both copyrighted and public domain material, which can lead to generative outputs from these models closely resembling existing copyrighted works.

This resemblance in OpenAI's ChatGPT outputs led the New York Times and other publishers to file a lawsuit in December 2023 against OpenAI and Microsoft alleging copyright infringement.1

In response to this lawsuit, OpenAI and Microsoft claim that their use of copyrighted material falls under the Fair Use doctrine in the United States, a position reaffirmed by an Association of Research Libraries (ARL) blog post2 which states in part:

We drafted the principles on AI and copyright in response to efforts to amend copyright law to require licensing schemes for generative AI that could stunt the development of this technology, and undermine its utility to researchers, students, creators, and the public.

In their article, Generative AI has a Visual Plagiarism Problem3, computer scientists Gary Marcus and Reid Southen demonstrate how image-generation models such as Midjourney and DALL-E 3 can create "plagiaristic outputs". Although these black-box models are unpredictable, they allow for prompts that generate outputs remarkably similar to copyrighted images. The authors provide examples of prompting both Midjourney and DALL-E 3 to produce outputs nearly identical to copyrighted material from Marvel movies and The Simpsons cartoons.