The New York Times’s Landmark AI Copyright Battle

The New York Times, a vanguard of journalism and legal precedent, has filed a groundbreaking lawsuit against Microsoft and OpenAI. Filed in the Southern District of New York in December 2023, the suit will likely answer many of the open questions about artificial intelligence and copyright law, and it may well reach the Supreme Court.

The Core Issue: AI, Copyright, and the Controversy of “Scraping” and “Transformative Works”

What information is OpenAI permitted to “scrape” without violating copyright laws when training its models? If someone’s copyrighted work is “scraped” by ChatGPT, does that constitute copyright infringement?

Does training the model (scraping) fall within the exception of fair use? 

“Outputs” are the direct result of the “inputs” supplied by the user, and they vary greatly from user to user. At what point does an output become a “transformative” work unique enough to warrant its own copyright protection?

If “scraping” is deemed a copyright infringement, will OpenAI be forced to destroy “infringing works” based upon such infringement?

And if that occurs, what ownership rights do you have over the GPTs or other GenAI/LLM tools that you are using, and the outputs created thereon?

On the flip side, are we simply at a point in history where, given the transformative technology of artificial intelligence, United States copyright laws are outdated? 


Does Artificial Intelligence produce “transformative” works? 

In the nearly 70-page complaint, the NYT sets forth a very persuasive argument that likely aligns with the opinion of copyright registrants around the world. According to the Times, “This commercial success is built in large part on OpenAI’s large-scale copyright infringement. One of the central features driving the use and sales of ChatGPT and its associated products is the LLM’s ability to produce natural language text in a variety of styles.”

At the heart of the argument is the legality of “scraping,” a technique by which AI models like ChatGPT harvest vast swaths of online data. This process, akin to a million people simultaneously scouring hundreds of thousands of websites, is a cornerstone of AI development but poses significant legal quandaries. Namely, where do we draw the line between technological advancement and copyright infringement?

Of course, this is a concern shared by copyright registrants throughout the United States. If, under our current copyright laws, OpenAI and Microsoft are found to have violated the rights of copyright holders, they could be liable for statutory damages of up to $150,000 for each work willfully infringed.

We’re looking at a potentially billion-dollar suit that could rewrite copyright laws in the process.

Section 107 of the 1976 Copyright Act requires a four-factor examination of (1) the purpose and character of the use (i.e., the extent to which the new work is transformative, not merely derivative of an earlier work), (2) the nature of the copyrighted work, (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole, and (4) the effect of the use upon the potential market for or value of the copyrighted work.

To succeed in their fair use defense, Microsoft and OpenAI will have to show that the way in which models like ChatGPT operate is “transformative,” essentially meaning that the work created using elements of a copyrighted work is used so differently from the original that it has become something new.

In plain English: It is possible to create transformative (totally unique) outputs when using tools like ChatGPT.

It’s also possible to instruct the GPT to directly replicate copyrighted work. As the Times goes on to argue: “To achieve this result, OpenAI made numerous reproductions of copyrighted works owned by The Times in the course of ‘training’ the LLM….Defendants have refused to recognize this protection. Powered by LLMs containing copies of Times content, Defendants’ GenAI tools can generate output that recites Times content verbatim, closely summarizes it, and mimics its expressive style, as demonstrated by scores of examples.”

If an artificial intelligence tool can be asked to replicate copyrighted material, and is capable of doing so, can (or should) it be entitled to the fair use defense?

Do we have any precedent? 

Or any indication of how this case will be decided? Not really. This is a very novel case, but the legal precedent will likely be driven by two earlier decisions. The first found that the technology was used in a way that was “transformative”; the second found that the use was not.

Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015): In 2015, the Second Circuit Court of Appeals determined that Google’s digitizing of millions of copyright-protected works for its Google Books library was “highly transformative” because users could only search the text and view “snippets,” or small portions, of the books, and, thus, the use was not a “meaningful substitute” for the books in the market. The court concluded that this type of use, digitizing books and displaying snippets through a search function, was non-infringing fair use.

Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith, 598 U.S. __ (2023): In contrast to the Google decision, the Supreme Court determined that the licensing of Andy Warhol’s altered version of Lynn Goldsmith’s photograph of the late artist Prince was not “transformative” enough to be fair use. The Second Circuit, whose ruling the Supreme Court affirmed, had held that a “transformative purpose and character must, at a bare minimum, comprise something more than the imposition of another artist’s style on the primary work.” Both Goldsmith’s photo and Warhol’s altered image were being licensed to magazines and, thus, shared “substantially the same commercial purpose.” In other words, the two works were basically sold to the same consumer, so the use of the photograph did not qualify as fair use.

A key component of this case will be whether or not the Times can prove that these AI tools can create works for the same reader. That is, could someone use a GPT to reproduce an article?

If the New York Times wins, it could have a huge impact on how we use artificial intelligence.  

If Microsoft and OpenAI are able to show that scraping is “transformative”, they could fall within the exception of the Fair Use Doctrine. If the NYT is successful, those platforms could theoretically be required to erase existing data sets and start fresh based on original or licensed content. This outcome could not only redefine AI’s trajectory but also set new benchmarks for technological innovation within the bounds of copyright law.

We’ve called this the “wild west” of artificial intelligence and intellectual property. What if, a year from now, we can’t use AI tools as liberally as we can today? 

We’ll be keeping an eagle eye on this case. In the meantime, let this serve as a reminder that as AI becomes a more frequently used tool for businesses all over the world, it is of the utmost importance to stay educated on the legal complexities of AI and its effects on copyright law. Stay strategic, and stay ahead of the game.
