Court finds fair use in AI training but rejects piracy defense
Federal judge rules AI training constitutes transformative fair use while allowing piracy claims to proceed to trial.

A U.S. federal judge delivered a landmark split decision on June 23, 2025, in the copyright infringement case against AI company Anthropic PBC. Senior U.S. District Judge William Alsup ruled that using copyrighted books to train large language models constitutes fair use, but allowed claims over pirated content to proceed to trial.
According to court documents, authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson sued Anthropic in August 2024, alleging the company infringed their federal copyrights. The plaintiffs claimed Anthropic pirated copies of millions of books for its research library and reproduced them to train the LLMs powering its Claude AI service.
Summary
Who: Senior U.S. District Judge William Alsup ruled in a copyright case brought by authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson against AI company Anthropic PBC.
What: The court delivered a split decision finding that using copyrighted books to train AI models constitutes fair use, while allowing claims over pirated content to proceed to trial for damages assessment.
When: The ruling was issued on June 23, 2025, in a case filed in August 2024, with trial scheduled for December 2025.
Where: The case was decided in the United States District Court for the Northern District of California, with potential implications for AI development nationwide.
Why: The decision addresses fundamental questions about copyright protection in the age of artificial intelligence, balancing innovation incentives against creator rights while establishing precedent for AI training practices.
The case centered on three distinct uses of copyrighted material. Anthropic assembled a central library containing both pirated and purchased books, then created various data mixes from these sources to train different LLM versions. The Northern District of California court examined each use separately under the four-factor fair use test: the purpose and character of the use, the nature of the copyrighted work, the amount used, and the effect on the market for the original.
Training copies deemed transformative
Judge Alsup found the training process "exceedingly transformative" and held that it constituted fair use under Section 107 of the Copyright Act. "The purpose and character of using copyrighted works to train LLMs to generate new text was quintessentially transformative," the court stated.
The decision emphasized that Claude outputs do not infringe on the authors' works. According to the ruling, Anthropic implemented filtering software to prevent any infringing copies from reaching users through the public-facing Claude service. The court noted this was similar to Google's limitations on text snippets in its Google Books service.
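The ruling does not describe how that filtering works. As a rough sketch of one common screening approach, the snippet below compares candidate output against protected text using n-gram overlap; the function names and thresholds are hypothetical illustrations, not Anthropic's actual software.

```python
from typing import List, Set

def ngrams(text: str, n: int = 8) -> Set[str]:
    """Return the set of n-word shingles in a text (lowercased, whitespace-tokenized)."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def blocks_output(candidate: str, protected_texts: List[str], n: int = 8, threshold: int = 1) -> bool:
    """Return True if the candidate shares at least `threshold` n-grams with any
    protected source text, in which case a filter of this kind would suppress it."""
    cand = ngrams(candidate, n)
    return any(len(cand & ngrams(src, n)) >= threshold for src in protected_texts)

# A near-verbatim reproduction trips the filter; a paraphrase does not.
book = "It was the best of times, it was the worst of times, it was the age of wisdom"
print(blocks_output("It was the best of times, it was the worst of times, indeed", [book]))  # True
print(blocks_output("The opening contrasts good times with bad times.", [book]))             # False
```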
According to expert testimony cited in the ruling, Anthropic valued books with "good writing" that "an editor would approve of" for training its LLMs. The court found this use analogous to how humans read and learn from texts without requiring payment for each use.
The training process involved multiple copying stages. Works selected for training were copied from the central library, cleaned of headers and page numbers, tokenized into numerical sequences, and compressed into the trained LLM. According to the court, each LLM retained "compressed" copies of training works and "memorized" them almost verbatim.
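The order describes these stages only at a high level. As a minimal sketch of what cleaning and tokenization of book text can look like in practice, the snippet below strips bare page numbers and all-caps running headers and maps words to integer IDs; the regexes, toy vocabulary, and function names are illustrative assumptions, and real LLM pipelines use subword tokenizers rather than a word-level lookup.

```python
import re
from typing import Dict, List

def clean_page(text: str) -> str:
    """Drop scan artifacts from a book page: bare page numbers and all-caps running headers.
    (Hypothetical rules, for illustration only.)"""
    kept = []
    for line in text.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d{1,4}", stripped):                 # bare page number
            continue
        if re.fullmatch(r"[A-Z][A-Z .'\-]{3,60}", stripped):   # all-caps running header
            continue
        kept.append(line)
    return "\n".join(kept)

def tokenize(text: str, vocab: Dict[str, int]) -> List[int]:
    """Map cleaned text to a numerical sequence using a toy word-level vocabulary."""
    unk = vocab.get("<unk>", 0)
    return [vocab.get(tok, unk) for tok in re.findall(r"\w+|[^\w\s]", text.lower())]

# Example: clean one page and encode it.
page = "THE GREAT NOVEL\nIt was the best of times.\n42"
vocab = {"<unk>": 0, "it": 1, "was": 2, "the": 3, "best": 4, "of": 5, "times": 6, ".": 7}
print(tokenize(clean_page(page), vocab))   # -> [1, 2, 3, 4, 5, 6, 7]
```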
Format conversion approved
The court also ruled in favor of Anthropic's practice of purchasing print books and converting them to digital format. According to court documents, Anthropic spent millions of dollars purchasing print books, then destructively scanned them to create digital copies for its research library.
Judge Alsup found this format change transformative because it saved storage space and enabled searchability without creating additional copies. "The print original was destroyed. One replaced the other," the ruling stated. The court compared this to previous cases involving microfilm conversion for space conservation.
Piracy claims survive summary judgment
However, the court rejected Anthropic's fair use defense for pirated content. According to the ruling, Anthropic downloaded over seven million pirated books from sources including Books3, Library Genesis, and Pirate Library Mirror between January 2021 and July 2022.
The court found that building a central library of pirated works constituted a separate use from training LLMs. "Pirating copies to build a research library without paying for it, and to retain copies should they prove useful for one thing or another, was its own use — and not a transformative one," Judge Alsup wrote.
According to court documents, Anthropic retained pirated copies even after determining they would not be used for training. The company planned to "store everything forever" with "no compelling reason to delete a book."
The court distinguished between copies that were immediately transformed for fair use and those maintained in a general-purpose library. According to the ruling, downloading from pirate sites when books could have been purchased lawfully was "inherently, irredeemably infringing."
Trial set for damages assessment
The piracy portion of the case will proceed to trial in December 2025 to determine actual or statutory damages, including potential willfulness penalties. Under the Copyright Act, willful infringement can result in statutory damages of up to $150,000 per work. The ruling states that Anthropic's later purchase of books it previously pirated "will not absolve it of liability for the theft."
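To give a rough sense of scale only, the arithmetic below multiplies the roughly seven million pirated books cited in the ruling by the statutory range per work; the $750 floor comes from 17 U.S.C. § 504(c) rather than the ruling, and the actual award will depend on which works are registered, certified into the class, and found willfully infringed.

```python
# Back-of-the-envelope exposure, not a prediction of the award.
WORKS_AT_ISSUE = 7_000_000    # approximate count of pirated books cited in the ruling
STATUTORY_FLOOR = 750         # 17 U.S.C. § 504(c) minimum per work
WILLFUL_CAP = 150_000         # maximum per work for willful infringement

print(f"Floor:   ${WORKS_AT_ISSUE * STATUTORY_FLOOR:,}")   # $5,250,000,000
print(f"Ceiling: ${WORKS_AT_ISSUE * WILLFUL_CAP:,}")       # $1,050,000,000,000
```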
The court noted that Anthropic resisted providing complete discovery about which library copies were used for various purposes. A spreadsheet showing data mix compositions was produced but later clawed back by Anthropic in April 2025.
Judge Alsup emphasized the objective nature of fair use analysis, stating that courts must look past "the subjective intent of the user" to examine actual use. The ruling cited CEO Dario Amodei's internal communications about avoiding "legal/practice/business slog" by downloading pirated content.
Industry implications
The decision provides the first substantive federal court ruling on fair use in AI training contexts. Research from 2024 suggests countries worldwide are developing copyright exceptions for AI training, indicating a global trend toward accommodating the technology.
In a statement, Anthropic praised the court's recognition that training LLMs is "transformative — spectacularly so." The company disagreed with allowing the piracy claims to proceed, stating it acquired books "for one purpose only — building large language models."
The Authors Guild expressed disagreement with portions of the ruling but emphasized that "the judge understood the outrageous piracy." According to CEO Mary Rasenberger, the piracy liability "comes with statutory damages for intentional copyright infringement, which are quite high per book."
The ruling comes weeks after the Copyright Office released comprehensive AI training guidelines in May 2025, providing frameworks for evaluating fair use in AI contexts. The Office's analysis examined both the legal and economic implications of AI training on copyrighted works.
Broader legal landscape
Similar cases are pending against other AI companies. Meta recently prevailed in a parallel case in which U.S. District Judge Vince Chhabria ruled in the company's favor, though that decision applied only to the specific works at issue in the lawsuit. The authors in that case failed to demonstrate market harm from Meta's use of their books.
The divergent approaches between AI companies regarding content licensing continue to shape the legal landscape. While Google pursues formal licensing agreements with publishers like the Associated Press, other companies face ongoing litigation over their training practices.
According to legal experts, these rulings help establish precedent for AI companies regarding which uses constitute fair use versus infringement. The bifurcated nature of this decision, approving transformative training while condemning piracy, provides guidance for future AI development practices.
The case highlights the evolving intersection of copyright law and artificial intelligence, with courts attempting to balance innovation incentives against creator rights. The eventual trial outcome on damages could significantly impact how AI companies approach content acquisition for training purposes.
Timeline
- January-February 2021: Anthropic cofounders begin downloading pirated books from Books3, containing 196,640 unauthorized copies
- June 2021: Ben Mann downloads at least 5 million pirated books from Library Genesis
- July 2022: Anthropic downloads at least 2 million books from Pirate Library Mirror
- March 2023: Claude AI service launches publicly, first of seven successive versions
- February 2024: Anthropic hires Tom Turvey to obtain "all the books in the world" while avoiding legal complexities
- Spring 2024: Anthropic begins bulk-purchasing print books for digital scanning
- August 2024: Authors Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson file putative class action lawsuit
- October 2024: Court requires class certification motions by March 6, 2025
- February 2025: Court grants Anthropic's motion for early summary judgment on fair use before class certification
- April 2025: Anthropic claws back spreadsheet showing data mix compositions used for training various LLMs
- June 23, 2025: Judge William Alsup issues mixed ruling on fair use and piracy claims
- December 2025: Trial scheduled on pirated copies and resulting damages