US Copyright Office releases major AI training report amid intensifying copyright debate

New framework determines when AI developers need permission to use copyrighted works, as tech giants pursue different licensing strategies.

Luis Rijo

May 11, 2025 • 8 min read

Illustration showing copyright documents and AI model diagrams, visualizing the balance between creative rights and AI training.

In a watershed moment for the AI industry and creative sectors, the US Copyright Office has released Part 3 of its comprehensive "Copyright and Artificial Intelligence" series, addressing the contentious issue of whether training generative AI models on copyrighted works constitutes fair use or requires licensing from rights holders.

This pivotal report emerges amid intensifying legal challenges, political upheaval affecting copyright leadership, and divergent approaches to AI content licensing among major technology companies.


Political Turmoil as Copyright Guidance Emerges

The report's release coincides with extraordinary political developments affecting copyright leadership in the United States. According to POLITICO, President Donald Trump recently dismissed Register of Copyrights Shira Perlmutter, an apparently unprecedented intervention in what has traditionally been a legislative branch position. The dismissal came just after Perlmutter allegedly "refused to rubber-stamp Elon Musk's efforts to mine troves of copyrighted works to train AI models," according to Representative Joe Morelle.

This political shake-up creates significant uncertainty around the enforcement and interpretation of the Copyright Office's new guidelines, particularly as the Office under Perlmutter's leadership had taken a balanced approach that neither fully embraced AI companies' expansive fair use claims nor capitulated to all creator demands for mandatory licensing.

Copyright Office's AI Initiative

The Copyright Office's Generative AI Training report represents the third installment in its comprehensive examination of copyright and AI issues. As reported by PPC Land in March 2024, the Copyright Office announced a multi-section approach to analyzing AI's relationship with copyright law. Part 1, published in July 2024, explored digital replicas of artists' identities, while Part 2, released in January 2025, addressed the copyrightability of AI-generated content.

This latest report completes a crucial part of the Office's analysis by focusing on the use of copyrighted materials in training AI models - perhaps the most commercially significant and legally contested aspect of the AI revolution. A future section on licensing implications is still expected.

Against this backdrop, major technology companies have pursued divergent strategies regarding the use of copyrighted content in their AI systems. As PPC Land reported in January 2025, Google recently established a partnership with The Associated Press to integrate real-time news content into its Gemini AI application. This formal licensing arrangement stands in sharp contrast to OpenAI's approach, which has faced significant legal challenges, including a high-profile lawsuit from The New York Times regarding the alleged unauthorized use of its content.

These contrasting strategies highlight the uncertain legal landscape that the Copyright Office's new report attempts to navigate - with some companies opting for licensing agreements while others rely on expansive interpretations of fair use doctrine.

Economic Implications Take Center Stage

In February 2025, PPC Land reported on the Copyright Office's exploration of the economic effects of AI on creative works. That economic analysis, titled "Identifying the Economic Implications of Artificial Intelligence for Copyright Policy," established a framework for evaluating how AI technologies affect traditional copyright incentives and market dynamics.

This economic perspective directly informs the new report's fair use analysis, particularly regarding the fourth fair use factor (market effects). The Office's comprehensive approach considers not just direct substitution but broader market dilution effects that may occur when AI systems generate content that competes with human creators.

Inside the Report

The 107-page pre-publication report, authored by the Register of Copyrights, provides an exhaustively detailed analysis of how generative AI development implicates copyright law and when the fair use doctrine may apply. Rather than making sweeping determinations, the Office establishes a nuanced framework for case-by-case evaluation.

The report demonstrates a sophisticated grasp of AI technology, explaining that while code defines a neural network's basic structure, it is the "weights" (parameters) reflecting patterns learned from training data that raise copyright concerns. The Office explains that modern neural networks with billions of parameters can compute highly complex transformations, such as converting text to video, and these weights are often treated as proprietary by developers.

For language models specifically, the report details how they are trained through "generative pre-training," which involves predicting each next token (word or word fragment) based on preceding context. The Office notes that foundation models require "internet-scale pre-training data, including large amounts of entire works" to achieve current performance levels.
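To make the idea of next-token prediction concrete, here is a minimal Python sketch. It is not drawn from the report and is far simpler than any real foundation model: it assumes whitespace tokenization and replaces the neural network with a table of bigram counts, which plays a role loosely analogous to learned weights. The function names and the two-sentence corpus are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: a whitespace split stands in for a real subword tokenizer.
    return text.lower().split()


def train_bigram(corpus):
    # "Training" here means counting which token follows which.
    # The resulting table is a toy stand-in for a model's weights.
    counts = defaultdict(Counter)
    for doc in corpus:
        tokens = tokenize(doc)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, context):
    # Predict the most frequent follower of the last token in the context.
    tokens = tokenize(context)
    if not tokens or tokens[-1] not in counts:
        return None
    return counts[tokens[-1]].most_common(1)[0][0]


# Hypothetical training corpus.
corpus = ["the cat sat on the mat", "the cat ate the fish"]
model = train_bigram(corpus)
print(predict_next(model, "the"))  # prints "cat"
```

Even in this toy version, the learned table is derived entirely from the training text, which is why the composition of the training data matters so much to the copyright analysis.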

Prima Facie Infringement Identified

The report systematically analyzes how AI development implicates copyright owners' exclusive rights across multiple stages:

Data collection and curation involves multiple copies of works through downloading, format conversion, and dataset creation.
Training implicates reproduction rights both through temporary copying during the training process and potentially through "memorization" where models retain copyrighted content.
Retrieval-Augmented Generation (RAG) involves copying works into retrieval databases or from external sources during generation.
Outputs may sometimes replicate or closely resemble copyrighted works, potentially infringing reproduction and derivative work rights.
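The RAG stage in the list above can be illustrated with a short Python sketch showing the two points at which text is copied: once into the retrieval database, and again into the model's input at generation time. This is a simplified illustration under stated assumptions, not a description of any particular product: retrieval is done by plain word overlap rather than embeddings, the generation step is omitted, and all names and sample documents are hypothetical.

```python
def build_index(documents):
    # Copy 1: each work's full text is stored in the retrieval database.
    return [
        {"id": i, "text": text, "terms": set(text.lower().split())}
        for i, text in enumerate(documents)
    ]


def retrieve(index, query, k=1):
    # Assumption: rank by shared words; real systems typically use embeddings.
    query_terms = set(query.lower().split())
    ranked = sorted(
        index, key=lambda entry: len(query_terms & entry["terms"]), reverse=True
    )
    return ranked[:k]


def build_prompt(index, query, k=1):
    # Copy 2: retrieved text is copied into the model's input for generation.
    passages = [entry["text"] for entry in retrieve(index, query, k)]
    return "Context:\n" + "\n".join(passages) + "\n\nQuestion: " + query


# Hypothetical documents standing in for copyrighted works.
docs = [
    "Copyright Office releases report on AI training",
    "Local bakery wins regional bread award",
]
index = build_index(docs)
prompt = build_prompt(index, "what did the copyright office report say")
print(prompt)
```

The prompt passed to the model contains the retrieved work verbatim, which is the kind of copying during generation that the list describes.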

Fair Use Analysis: A Balanced Approach

The report's fair use analysis forms its analytical core, examining each of the four statutory factors with remarkable depth.

For the first factor (purpose and character of use), the Office concludes that training foundation models on diverse datasets will "often be transformative" since it converts "a massive collection of training examples into a statistical model that can generate a wide range of outputs across a diverse array of new situations."

However, the report explicitly rejects two common arguments made by AI companies:

That AI training is inherently "non-expressive" - The Office counters that language models absorb "not just the meaning and parts of speech of words, but how they are selected and arranged at the sentence, paragraph, and document level—the essence of linguistic expression."
That AI learning is like human learning - The Office points out that "fair use does not excuse all human acts done for the purpose of learning" and that AI differs from humans in creating perfect copies and operating at "superhuman speed and scale."

On the crucial fourth factor (market effects), the report identifies several potential harms:

Lost sales when models output content substantially similar to training works
Market dilution when AI-generated content competes with human-created works
Lost licensing opportunities where markets exist or are developing
RAG-related substitution when systems retrieve copyrighted works and generate responses that satisfy users' needs for the original

The Office concludes that "the fourth factor should not be read so narrowly" as to ignore these broader market effects, noting that "the speed and scale at which AI systems generate content pose a serious risk of diluting markets for works of the same kind as in their training data."

The Licensing Landscape: Market Solutions Preferred

The report extensively examines licensing frameworks, noting that voluntary licensing is "increasingly taking place" across creative sectors. It acknowledges that licensing at scale faces challenges, particularly for works created outside professional industries or where ownership is diffuse.

Collective licensing receives particular attention as a promising approach that "can play a significant role in facilitating AI training, reducing what might otherwise be thousands or even millions of transactions to a manageable number."

While some commenters raised antitrust concerns with collective licensing, the Office encourages the Department of Justice to "provide guidance, including on the benefit of an antitrust exemption in this context."

The report discourages compulsory licensing, warning that such licenses "establish fixed royalty rates and terms and can set practices in stone; they can become inextricably embedded in an industry and become difficult to undo." For sectors where voluntary licensing proves unworkable, the Office suggests extended collective licensing (ECL) as "a less intrusive approach."

International Approaches Provide Context

The report examines how other jurisdictions are addressing AI training, particularly the European Union's text and data mining exceptions that include opt-out provisions for copyright owners. Japan's exception allows AI development uses only when not for "personally enjoy[ing]...the thoughts or sentiments expressed" in works, while the UK is considering expanding its research-only exception.

Implications and Industry Impact

The Copyright Office's analysis provides a roadmap for courts, companies, and creators navigating this complex landscape. Its conclusion that transformativeness and market effects will be the most significant factors could shape litigation strategy in pending lawsuits, including cases against OpenAI, Anthropic, Meta, and other AI developers.

For AI companies, the report suggests that implementing effective guardrails to prevent infringing outputs could strengthen fair use arguments. It also indicates that licensing should be seriously considered, especially for high-value creative content.

For copyright owners, the report acknowledges emerging licensing markets while validating concerns about market harm from unrestrained AI training. The positive view of collective licensing suggests creators may benefit from organized approaches rather than individual negotiations.

What Comes Next

With the Copyright Office emphasizing that its conclusions are based on "current circumstances and publicly available information," the report represents a starting point rather than the final word. Several factors will shape how this issue evolves:

Court decisions in pending lawsuits will test the Office's framework and potentially create binding precedent
Market developments in licensing may reduce pressure for legislative intervention
Technical innovation may change the fair use calculus as AI capabilities evolve
International harmonization efforts may influence U.S. policy
Political leadership changes at the Copyright Office could shift enforcement priorities

As Register Perlmutter (prior to her dismissal) stated in the report, "The public interest requires striking an effective balance, allowing technological innovation to flourish while maintaining a thriving creative community." The extent to which that balance is achieved will determine whether AI becomes a tool that enhances human creativity or threatens its economic foundation.

In the report's conclusion, Register of Copyrights Shira Perlmutter emphasizes that "copyright law has adapted to new technology, furthering its progress while preserving incentives for creative activity. This has enabled our nation's creative and technology industries to become global leaders in their fields." She adds that "American leadership in the AI space would best be furthered by supporting both of these world-class industries that contribute so much to our economic and cultural advancement. Effective licensing options can ensure that innovation continues to advance without undermining intellectual property rights."

Perlmutter further notes that "these groundbreaking technologies should benefit both the innovators who design them and the creators whose content fuels them, as well as the general public." The report concludes with a commitment to "continue to monitor developments in technology, case law, and markets, and to offer further assistance to Congress as it considers these issues."

Timeline

2023

August 2023: U.S. Copyright Office publishes Notice of Inquiry on copyright and AI issues, receiving over 10,000 comments from stakeholders
October 2023: Venture capital firm Andreessen Horowitz (a16z) submits formal comments arguing AI training constitutes fair use under existing law
December 2023: Authors and publishers file lawsuits against AI companies including OpenAI and Meta regarding unauthorized use of copyrighted works for training

2024

March 2024: Copyright Office announces multi-section report approach to address various aspects of AI and copyright issues
July 22, 2024: Office introduces Group Registration for Updates to a News Website (GRNW) option to address copyright challenges for online publishers
July 31, 2024: Part 1 of AI Report released, addressing digital replicas of artists' identities
October 2024: Dow Jones files major lawsuit against Perplexity AI over alleged copyright infringement through RAG technology

2025

January 15, 2025: Google announces licensing partnership with The Associated Press to bring real-time news content to its Gemini AI application while OpenAI continues to face legal challenges
January 29, 2025: Part 2 of AI Report released, examining copyrightability of AI-generated works and emphasizing human authorship requirement
February 12, 2025: Copyright Office issues economic report examining implications of AI for copyright policy and markets
May 1, 2025: Federal Judge Vince Chhabria questions Meta's fair use defense in copyright case, suggesting AI training could "obliterate" markets for original works
May 2025: Part 3 of AI Report released, providing framework for determining when AI training constitutes fair use versus requiring licensing
May 2025: Political turmoil at Copyright Office as Register of Copyrights Shira Perlmutter is dismissed, raising questions about implementation of new guidelines

Expected Future Developments

Judicial decisions in major pending lawsuits between content creators and AI companies
Evolution of voluntary licensing markets for AI training content
Potential further guidance from Department of Justice on antitrust implications of collective licensing
Possible legislative action depending on effectiveness of current copyright framework
Publication of additional Copyright Office guidance on AI and copyright issues