The U.S. Copyright Office’s third report on AI copyright, published in May 2025, stands as the most authoritative and detailed federal guidance to date on how U.S. copyright law should apply to AI training and outputs. At the heart of the report are sweeping, practical insights for technology companies, creators, attorneys, and policymakers, with a focus on the doctrine of fair use and its application to generative AI.

Background and Significance

Over the last several years, one of the most contentious debates in technology and copyright law has centered on whether using copyrighted works to train AI models is legally permissible. Copyright holders argue that without licensing, tech giants unfairly profit from works that are the product of painstaking human creativity. AI developers assert that limiting access will impede innovation and prevent AI from reaching its full potential. The third report, released as a pre-publication version on the Copyright Office’s official website, dissects the complexities of the fair use doctrine as applied to AI copyright disputes.

Is AI Training Fair Use?

Before turning to the Copyright Office’s factor-by-factor analysis, it is worth addressing the threshold question of whether AI training can qualify as fair use at all. The May 2025 report frames the debate by acknowledging that AI systems are trained by ingesting massive quantities of data and creative works, often without the knowledge, permission, or participation of the copyright holders. The report makes clear that there is no categorical answer: fair use in the AI context is assessed case by case, focusing on purpose, market impact, and the technical realities of how information is processed.

The Office emphasizes two pivotal factors that will likely carry the most weight as courts address future disputes:

Whether the AI’s use “transforms” the original protected works in both character and market role, rather than merely repackaging content for competing commercial purposes.
The risk of market substitution or dilution, including evidence that AI outputs could replace or erode demand for the works on which they were trained.

In practical terms, the Office signals that developers, creators, and attorneys must weigh the purpose behind AI training alongside specific market effects. Outputs that target the same users or undermine sales and licensing value will generally fall outside accepted fair use bounds. By contrast, uses that serve distinct societal functions, such as analytic research or summarization that does not generate competing expressive content, are more likely to be considered fair use.

As the legal landscape evolves, voluntarily negotiated licensing arrangements and robust technical safeguards are expected to play an expanding role in managing both risk and opportunity. The Copyright Office’s nuanced approach is designed to encourage market-driven solutions while creating a flexible but principled foundation for future court decisions.

AI Copyright Fair-Use Landscape

At the core of the Office’s analysis lies the four-factor test for fair use, codified in 17 U.S.C. § 107. The AI Copyright report devotes substantial attention to each factor, providing actionable examples that clarify when training an AI system might be permissible without licensing and where it is likely not.

1. Purpose and Character of the Use

The report makes clear that commercial AI systems trained on copyrighted data to produce similar market-facing content, such as AI-generated music or text that competes directly with the originals, are “at best, modestly transformative.” For example, an AI system that ingests thousands of popular romance novels to generate new romance fiction sold on the same platforms is engaged in the kind of use the Office’s reasoning places outside fair use. Because the purpose is to serve the same market, the Office concludes that “making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets … goes beyond established fair use boundaries.”

Real-World Example:

A start-up uses well-known stock photographs to train an AI image generator that then creates commercial illustrations for advertising campaigns. Under the reasoning of the AI Copyright report, this use is commercial and targets the same market as the originals, so absent permission or a strong, case-specific transformative argument, it weighs against a finding of fair use.

2. Nature of the Copyrighted Work

The nature of the work—whether highly creative or more factual—impacts the fair use assessment. The Office emphasizes that uses of highly creative and expressive works, such as paintings, poems, and music, receive less protection under fair use than uses of factual databases or directories. For instance, training a legal research AI on raw judicial opinions (which are in the public domain) stands on firmer ground than an AI trained on a corpus of cutting-edge fine art photography.

3. Amount and Substantiality of the Portion Used

AI models typically require massive datasets, which means copying substantial, and sometimes entire, works. The AI copyright report takes note of this, cautioning that wholesale ingestion of creative works, especially for commercial gain, weighs far more heavily against fair use than minimal or de minimis copying. The more an AI model relies on direct and substantial copying, the harder it is to claim fair use.

Example in Context:

A language model trained on the full archives of a renowned newspaper is more likely to run afoul of copyright than one trained solely on headlines or public domain stories.

4. Effect on the Market for the Original Work

The Office considers the fourth factor especially weighty and emphasizes two critical forms of market injury:

Lost Sales/Substitution: If the AI’s outputs replace demand for the originals (e.g., AI-generated audiobooks replacing voice actors), the use likely undermines the core incentive of copyright law.
Market Dilution: Even if the outputs are not identical, the proliferation of similar AI products can depress demand and reduce royalties for the original creators—an effect the Office validates with real-world evidence from stock photo and music libraries now flooded with AI alternatives.

The Office stresses that there is no categorical rule that AI training is either always permitted or always prohibited; fair use remains a case-by-case, fact-specific inquiry applying the four statutory factors.

The Office expressly observes that the first and fourth factors “can be expected to assume considerable weight,” while emphasizing that courts must weigh all factors holistically. Importantly, in summarizing its view near the conclusion, the Office writes: “Making commercial use of vast troves of copyrighted works to produce expressive content that competes with them in existing markets, especially where this is accomplished through illegal access, goes beyond established fair use boundaries.”

The Transformative Use Test — Purpose and Character Matter

A central theme of the AI Copyright report is transformative use, which asks whether the AI’s training purpose has a new or different character from the originals. Drawing on the Supreme Court’s decision in Warhol v. Goldsmith, the Office argues that transformation turns on purpose, not on technical novelty or differences in output. Courts should ask what the use is for and whether it targets the same market as the original. If an AI system is trained to produce works that occupy the same market and appeal to the same audience as the ingested copyrighted works (for example, training an audio model on sound recordings to generate commercial music of the same kind), the use is, at best, modestly transformative and likely weighs against fair use when considered with the other factors.

The Office notes that controls on outputs (for example, guardrails that reduce infringing or substitutive responses) can support a fair-use position, but such evidence must be weighed alongside commerciality and market effects. The critical inquiry is whether the AI serves a fundamentally different market need or simply retools content for the same audience. At the same time, consistent with Warhol, the report stresses that the transformativeness inquiry should not overshadow the rest of the statutory framework; it is one part of a holistic assessment.

Practical Case:

Consider a general-purpose chatbot such as OpenAI’s ChatGPT. Under the report’s reasoning, it would not be transformative merely because it can respond to questions like a “new” author; if it generates news articles similar to those in its training data that compete for the same readers, its use is only “modestly transformative.” On the other hand, if an AI system supports scientific research, such as summarizing and cross-referencing biological data, courts may regard the use as transformative and within the realm of fair use, provided market substitution risks are low.

Debunking the “AI Is Human Learning” Analogy

The AI copyright report firmly rejects the analogy that AI “learns” like humans for fair use purposes. While humans learn and synthesize information over time, AI training entails making large numbers of perfect digital copies at high speed and encoding them into parameters that can reproduce stylistic content with uncanny fidelity, resulting in systems capable of generating content at superhuman speed and scale. Those differences in scale and precision are legally relevant. Fair use has never given humans or machines carte blanche to duplicate and absorb entire protected works without limits, and it does not excuse learning, human or machine, without regard to the statutory factors.

Market Harm and the “Market Dilution” Theory

Under Factor Four, the report highlights multiple dimensions of market effects, including (1) lost sales from substitution, (2) market dilution, where AI outputs, though not verbatim copies, compete with the same category of works used for training and depress demand and royalty pools, and (3) lost licensing opportunities. The report presents market dilution as a legitimate Factor Four consideration grounded in copyright’s incentive structure. For example, some publishing companies have reported declining revenues that coincide with the public availability of low-cost, AI-written textbooks and study guides. The “market dilution” theory remains controversial, but the report treats it as a legitimate, measurable risk that courts may weigh.

Voluntary Licensing and Market Solutions

Rather than advocating broad, compulsory licensing schemes, the Office encourages technology companies and rightsholders to expand voluntary licensing. For example, some leading music publishers now offer collective licenses for model training, allowing AI developers to pay for access to curated datasets and reduce liability risk. The Office views such arrangements as scalable and adaptable to market needs, preferring that licensing markets develop organically over the imposition of new statutory obligations.

Retrieval-Augmented Generation (RAG) – Special Focus

Retrieval-augmented generation (RAG), in which AI systems pull content directly from databases or the open web to produce summarized responses to user prompts, presents heightened risks. For instance, an AI system that retrieves chunks of a New York Times article and displays them verbatim to a paying customer is unlikely to be considered transformative, and the use likely weighs against fair use under both the purpose and market-harm factors. The Office’s caution is practical: if users are satisfied with AI-generated summaries and stop subscribing to the originals, copyright law’s incentives are undercut.

Technical and Legal Guidance — A Pragmatic, Evolving Framework

The AI Copyright report offers an analytical framework and technical background, urging courts, creators, and companies to apply fair use carefully, with attention to all facts and context. There is no categorical permission or prohibition; every case will depend on specific markets, purposes, and technical designs. Developers are advised to implement meaningful guardrails, such as filters that prevent direct replication of training data, to help ensure outputs do not substitute for protected works.

Criticisms and The Road Ahead

The Copyright Office’s endorsement of “market dilution” in its AI copyright report has drawn criticism from some AI proponents, who worry about potential chilling effects. However, the report sets no immediate legislative agenda; it advises Congress to watch the voluntary licensing market and intervene only if clear market failures emerge.

Practical Implications for Stakeholders

AI developers should evaluate training purpose, outputs, and controls with an eye to reducing substitution and plan for licensing where outputs target the same markets as training works.
Copyright owners should monitor and, where appropriate, participate in voluntary and collective licensing channels and scrutinize RAG implementations that provide expressive summaries substituting for originals.
Attorneys should anchor advice in the factor-by-factor framework, document safeguards that limit substitution, and highlight that Factors 1 and 4 carry considerable weight within the holistic analysis.
Legislative outlook: The Office presently favors market-driven solutions, with alternatives to be considered only if persistent market failures arise.

Conclusion

The third report delivers an authoritative, non-categorical roadmap: fair use for AI training turns on purpose, how much and why you copy, and what happens to the market, with voluntary licensing expanding to manage risk. It cautions that using “vast troves” of protected works to produce expressive outputs that compete in existing markets falls beyond established fair-use boundaries, while recognizing contexts (such as analysis or research) in which outputs are unlikely to substitute for expressive works.


3rd AI Copyright Report: Reshaping the Future of Fair-Use