Copyrighted Books Fair Game for Training Meta and Anthropic Large Language Models

Two landmark copyright lawsuits have wrapped up with wins for tech giants Meta and Anthropic, and their respective large language models (LLMs), Llama and Claude.

At stake was the companies’ use of published, copyrighted books to train their LLMs. The plaintiffs in each case, authors of some of the books used this way, alleged that Meta and Anthropic had violated copyright law, to the detriment of the authors and, potentially, the market for human-created content.  

Anthropic’s case focused on copyrighted books the firm purchased and digitized to create a vast, searchable library. From that library, Anthropic selected and used various texts to train its LLM. In a June 23, 2025 ruling, US District Judge William Alsup wrote that both acts — digitizing works to be kept in a library, but not selling or sharing them, and using said works to train the LLM — qualified as “fair uses” under the Copyright Act. 

More specifically, by scanning physical books into its system, Anthropic was simply changing their format, neither adding to nor subtracting from the original books. Training LLMs on the books, in turn, was transformative and protected as fair use.

According to Judge Alsup, the Copyright Act “reserves to the copyright owner the right to make derivative works that add or subtract creative material — as occurs in a ‘translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, [or] condensation’ of a book.”

Because translation is mentioned only in passing, within this list of derivative works, the ruling's treatment of translation with regard to LLMs is not clear. It seems that the copyright owner of a book, whether an author or publisher, would retain control over, and thus need to sign off on, possible LLM-generated translations of the text. However, if an AI firm purchased an existing translation of said book and scanned it into the system, it appears that this act would be considered fair use.

Judge Alsup argued that there was no evidence that Anthropic’s LLM would reproduce, for public consumption, “a given work’s creative elements, nor even one author’s identifiable expressive style.” 

“Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works,” Judge Alsup wrote. “But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not.”

The Trump administration’s “AI Czar”, David Sacks, echoed this on the All-In Podcast on June 28, 2025. “So if an AI model violates someone’s copyright by outputting something that’s identical, then obviously that’s a violation. But if all they’re doing is transforming the work, they’re doing positional encoding, and then coming up with their own unique work product, this judge said that that is not a violation of copyright. I completely agree with that.”

In a separate case, federal Judge Vincent Chhabria’s June 25, 2025 ruling also cited the limits of Meta’s Llama in reproducing original content used to train the LLM. He rejected the claims of 13 plaintiffs, all authors, that Llama was trained to “regurgitate” their works and ultimately “create books that compete” with those written by human authors.

“An LLM’s consumption of a book is different than a person’s,” wrote Judge Chhabria, describing LLMs as “innovative tools that can be used to generate diverse text and perform a wide range of functions,” including the ability to “translate an excerpt from or into a foreign language.”

Although Meta reportedly trained Llama on a dataset of about 200,000 books, the LLM “will not produce more than 50 words in the authors’ books,” a finding that supported the assertion that Meta’s use of the books was highly transformative. Based on this limitation, it stands to reason that Llama would not translate an entire book on demand. 

“[A]t most, this evidence shows that Meta wanted Llama to be able to generate text in certain styles. But style is not copyrightable—only expression is,” Judge Chhabria concluded.