AI Tools Banned from Using Penguin Random House Books

Penguin Random House (PRH), the world’s largest trade publisher, has updated its copyright rules globally, making it clear that AI systems are not allowed to use its books for training purposes. This shift affects all new and reprinted titles across its many publishing arms and aims to prevent any part of its content from feeding artificial intelligence models.

The statement, which now appears in PRH’s imprints worldwide, says none of its books can be used or reproduced in any manner for training AI:

“No part of this book may be used or reproduced in any manner for the purpose of training artificial intelligence technologies or systems. In accordance with Article 4(3) of the DSM Directive 2019/790, Penguin Random House expressly reserves this work from the text and data mining exception.”


This ban is part of a larger push to stop tech companies from scraping creative works to build or refine language models without permission. By including this clause in its copyright notices, PRH wants to cut off unauthorized access to its materials for AI developers.

Exclusion from EU’s Data Mining Exception

PRH has also reserved its books from the text and data mining exception in the European Union’s 2019 Directive on Copyright in the Digital Single Market (2019/790). Under that directive, lawfully accessible digital works can be mined for trends and patterns unless the rights holder expressly opts out, and PRH has done just that.

Tom Weldon, CEO of PRH UK, told employees in a recent statement that the company would vigorously defend the rights of its authors. Protecting intellectual property is at the forefront of its strategy, though he also said that PRH itself will still explore AI tools internally when they fit its goals.

This comes at a time when various copyright infringement lawsuits are being filed in the U.S., with authors and publishers accusing tech companies of using pirated books to train AI. Some of the largest academic publishers, including Taylor & Francis and Wiley, have begun licensing deals with AI firms to allow controlled use of their materials.

Penguin Random House, however, is the first major English-language trade publisher to take such a step against AI exploitation. So far, other big names in the industry, such as Pan Macmillan and Simon & Schuster (which Meta considered buying to train its AI models), haven’t said whether they will follow suit, leaving PRH to lead the charge for now.

The Authors’ Licensing and Collecting Society (ALCS) in the UK welcomed the new PRH policy, saying it sets a much-needed precedent for protecting copyright in the digital age. Their CEO, Barbara Hayes, described it as a “positive move” and expressed hope that others would adopt similar practices. She also pointed out that this helps clarify what can and cannot be done with copyrighted materials, especially when it comes to training AI systems.

Further Legal Implications for AI

But not everyone believes this change will be enough. The UK’s Society of Authors (SoA) has praised the updated copyright wording, but it also called for stronger measures. According to SoA’s CEO, Anna Ganley, the policy should extend to contracts with authors, guaranteeing that their work won’t be used in AI-related activities like narration or translation without explicit consent.

The true legal issue lies in how AI models are trained, not necessarily in the output they produce. While it’s rare for AI-generated content to directly infringe on an author’s work, the training process itself may violate copyright, and this is where publishers need to pay attention.

Publishers should get a better grasp on the various legal tools they can use to stop unauthorized training of AI with their content. Some platforms already offer ways to opt out of such training, and new licenses are being developed to prevent AI scraping.

Despite PRH’s assertive action, other major players in the publishing world have yet to make similar moves. Faber, for example, recently implemented an internal AI policy that prevents freelancers working on its titles from copying any text into AI programs for editorial purposes. But when approached by media outlets, companies like Pan Macmillan and Simon & Schuster declined to comment on the AI challenge.

Media Outlets Push Back Against AI: A Growing Trend

The tension between AI companies and traditional media continues to escalate. In October 2024, Perplexity AI, a startup backed by Jeff Bezos, found itself in legal hot water with The New York Times. The newspaper accused the company of copyright infringement, claiming that Perplexity had been accessing and summarizing its articles without authorization. Its cease and desist letter alleged that the startup had been unjustly enriched by using the paper’s content without a license. The dispute is one of several Perplexity has faced: Forbes, Wired, and others had previously made their own accusations.

Perplexity AI has responded to these disputes by introducing a revenue-sharing initiative for content creators and establishing partnerships with several publishers. The company maintains that it doesn’t scrape data for AI model training, claiming instead to index web pages for citation purposes. In a move to smooth relations, it has expressed a willingness to collaborate with media outlets to address their concerns.

These legal battles aren’t isolated incidents. In an effort to navigate the shifting relationship between content creators and AI companies, media outlets are striking deals with tech firms. For instance, TIME has entered a multi-year partnership with OpenAI, granting the tech company access to its archive. Similar agreements have been signed between OpenAI and major publishers like The Atlantic, Vox Media, News Corp, and the Financial Times. These partnerships are seen as a way for publishers to regain control over their content while ensuring proper compensation in the age of AI-driven information processing.

Condé Nast has also made its move in this space, signing a long-term agreement with OpenAI that gives the tech company access to content from its top publications, including The New Yorker, Vogue, Vanity Fair, Bon Appétit, and WIRED. The deal permits OpenAI to incorporate material from these titles into ChatGPT and the experimental SearchGPT.

Condé Nast’s CEO Roger Lynch stressed the need for publishers to adapt to new technological changes while ensuring that proper attribution and compensation for content remain central. Lynch explained to internal stakeholders that these agreements help recover some of the revenue lost due to shifting search engine functionalities. He praised OpenAI for its willingness to collaborate and ensure the integrity of information on its platforms, positioning the partnership as a way to support continued investment in high-quality journalism and creative endeavors.
