Piracy lawsuit against Meta could set precedent for torrenting copyrighted works in AI training

Meta faces a lawsuit for using pirated books in AI training, possibly setting a precedent for copyright law and AI systems.

In January 2024, writers sued Meta in California for using their copyrighted books to train the Llama large language model through the Books3 dataset. Meta admits to using torrenting, traditionally linked with piracy, to acquire these 195,000 books, and argues that its use falls under the fair use doctrine. The plaintiffs claim this use of torrenting was deliberate and illegal, and they are seeking partial summary judgment in U.S. District Court. Judge Vince Chhabria, who is unfamiliar with torrenting, may need expert testimony before ruling; either outcome could become a significant legal precedent.

The ongoing lawsuit against Meta could have significant implications for the legality of using copyrighted works in AI training. In the suit, filed in California in January 2024, a group of authors claims Meta used their copyrighted writings to enhance the capabilities of its Llama language model. Meta acknowledges using the Books3 dataset, a compilation of 195,000 books totaling 37GB, but defends its actions under the fair use doctrine. This defense holds that copyrighted material can be used under certain conditions without permission from the copyright holder.

Unsealed court documents indicate that Meta employed torrenting, a file-sharing method often linked with illegal activity, to speed up acquisition of these data files. Torrenting involves leeching and seeding, which refer respectively to downloading and uploading parts of files so that users can share them faster. When applied to copyrighted works, the practice is commonly seen as copyright infringement. By using Amazon Web Services rather than its own infrastructure, Meta purportedly attempted to mask its torrenting.
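
For readers unfamiliar with the terms, the following is a minimal toy sketch of leeching and seeding. It is not the real BitTorrent protocol and says nothing about how Meta's systems actually worked; the names Peer and exchange, the peer names, and the four-piece file are all hypothetical, chosen only for illustration.

```python
# Toy model of a file-sharing swarm (illustrative only; NOT the BitTorrent protocol).
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    pieces: set = field(default_factory=set)  # indices of file pieces this peer holds
    uploaded: int = 0    # pieces sent to other peers (seeding)
    downloaded: int = 0  # pieces received from other peers (leeching)


def exchange(downloader: Peer, uploader: Peer) -> bool:
    """Copy one piece the downloader lacks from the uploader; return False if none."""
    missing = sorted(uploader.pieces - downloader.pieces)
    if not missing:
        return False
    downloader.pieces.add(missing[0])
    downloader.downloaded += 1
    uploader.uploaded += 1
    return True


TOTAL_PIECES = 4  # hypothetical file split into four pieces
seeder = Peer("seeder", set(range(TOTAL_PIECES)))  # holds the complete file
alice = Peer("alice")
bob = Peer("bob")

# alice leeches two pieces from the seeder
exchange(alice, seeder)
exchange(alice, seeder)

# bob leeches a piece from alice, so alice is uploading
# even though her own copy is still incomplete
exchange(bob, alice)

print(alice.downloaded, alice.uploaded, len(alice.pieces))
```

The point of the sketch is that a participant can upload pieces to others while still downloading, so acquiring a file this way can also mean distributing parts of it.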

In March 2025, the authors requested partial summary judgment in U.S. District Court, a move signaling their belief that Meta's copyright violations are indisputable. They argue that the alleged torrenting makes this a straightforward case of infringement, one that negates Meta's fair use defense. With this motion, the authors are pushing for a decision without a trial, as they believe the evidence overwhelmingly supports their claims.

Judge Vince Chhabria, who is overseeing the case, admitted to having little understanding of torrenting and related terms such as leeching and seeding. This admission may lead him to seek expert testimony to fully grasp the complexities involved. Such testimony would help him evaluate whether Meta's actions qualify as fair use or constitute blatant piracy of intellectual property, a question central to the case's outcome.

Regardless of the ruling, the case could set a landmark legal precedent for future lawsuits involving AI systems and copyright infringement. An outcome favoring Meta could effectively sanction the use of copyrighted works in AI training without compensating creators, and could influence how existing copyright laws, such as the Digital Millennium Copyright Act, are applied. Conversely, an outcome supporting the authors could affirm protections for creative works in the digital era, potentially reinforcing stricter regulatory frameworks around AI data acquisition.

Sources: TechSpot, Ars Technica