Authors Sue Meta, Microsoft, and Others Over Unauthorized Use of Their Work in AI Models
Former Arkansas Governor Mike Huckabee, along with other authors such as Christian writer Lysa TerKeurst, has filed a lawsuit against Meta, Microsoft, and several other companies over the unauthorized use of their books in training AI models. The proposed class action suit follows a recent trend of authors accusing tech companies of using their work without permission to train generative AI models. Notable authors, including George R.R. Martin, Jodi Picoult, and Michael Chabon, have also sued OpenAI for copyright infringement.
The Huckabee case centers on a contentious dataset called “Books3,” which consists of over 180,000 works used to train large language models. EleutherAI, an artificial intelligence research group, and Bloomberg are also named in the lawsuit for their involvement. In August, The Atlantic published a searchable database that included all the titles and author information from Books3. Books3 is part of a larger collection called the Pile, created by EleutherAI, which the lawsuit claims companies used to train their products.
According to the lawsuit, both Meta and Microsoft incorporated copyrighted materials from Books3 into their language models’ training without compensating the authors. While Microsoft declined to comment on the matter, Meta, Bloomberg, and EleutherAI have not responded to requests for comment.
The case highlights AI companies' reliance on vast amounts of public data to train their models, data that goes beyond books to include photographs, art, music, and more. The accessibility of AI tools like ChatGPT and Stable Diffusion has sparked debate and led to numerous legal actions over how the people whose work ends up in these datasets should be compensated. In a similar vein, in January, Getty Images sued the creators of the AI art tool Stable Diffusion, accusing them of unlawfully copying millions of copyrighted images to train their models.
As AI and machine learning continue to evolve, disputes over copyright infringement and fair compensation for content creators are likely to persist. The outcome of lawsuits like this one could set important precedents for how tech companies use creators' work in their AI models and how they compensate them for it.