“While there is nothing inherently wrong with using books as part of a dataset, using pirated (or stolen) books does not require fair compensation to the author or publisher for their work. “This is not true,” said the plaintiffs, including Mr. Huckabee, and Christian authors and podcasters, including Tsh Oxenreider and Lysa. TerKeurst said in his lawsuit. The lawsuit targets Meta, Microsoft, and financial data provider Bloomberg LP, all of which use data from the web to power their own “large-scale language models” (tools like ChatGPT). training a huge algorithm.
The lawsuit focuses on a notorious collection of pirated books known as “books3.” Plaintiffs allege that this was included in the “pile.” This collection is a collection of freely available data sources compiled by the non-profit organization EleutherAI to give small businesses access to more data. Train your own AI. The lawsuit also names EleutherAI as a defendant. The lawsuit, proposed as a class action, seeks damages and an injunction barring the companies from continuing to use the copyrighted material.
A Microsoft spokesperson declined to comment. Spokespeople for Meta, Bloomberg and EleutherAI did not respond to requests for comment.
Large-scale language models are typically trained on billions of sentences of text from the Internet, such as news articles, Wikipedia, and comments on social media sites. OpenAI and other AI companies such as Google and Microsoft do not say specifically what data they use, but AI critics say it includes data such as: I have long suspected that this is the case. collection of pirated books.
The battle over whether companies can pull data from the internet without payment or permission to train potentially profitable AI models will only intensify. Multiple lawsuits from comedians, writers, and artists target technology companies. Technology executives argue that retrieving data from the public web constitutes “free use.” This is a concept in copyright law that provides an exception when the work is materially different from the source material from which it is derived.