The New York Times wants OpenAI and Microsoft to pay for training data

The New York Times is suing OpenAI and its close collaborator (and investor) Microsoft for violating copyright law by training a generative AI model on Times content.

The lawsuit, filed in U.S. District Court in Manhattan, alleges that millions of Times articles were used without permission to train AI models, including those underpinning OpenAI’s wildly popular ChatGPT and Microsoft’s Copilot. The Times is calling on OpenAI and Microsoft to “destroy” models and training data containing the offending material and to be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use of The Times’s uniquely valuable works.”

“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a vacuum that no computer or artificial intelligence can fill,” the Times’ complaint reads. “Less journalism will be produced, and the cost to society will be enormous.”

An OpenAI spokesperson said in an emailed statement: “We respect the rights of content creators and owners and are committed to working with them to ensure they benefit from AI technology and new revenue models. Our ongoing conversations with The New York Times have been productive and moving forward constructively, so we are surprised and disappointed with this development. We’re hopeful that we will find a mutually beneficial way to work together, as we are doing with many other publishers.”

Generative AI models “learn” from examples to craft essays, code, emails, articles and more, and vendors like OpenAI scrape the web for millions to billions of these examples to add to their training sets. Some of the examples are in the public domain. Others aren’t, or come under restrictive licenses that require citation or specific forms of compensation.

Vendors argue that the fair use doctrine provides blanket protection for their web-scraping practices. Copyright holders disagree, and hundreds of news organizations now use code to prevent OpenAI, Google and others from scanning their websites for training data.

Conflicts between vendors and copyright holders have led to a growing number of legal battles, the Times’ suit being the latest.

Actress Sarah Silverman joined a pair of lawsuits in July accusing Meta and OpenAI of “ingesting” Silverman’s memoir to train their AI models. In a separate lawsuit, thousands of novelists, including Jonathan Franzen and John Grisham, claim that OpenAI sourced their works as training data without their permission or knowledge. And several programmers are suing Microsoft, OpenAI and GitHub over Copilot, an AI-powered code-generating tool that the plaintiffs claim was developed using their IP-protected code.

The Times is not the first to sue a generative AI vendor over alleged intellectual property infringement involving written works. But it is the largest publisher involved in such a lawsuit to date, and one of the first to highlight the potential damage to its brand from “hallucinations,” or made-up facts, produced by generative AI models.

The Times’ complaint cites several cases in which Microsoft’s Bing Chat (now called Copilot), which is built on an OpenAI model, provided incorrect information that it attributed to the Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in any Times article.

The Times also claims that OpenAI and Microsoft are effectively building competitors to news publishers out of the Times’ copyrighted works. According to the complaint, this harms the Times’ business by serving up information that normally can’t be accessed without a subscription, information that isn’t always cited and is sometimes monetized and stripped of the affiliate links the Times uses to generate commissions.

As the Times’ complaint alludes to, generative AI models have a tendency to regurgitate training data, for example reproducing an article almost verbatim. Beyond regurgitation, OpenAI has on at least one occasion inadvertently enabled ChatGPT users to get around paywalled news content.

“Defendants seek to free-ride on The Times’s massive investment in its journalism,” the complaint states, accusing OpenAI and Microsoft of “using The Times’s content without payment to create products that substitute for The Times and steal audiences away from it.”

The impact on publishers’ subscription businesses and web traffic is at the center of a similar lawsuit that publishers filed against Google earlier this month. Echoing the Times’ complaint, that lawsuit alleges that Google’s GenAI experiments, including its AI-powered Bard chatbot and Search Generative Experience, siphon content, readership and advertising revenue away from publishers through anticompetitive means.

The publishers’ claims are credible. A recent model from The Atlantic found that if a search engine like Google integrated AI into search, it would answer a user’s query 75% of the time without requiring a click-through to a website. The publishers suing Google estimate they could lose as much as 40% of their traffic.

That doesn’t mean they will win in court. Heather Meeker, a founding partner at OSS Capital and an adviser on intellectual property matters, including licensing agreements, likened the regurgitation examples in the Times’ complaint to “cutting and pasting in a word processor.”

“The New York Times complaint cites an example of a ChatGPT session about a 2012 restaurant review,” Meeker told TechCrunch via email. “The prompt is, ‘What was the opening paragraph of his review?’ Then each subsequent prompt asks for the ‘next sentence,’ over and over. Coaxing a chatbot into reproducing its input is not a reasonable basis for copyright infringement. If a user intentionally causes the chatbot to make a copy, that’s on the user. And that’s mostly why [lawsuits like this] will probably fail.”

Some news organizations have chosen to enter into licensing agreements with generative AI vendors rather than fight them in court. The Associated Press signed a deal with OpenAI in July, and German publisher Axel Springer, which owns Politico and Business Insider, signed a similar deal this month.

The Times said in its complaint that it tried to reach licensing deals with Microsoft and OpenAI in April, but negotiations were ultimately fruitless.

Updated at 4:24 ET with additional context and comment from OpenAI.

By automateinsider