Mon. Dec 23rd, 2024
OpenAI's Deals With Publishers Could Cause Problems For Rivals

OpenAI’s legal battle with The New York Times over data used to train its AI models may still be ongoing. But OpenAI is pressing ahead with deals with other publishers, including some of the largest news publishers in France and Spain.

OpenAI on Wednesday announced agreements with Le Monde and Prisa Media to bring French and Spanish news content to OpenAI’s ChatGPT chatbot. In a blog post, OpenAI said that the partnership will put the organizations’ current events coverage, from brands including El País, Cinco Días, As, and El Huffpost, in front of ChatGPT users where it makes sense, and that it will also contribute to OpenAI’s expanding volume of training data.

OpenAI writes:

In the coming months, ChatGPT users will be able to interact with relevant news content from these publishers through selected summaries with attribution and enhanced links to the original articles, allowing users to gain access to additional information and related articles. We continually improve ChatGPT to support the news industry’s important role in delivering real-time, trusted information to users.

So OpenAI has now disclosed licensing deals with a handful of content providers. This seems like a good moment to take stock:

  • Stock media library Shutterstock (for images, videos, and music training data)
  • Associated Press
  • Axel Springer (owner of Politico, Business Insider, etc.)
  • Le Monde
  • Prisa Media

How much is OpenAI paying each? It isn’t saying, at least not publicly. But we can make an estimate.

The Information reported in January that OpenAI was offering publishers between $1 million and $5 million a year for access to their archives to train its GenAI models. That doesn’t tell us much about the partnership with Shutterstock. But on the article licensing side, assuming The Information’s reporting is accurate and the numbers haven’t changed since then, OpenAI is paying between $4 million and $20 million a year for news across its four publisher deals.
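For readers who want to check the math, here is a minimal sketch of that back-of-the-envelope estimate. It assumes each of the four news publishers listed above receives somewhere in the reported $1 million to $5 million range; the function and variable names are illustrative only, and the actual contract values have not been disclosed.

```python
# Back-of-the-envelope estimate only: the per-publisher range comes from
# The Information's reporting; applying it uniformly to every news deal
# is an assumption, not a disclosed figure.
NEWS_PUBLISHERS = ["Associated Press", "Axel Springer", "Le Monde", "Prisa Media"]
LOW_PER_PUBLISHER = 1_000_000   # reported low end, USD per year
HIGH_PER_PUBLISHER = 5_000_000  # reported high end, USD per year


def estimate_annual_spend(publishers, low, high):
    """Return the (low, high) total annual spend across all publishers."""
    count = len(publishers)
    return count * low, count * high


low_total, high_total = estimate_annual_spend(
    NEWS_PUBLISHERS, LOW_PER_PUBLISHER, HIGH_PER_PUBLISHER
)
print(f"${low_total / 1e6:.0f} million to ${high_total / 1e6:.0f} million a year")
```

Shutterstock is left out because the reported range covers news archives, not stock media.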

That may be small change for OpenAI, whose war chest is more than $11 billion and whose annual revenue recently topped $2 billion (according to the Financial Times). But as Homebrew partner and Screendoor co-founder Hunter Walk recently mused, it is substantial enough to potentially edge out AI rivals that are also pursuing licensing deals.

Walk writes on his blog:

[I]f experiments are gated by nine-figure licensing agreements, we’re doing a disservice to innovation … The checks being cut to “owners” of training data are creating a significant barrier to entry for challengers. If Google, OpenAI, and other big tech companies can set costs high enough, they implicitly deter future competition.

Now, it’s debatable whether there is any barrier to entry today. Many, if not most, AI vendors have chosen not to license the data they use to train their AI models, risking the ire of IP holders. Midjourney, an art-generating platform, for example, is reportedly training on stills from Disney movies, and Midjourney has no deal with Disney.

The harder question to wrestle with is whether licensing should simply be a cost of doing business and experimenting in the AI field.

Walk would argue that it should not. He advocates for a regulator-imposed “safe harbor” that would protect any AI vendor, along with small startups and researchers, from legal liability as long as they adhere to certain transparency and ethics standards.

Interestingly, the U.K. recently tried to codify something along these lines, exempting the use of text and data mining for AI training from copyright considerations as long as it is for research purposes. However, those efforts ultimately failed.

Personally, I’m not sure I’d go as far as Walk does with his “safe harbor” proposal, given the impact AI threatens to have on an already destabilized news industry. A recent model from The Atlantic found that if a search engine like Google were to integrate AI into search, it would answer a user’s query 75% of the time without requiring a click-through to a website.

But perhaps there is room for carve-outs.

Publishers deserve to be paid, and paid fairly. But isn’t there an outcome in which they are paid and challengers to AI incumbents, as well as academics, get access to the same data as those incumbents? I should think so. Grants are one way. Larger VC checks are another.

I can’t say I have the solution, especially given that courts have yet to decide whether, and to what extent, fair use protects AI vendors from claims of copyright infringement. But it is vital that we work these things out. Otherwise, the industry could end up in a situation where academic “brain drain” continues unabated and only a few powerful companies have access to vast pools of valuable training sets.