Mon. Dec 23rd, 2024
OpenAI's Video Generation AI Is 'Doomed to Failure', Says Meta's Chief AI Scientist

Harsh words from one of the godfathers of AI.

Incomplete pixel

Sora, OpenAI’s new AI model for video generation, has been making waves since it was unveiled last week. But Yann LeCun, chief AI scientist at Meta, believes that the much-hyped text-to-video model is not all it is made out to be.

In particular, LeCun takes issue with OpenAI’s claim that its work on Sora will eventually enable the construction of a “general-purpose simulator of the physical world.” If that is the goal, LeCun argues, the company’s approach to creating a “world simulator” is completely wrong.

“Modeling the world for behavior by generating pixels is as wasteful and doomed to failure as the largely abandoned idea of ‘analysis by synthesis,’” he wrote in a post on X, formerly Twitter.

Generation complication

LeCun is one of the so-called godfathers of AI, and perhaps the most outspoken of them. While the other two godfathers have lamented what they unleashed, LeCun has continued to push forward with his work at Meta, never afraid to criticize his competitors.

His comments here refer to the long-standing debate between generative and discriminative models in machine learning. LeCun believes the former approach, which generates pixels “from latent explanatory variables,” is too inefficient and cannot adequately handle the uncertainty that arises from such complex predictions in 3D space.

In layman’s terms, he argues that these models try to “guess” a lot of irrelevant details. It is like trying to work out how every material in a soccer ball interacts in order to calculate its trajectory, rather than just focusing on things like its mass and velocity.
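The soccer ball analogy can be made concrete with a rough back-of-the-envelope sketch. The Python below is only an illustration of the analogy, not anything LeCun or OpenAI published. It assumes a 1920x1080 RGB frame and a ball kicked from ground level with no air resistance (under that assumption the mass drops out and only speed and angle remain). The function names `values_per_frame` and `ball_range` are made up for this example.

```python
import math

# Assumed frame size for the comparison; not a figure from the article.
WIDTH, HEIGHT, CHANNELS = 1920, 1080, 3

def values_per_frame(width: int = WIDTH, height: int = HEIGHT, channels: int = CHANNELS) -> int:
    """Number of values a pixel-level model has to produce for a single frame."""
    return width * height * channels

def ball_range(speed_m_s: float, angle_deg: float, g: float = 9.81) -> float:
    """Distance a ball kicked from ground level travels before landing (no air resistance)."""
    return speed_m_s ** 2 * math.sin(2 * math.radians(angle_deg)) / g

print(values_per_frame())                # 6220800
print(round(ball_range(20.0, 45.0), 2))  # 40.77
```

Two numbers are enough to say where the ball lands in this simplified setting, while a single frame of video at the assumed resolution contains more than six million values.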

“If your goal is to actually generate video, there’s nothing wrong with that,” he said in a reply to his post. “But if the goal is to understand how the world works, that’s a failed proposition.”

Alternative proposal

LeCun acknowledges that generative approaches have generally worked so far for large language models like the ones behind ChatGPT “because the text is discrete with a finite number of symbols.” But simulating the world, as OpenAI hopes Sora eventually will, means dealing with far more than a finite set of symbols.

To compete with OpenAI’s approach, LeCun is developing his own model at Meta called Video Joint Embedding Predictive Architecture (V-JEPA), which was announced last week.

“This is different from a generative approach that tries to fill in all the missing pixels,” Meta argues in a blog post. “V-JEPA has the flexibility to discard unpredictable information, increasing training and sample efficiency by 1.5x to 6x.”
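To see why discarding unpredictable information might help, consider a deliberately tiny toy example. This is not V-JEPA's architecture or training objective. It assumes a "frame" is a list of 64 numbers made of one predictable brightness level plus random per-pixel noise, and it uses a hypothetical `encode` function that keeps only the average. Both predictions use the same rule (brightness rises by 1.0), but one is scored against every pixel and the other against the encoded summary.

```python
import random

random.seed(0)

def make_frame(level: float, n: int = 64) -> list[float]:
    """Toy 'frame': a predictable brightness level plus unpredictable per-pixel noise."""
    return [level + random.gauss(0.0, 1.0) for _ in range(n)]

def encode(frame: list[float]) -> float:
    """Hypothetical encoder: keep only the mean and discard per-pixel detail."""
    return sum(frame) / len(frame)

def mse(a: list[float], b: list[float]) -> float:
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

current = make_frame(level=1.0)
target = make_frame(level=2.0)  # the "next frame": brightness has risen by 1.0

# The predictor knows the rule "brightness rises by 1.0" but cannot know the noise.
predicted_level = encode(current) + 1.0

# Scored in pixel space: every noisy pixel of the target counts against the prediction.
pixel_error = mse([predicted_level] * len(target), target)

# Scored in embedding space: only the summary that can actually be predicted is compared.
latent_error = (predicted_level - encode(target)) ** 2

print(f"pixel-space error:     {pixel_error:.3f}")
print(f"embedding-space error: {latent_error:.3f}")
```

In this toy setup the pixel-space error is dominated by noise that no model could have predicted, while the embedding-space error reflects only how well the predictable part was captured.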

LeCun’s work may not attract the same hype as OpenAI’s products, which produce flashy images and text, but it is interesting to see such a prominent AI researcher diverge from the same old approach that imitators are now pursuing.
