Thu. Apr 17th, 2025 8:00:00 PM
Meta's vanilla Maverick AI model ranks below rivals on the LM Arena benchmark

Earlier this week, Meta landed in hot water for using an experimental, unreleased version of its Llama 4 Maverick model to achieve a high score on the crowdsourced benchmark LM Arena. The incident prompted LM Arena's maintainers to apologize, change their policies, and score the unmodified, vanilla Maverick.

As it turns out, it's not very competitive.

The unmodified Maverick, "llama-4-maverick-17b-128e-instruct," ranked below models including OpenAI's GPT-4o, Anthropic's Claude 3.5 Sonnet, and Google's Gemini 1.5 Pro as of Friday. Many of those models are months old.

Why the poor performance? Meta's experimental Maverick, Llama-4-Maverick-03-26-Experimental, was "optimized for conversationality," the company explained in a chart published last Saturday. Those optimizations evidently played well on LM Arena, where human raters compare model outputs and choose which they prefer.

As we've written before, for a variety of reasons, LM Arena has never been the most reliable measure of an AI model's performance. Still, tailoring a model to a benchmark is not only misleading; it also makes it harder for developers to predict how well the model will perform in different contexts.

In a statement, a Meta spokesperson told TechCrunch that Meta experiments with "all kinds of custom variants."

"'Llama-4-Maverick-03-26-Experimental' is a chat-optimized version that also performs well on LM Arena," the spokesperson said. "We have now released our open source version and will see how developers customize Llama 4 for their own use cases. We look forward to seeing what they build and to their ongoing feedback."
