Unraveling the Mysteries of Mixtral's Experts: Mistral AI's Open-Source Model

Mistral AI’s open source Mixtral 8x7B model has generated a lot of buzz – here’s what’s inside

Image generated with GPT-4

Mistral AI’s new sparse mixture-of-experts LLM, Mixtral 8x7B, has recently made waves with dramatic headlines like “Mistral AI Introduces Mixtral 8x7B: A Sparse Mixture of Experts (SMoE) Language Model Transforming Machine Learning” or “Mistral AI’s Mixtral 8x7B exceeds GPT-3.5, shaking up the world of AI.”

Mistral AI is a French AI startup founded in 2023 by former Meta and Google engineers. The company released Mixtral 8x7B on December 8, 2023 by simply dumping a torrent magnet link on its Twitter account, in perhaps the most unceremonious release in LLM history.


Meme about Mistral’s unconventional approach to releasing models.

The accompanying research paper, “Mixtral of Experts” (Jiang et al. 2024), was published on arXiv about a month later, on January 8 of this year. Let’s see whether the hype is justified.

(Spoiler alert: under the hood, there isn’t much that’s technically new.)

First, a little history for context.

Sparse MoE in LLMs: a brief history

Mixture-of-Experts (MoE) models date back to research from the early 1990s (Jacobs et al. 1991). The idea is to model the prediction y as a weighted sum of expert outputs E_i, with the weights determined by a gating network G. It is a way of breaking a large, complex problem into smaller sub-problems: divide and conquer, if you will. For example, in the original study the authors showed how different experts learn to specialize on different decision boundaries in a vowel-discrimination task.
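A minimal sketch of this classic (dense) MoE formulation, with made-up shapes and toy linear experts purely for illustration: a softmax gating network G produces mixing weights from the input, and the prediction is the weighted sum y = Σ_i G(x)_i · E_i(x).

```python
import numpy as np

def moe_predict(x, experts, gating_weights):
    """Dense mixture-of-experts prediction: a weighted sum of expert outputs.

    experts: list of callables E_i(x) -> prediction
    gating_weights: parameters of a toy gating network G (linear layer + softmax)
    """
    # Gating network G: softmax over a linear projection of the input
    logits = gating_weights @ x                  # one logit per expert
    g = np.exp(logits - logits.max())
    g = g / g.sum()                              # softmax -> mixing weights

    # Weighted sum over ALL experts: y = sum_i G(x)_i * E_i(x)
    return sum(g_i * expert(x) for g_i, expert in zip(g, experts))

# Toy usage: 3 "experts" that are just random linear maps of a 4-dim input
rng = np.random.default_rng(0)
x = rng.normal(size=4)
experts = [lambda v, W=rng.normal(size=(2, 4)): W @ v for _ in range(3)]
gating_weights = rng.normal(size=(3, 4))         # 3 experts x 4 input dims
print(moe_predict(x, experts, gating_weights))
```

Note that in this classic formulation every expert is evaluated for every input; the sparse variants discussed next avoid exactly that cost.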

But what really made MoE take off was top-k routing, an idea first introduced in the 2017 paper “Outrageously Large Neural Networks” (Shazeer et al. 2017). The key idea is to compute the outputs of only the top k experts rather than all of them. This allows the FLOPs to remain constant even if: