
(FM-Mistral) Mixtral of Experts
About this content
Welcome to the FM Series! In this episode, we delve into Mixtral 8x7B, a fascinating new Sparse Mixture of Experts (SMoE) language model presented in a paper on arXiv. The model builds on the Mistral 7B architecture but gives each layer eight feedforward blocks, or "experts." The key innovation is a router network that dynamically selects just two experts for every token at each layer. As a result, although the model has 47 billion parameters in total, it activates only about 13 billion of them per token during inference, which keeps it efficient. The sources highlight its impressive performance: Mixtral outperforms or matches major models such as Llama 2 70B and GPT-3.5 across various benchmarks, and it particularly excels at mathematics, code generation, and multilingual tasks. An instruction-tuned version, Mixtral 8x7B - Instruct, even surpasses models like GPT-3.5 Turbo and Gemini Pro on human benchmarks. The source material does not explicitly mention any limitations of the model. Both the base and instruct models are released under the Apache 2.0 license. Join us to learn more about this significant development!
Paper: Link
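
To make the routing idea concrete, here is a minimal sketch of top-2 sparse MoE routing as described above: a small router scores eight expert feedforward networks per token, keeps only the two highest-scoring experts, and mixes their outputs with softmax-normalized weights. This is not the official Mixtral implementation; the class name, dimensions, and expert design are illustrative assumptions.

```python
# Illustrative top-2 sparse MoE routing sketch (assumed names/sizes, not Mixtral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each "expert" is an independent feedforward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # (tokens, n_experts)
        # Keep only the top-k experts per token and renormalize their scores.
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)   # (tokens, top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only 2 of the 8 expert blocks run for any given token, which is why the
# active parameter count per token is much smaller than the total count.
layer = SparseMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```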