
(FM-Mistral) Mixtral of Experts
About this content
Welcome to the FM Series! In this episode, we delve into Mixtral 8x7B, a fascinating new Sparse Mixture of Experts (SMoE) language model presented in a paper on arXiv. The model builds on the Mistral 7B architecture but gives each layer eight feedforward blocks, or "experts." The key innovation is a router network that dynamically selects just two experts for every token at each layer. As a result, although the model has 47 billion parameters in total, it activates only about 13 billion of them per token during inference, which keeps it efficient. The sources highlight its impressive performance: Mixtral outperforms or matches major models such as Llama 2 70B and GPT-3.5 across various benchmarks, and it particularly excels at mathematics, code generation, and multilingual tasks. An instruction-tuned version, Mixtral 8x7B - Instruct, even surpasses models like GPT-3.5 Turbo and Gemini Pro on human benchmarks. The source material does not explicitly mention any limitations of the model. Both the base and instruct models are released under the Apache 2.0 license. Join us to learn more about this significant development!
Paper: Link
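
To make the routing idea concrete, here is a minimal sketch of top-2 sparse MoE routing as described above: a small router scores eight expert feedforward networks per token, keeps only the two highest-scoring experts, and mixes their outputs with softmax-normalized weights. This is not the official Mixtral implementation; the class name, dimensions, and expert design are illustrative assumptions.

```python
# Illustrative top-2 sparse MoE routing sketch (assumed names/sizes, not Mixtral's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router: one logit per expert for each token.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each "expert" is an independent feedforward block.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        logits = self.router(x)                    # (tokens, n_experts)
        # Keep only the top-k experts per token and renormalize their scores.
        topk_logits, topk_idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(topk_logits, dim=-1)   # (tokens, top_k)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e      # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Only 2 of the 8 expert blocks run for any given token, which is why the
# active parameter count per token is much smaller than the total count.
layer = SparseMoELayer()
tokens = torch.randn(16, 512)
print(layer(tokens).shape)  # torch.Size([16, 512])
```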