Summary
In this episode of IA Odyssey, we unpack how DeepSeek's open-source models are shaking up the AI world by delivering GPT-level performance at a fraction of the cost. Drawing on insights from the research paper by Chengen Wang (University of Texas at Dallas) and Murat Kantarcioglu (Virginia Tech), we explore DeepSeek's secret sauce: memory-efficient Multi-Head Latent Attention, an evolved Mixture of Experts architecture, and reinforcement learning without supervised fine-tuning data. Oh, and did we mention they trained this monster on a $ave-the-GPU budget?
From hardware-aware model design to the surprisingly powerful GRPO algorithm, this episode decodes the magic that’s making DeepSeek-V3 and R1 the open-source giants to watch. Whether you're an AI enthusiast or just want to know who's giving OpenAI and Anthropic sleepless nights, you don’t want to miss this.
Crafted with help from Google's NotebookLM.
Read the full paper here: https://arxiv.org/abs/2503.11486