Episodes

  • Episode 1: Introducing Vanishing Gradients
    2022/02/16
    In this brief introduction, Hugo introduces the rationale behind launching a new data science podcast and gets excited about his upcoming guests: Jeremy Howard, Rachael Tatman, and Heather Nolis! Original music, bleeps, and blops by local Sydney legend PlaneFace (https://planeface.bandcamp.com/album/fishing-from-an-asteroid)!
    5 min
  • Episode 54: Scaling AI: From Colab to Clusters — A Practitioner’s Guide to Distributed Training and Inference
    2025/07/18
    Colab is cozy. But production won’t fit on a single GPU. Zach Mueller leads Accelerate at Hugging Face and spends his days helping people go from solo scripts to scalable systems. In this episode, he joins me to demystify distributed training and inference — not just for research labs, but for any ML engineer trying to ship real software.
    We talk through:
      • From Colab to clusters: why scaling isn’t just about training massive models, but serving agents, handling load, and speeding up iteration
      • Zero-to-two GPUs: how to get started without Kubernetes, Slurm, or a PhD in networking
      • Scaling tradeoffs: when to care about interconnects, which infra bottlenecks actually matter, and how to avoid chasing performance ghosts
      • The GPU middle class: strategies for training and serving on a shoestring, with just a few cards or modest credits
      • Local experiments, global impact: why learning distributed systems—even just a little—can set you apart as an engineer
    If you’ve ever stared at a Hugging Face training script and wondered how to run it on something more than your laptop: this one’s for you. (A minimal Accelerate sketch follows this entry.)
    LINKS
      • Zach on LinkedIn (https://www.linkedin.com/in/zachary-mueller-135257118/)
      • Hugo's blog post on Stop Building AI Agents (https://www.linkedin.com/posts/hugo-bowne-anderson-045939a5_yesterday-i-posted-about-stop-building-ai-activity-7346942036752613376-b8-t/)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
    🎓 Learn more:
      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338)
      • Zach's course (45% off for VG listeners!): Scratch to Scale: Large-Scale Training in the Modern World (https://maven.com/walk-with-code/scratch-to-scale?promoCode=hugo39)
    📺 Watch the video version on YouTube (https://youtube.com/live/76NAtzWZ25s?feature=share)
    41 min
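    A minimal sketch of the Colab-to-clusters move discussed here, using Accelerate's documented prepare()/backward() pattern. The model, data, and hyperparameters are placeholders, not anything from the episode:

      # train.py: the same loop runs on CPU, 1 GPU, or N GPUs via `accelerate launch train.py`
      import torch
      from accelerate import Accelerator

      accelerator = Accelerator()  # picks up device/process config from the launcher

      model = torch.nn.Linear(128, 2)
      optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
      dataset = torch.utils.data.TensorDataset(
          torch.randn(1024, 128), torch.randint(0, 2, (1024,))
      )
      dataloader = torch.utils.data.DataLoader(dataset, batch_size=32)

      # prepare() wraps model/optimizer/dataloader for the current hardware,
      # sharding batches across processes when more than one GPU is available
      model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

      loss_fn = torch.nn.CrossEntropyLoss()
      model.train()
      for xb, yb in dataloader:
          optimizer.zero_grad()
          loss = loss_fn(model(xb), yb)
          accelerator.backward(loss)  # replaces loss.backward(); handles gradient sync
          optimizer.step()

    The episode's point, in code: going from zero to two GPUs is `accelerate launch` plus a few changed lines, no Kubernetes or Slurm required.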
  • Episode 53: Human-Seeded Evals & Self-Tuning Agents: Samuel Colvin on Shipping Reliable LLMs
    2025/07/08
    Demos are easy; durability is hard. Samuel Colvin has spent a decade building guardrails in Python (first with Pydantic, now with Logfire), and he’s convinced most LLM failures have nothing to do with the model itself. They appear where the data is fuzzy, the prompts drift, or no one bothered to measure real-world behavior. Samuel joins me to show how a sprinkle of engineering discipline keeps those failures from ever reaching users.
    We talk through:
      • Tiny labels, big leverage: how five thumbs-ups/thumbs-downs are enough for Logfire to build a rubric that scores every call in real time
      • Drift alarms, not dashboards: catching the moment your prompt or data shifts instead of reading charts after the fact
      • Prompt self-repair: a prototype agent that rewrites its own system prompt—and tells you when it still doesn’t have what it needs
      • The hidden cost curve: why the last 15 percent of reliability costs far more than the flashy 85 percent demo
      • Business-first metrics: shipping features that meet real goals instead of chasing another decimal point of “accuracy”
    If you’re past the proof-of-concept stage and staring down the “now it has to work” cliff, this episode is your climbing guide. (A sketch of the human-seeded judge idea follows this entry.)
    LINKS
      • Pydantic (https://pydantic.dev/)
      • Logfire (https://pydantic.dev/logfire)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
    🎓 Learn more:
      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8
    📺 Watch the video version on YouTube (https://youtube.com/live/wk6rPZ6qJSY?feature=share)
    45 min
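    A hedged sketch of the "tiny labels, big leverage" idea: a handful of human thumbs-up/thumbs-down labels seed an LLM judge that then scores every call. This illustrates the concept only, not Logfire's actual implementation; `call_llm` is a hypothetical stand-in for your model client:

      SEED_LABELS = [  # a few human judgments anchor the rubric
          {"q": "How do I reset my password?", "a": "Click 'Forgot password' on the login page.", "label": "good"},
          {"q": "Cancel my subscription", "a": "I cannot help with that.", "label": "bad"},
          # ... three more hand-labeled examples ...
      ]

      def judge(question: str, answer: str, call_llm) -> str:
          """Score a new (question, answer) pair against the human-seeded examples."""
          shots = "\n\n".join(
              f"Q: {ex['q']}\nA: {ex['a']}\nVerdict: {ex['label']}" for ex in SEED_LABELS
          )
          prompt = (
              "Grade the final answer as 'good' or 'bad', matching the judgment "
              "style of these human-labeled examples:\n\n"
              f"{shots}\n\nQ: {question}\nA: {answer}\nVerdict:"
          )
          return call_llm(prompt).strip().lower()  # 'good' or 'bad'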
  • Episode 52: Why Most LLM Products Break at Retrieval (And How to Fix Them)
    2025/07/02
    Most LLM-powered features do not break at the model. They break at the context. So how do you retrieve the right information to get useful results, even under vague or messy user queries? In this episode, we hear from Eric Ma, who leads data science research in the Data Science and AI group at Moderna. He shares what it takes to move beyond toy demos and ship LLM features that actually help people do their jobs.
    We cover:
      • How to align retrieval with user intent and why cosine similarity is not the answer
      • How a dumb YAML-based system outperformed so-called smart retrieval pipelines
      • Why vague queries like “what is this all about” expose real weaknesses in most systems
      • When vibe checks are enough and when formal evaluation is worth the effort
      • How retrieval workflows can evolve alongside your product and user needs
    If you are building LLM-powered systems and care about how they work, not just whether they work, this one is for you. (A toy retrieval sketch follows this entry.)
    LINKS
      • Eric's website (https://ericmjl.github.io/)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/stop-building-agents)
    🎓 Learn more:
      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8
    📺 Watch the video version on YouTube (https://youtu.be/d-FaR5Ywd5k)
    29 min
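    A toy version of the "dumb YAML beats smart retrieval" point: for well-known intents, a hand-curated route map encodes user intent directly, with vector search only as a fallback. The routes and file names below are invented for illustration, not Moderna's system:

      import yaml  # pip install pyyaml

      ROUTES = yaml.safe_load("""
      routes:
        - intent: "what is this all about"
          docs: [project_overview.md, faq.md]
        - intent: "how do i run the pipeline"
          docs: [quickstart.md]
      """)

      def retrieve(query: str) -> list[str]:
          """Curated intent match first; embeddings only when no route applies."""
          q = query.lower().strip()
          for route in ROUTES["routes"]:
              if route["intent"] in q or q in route["intent"]:
                  return route["docs"]
          return vector_search(q)

      def vector_search(q: str) -> list[str]:
          return []  # placeholder for a cosine-similarity index, the fallback path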
  • Episode 51: Why We Built an MCP Server and What Broke First
    2025/06/26
    What does it take to actually ship LLM-powered features, and what breaks when you connect them to real production data? In this episode, we hear from Philip Carter — then a Principal PM at Honeycomb and now a Product Management Director at Salesforce. In early 2023, he helped build one of the first LLM-powered SaaS features to ship to real users. More recently, he and his team built a production-ready MCP server.
    We cover:
      • How to evaluate LLM systems using human-aligned judges
      • The spreadsheet-driven process behind shipping Honeycomb’s first LLM feature
      • The challenges of tool usage, prompt templates, and flaky model behavior
      • Where MCP shows promise, and where it breaks in the real world
    If you’re working on LLMs in production, this one’s for you! (A minimal MCP server sketch follows this entry.)
    LINKS
      • So We Shipped an AI Product: Did it Work? by Philip Carter (https://www.honeycomb.io/blog/we-shipped-ai-product)
      • Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)
    🎓 Learn more:
      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8
    📺 Watch the video version on YouTube (https://youtu.be/JDMzdaZh9Ig)
    48 min
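    For a sense of what "building an MCP server" involves, here is a minimal sketch using the official MCP Python SDK's FastMCP helper. The tool itself is invented for illustration; Honeycomb's real server exposes observability queries, not this:

      from mcp.server.fastmcp import FastMCP

      mcp = FastMCP("demo-server")

      @mcp.tool()
      def slow_requests(service: str, threshold_ms: int = 500) -> str:
          """Hypothetical tool: summarize requests slower than a threshold."""
          return f"3 endpoints in {service} exceed {threshold_ms}ms (placeholder data)"

      if __name__ == "__main__":
          mcp.run()  # serves the tool over stdio so an LLM client can call it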
  • Episode 50: A Field Guide to Rapidly Improving AI Products -- With Hamel Husain
    2025/06/17
    If we want AI systems that actually work, we need to get much better at evaluating them, not just building more pipelines, agents, and frameworks. In this episode, Hugo talks with Hamel Husain (ex-Airbnb, GitHub, DataRobot) about how teams can improve AI products by focusing on error analysis, data inspection, and systematic iteration. The conversation is based on Hamel’s blog post A Field Guide to Rapidly Improving AI Products, which he joined Hugo’s class to discuss.
    They cover:
      🔍 Why most teams struggle to measure whether their systems are actually improving
      📊 How error analysis helps you prioritize what to fix (and when to write evals)
      🧮 Why evaluation isn’t just a metric — but a full development process
      ⚠️ Common mistakes when debugging LLM and agent systems
      🛠️ How to think about the tradeoffs in adding more evals vs. fixing obvious issues
      👥 Why enabling domain experts — not just engineers — can accelerate iteration
    If you’ve ever built an AI system and found yourself unsure how to make it better, this conversation is for you. (An error-analysis sketch follows this entry.)
    LINKS
      • A Field Guide to Rapidly Improving AI Products by Hamel Husain (https://hamel.dev/blog/posts/field-guide/)
      • Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)
    🎓 Learn more:
      • Hugo's course: Building LLM Applications for Data Scientists and Software Engineers (https://maven.com/s/course/d56067f338) — next cohort starts July 8
      • Hamel & Shreya's course: AI Evals For Engineers & PMs (https://maven.com/parlance-labs/evals?promoCode=GOHUGORGOHOME) — use code GOHUGORGOHOME for $800 off
    📺 Watch the video version on YouTube (https://youtu.be/rWToRi2_SeY)
    28 min
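    A small sketch of the error-analysis loop described here: hand-label the failure mode on a sample of traces, then let the counts decide what to fix (and write evals for) first. The field names and categories are assumptions for illustration, not from Hamel's post:

      from collections import Counter

      traces = [  # in practice, a sample of real production traces you labeled by hand
          {"id": 1, "failure": "retrieval_miss"},
          {"id": 2, "failure": "hallucinated_citation"},
          {"id": 3, "failure": "retrieval_miss"},
          {"id": 4, "failure": None},  # a success
      ]

      counts = Counter(t["failure"] for t in traces if t["failure"])
      for mode, n in counts.most_common():
          print(f"{mode}: {n}")  # write evals for the top failure mode before adding more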
  • Episode 49: Why Data and AI Still Break at Scale (and What to Do About It)
    2025/06/05
    If we want AI systems that actually work in production, we need better infrastructure—not just better models. In this episode, Hugo talks with Akshay Agrawal (Marimo, ex-Google Brain, Netflix, Stanford) about why data and AI pipelines still break down at scale, and how we can fix the fundamentals: reproducibility, composability, and reliable execution.
    They discuss:
      🔁 Why reactive execution matters—and how current tools fall short
      🛠️ The design goals behind Marimo, a new kind of Python notebook
      ⚙️ The hidden costs of traditional workflows (and what breaks at scale)
      📦 What it takes to build modular, maintainable AI apps
      🧪 Why debugging LLM systems is so hard—and what better tooling looks like
      🌍 What we can learn from decades of tools built for and by data practitioners
    Toward the end of the episode, Hugo and Akshay walk through two live demos: Hugo shares how he’s been using Marimo to prototype an app that extracts structured data from world leader bios, and Akshay shows how Marimo handles agentic workflows with memory and tool use—built entirely in a notebook.
    This episode is about tools, but it’s also about culture. If you’ve ever hit a wall with your current stack—or felt like your tools were working against you—this one’s for you. (A minimal reactive-notebook sketch follows this entry.)
    LINKS
      • marimo | a next-generation Python notebook (https://marimo.io/)
      • SciPy conference, 2025 (https://www.scipy2025.scipy.org/)
      • Hugo's Marimo World Leader Face Embedding demo (https://www.youtube.com/watch?v=DO21QEcLOxM)
      • Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)
      • Watch the podcast here on YouTube! (https://youtu.be/wU82fz4iRfo)
    🎓 Want to go deeper? Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in. This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful. Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts July 8 — use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)
    1 hr 22 min
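    A minimal sketch of the reactive execution discussed here, in marimo's plain-Python notebook format (the file contents are illustrative; run with `marimo edit demo.py`). Edit `n` and the dependent cell reruns automatically, so there is no stale state:

      import marimo

      app = marimo.App()

      @app.cell
      def _():
          n = 10  # edit this value; cells that read `n` recompute automatically
          return (n,)

      @app.cell
      def _(n):
          squares = [i**2 for i in range(n)]  # depends on n, so it reruns reactively
          return (squares,)

      if __name__ == "__main__":
          app.run()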
  • Episode 48: How to Benchmark AGI with Greg Kamradt
    2025/05/23
    If we want to make progress toward AGI, we need a clear definition of intelligence—and a way to measure it. In this episode, Hugo talks with Greg Kamradt, President of the ARC Prize Foundation, about ARC-AGI: a benchmark built on François Chollet’s definition of intelligence as “the efficiency at which you learn new things.” Unlike most evals that focus on memorization or task completion, ARC is designed to measure generalization—and expose where today’s top models fall short.
    They discuss:
      🧠 Why we still lack a shared definition of intelligence
      🧪 How ARC tasks force models to learn novel skills at test time
      📉 Why GPT-4-class models still underperform on ARC
      🔎 The limits of traditional benchmarks like MMLU and Big-Bench
      ⚙️ What the OpenAI o3 results reveal—and what they don’t
      💡 Why generalization and efficiency, not raw capability, are key to AGI
    Greg also shares what he’s seeing in the wild: how startups and independent researchers are using ARC as a North Star, how benchmarks shape the frontier, and why the ARC team believes we’ll know we’ve reached AGI when humans can no longer write tasks that models can’t solve.
    This conversation is about evaluation—not hype. If you care about where AI is really headed, this one’s worth your time. (A sketch of the ARC task format follows this entry.)
    LINKS
      • ARC Prize -- What is ARC-AGI? (https://arcprize.org/arc-agi)
      • On the Measure of Intelligence by François Chollet (https://arxiv.org/abs/1911.01547)
      • Greg Kamradt on Twitter (https://x.com/GregKamradt)
      • Hugo's High Signal Podcast with Fei-Fei Li (https://high-signal.delphina.ai/episode/fei-fei-on-how-human-centered-ai-actually-gets-built)
      • Vanishing Gradients YouTube Channel (https://www.youtube.com/channel/UC_NafIo-Ku2loOLrzm45ABA)
      • Upcoming Events on Luma (https://lu.ma/calendar/cal-8ImWFDQ3IEIxNWk)
      • Hugo's recent newsletter about upcoming events and more! (https://hugobowne.substack.com/p/ai-as-a-civilizational-technology)
      • Watch the podcast here on YouTube! (https://youtu.be/wU82fz4iRfo)
    🎓 Want to go deeper? Check out Hugo's course: Building LLM Applications for Data Scientists and Software Engineers. Learn how to design, test, and deploy production-grade LLM systems — with observability, feedback loops, and structure built in. This isn’t about vibes or fragile agents. It’s about making LLMs reliable, testable, and actually useful. Includes over $800 in compute credits and guest lectures from experts at DeepMind, Moderna, and more. Cohort starts July 8 — use this link for a 10% discount (https://maven.com/hugo-stefan/building-llm-apps-ds-and-swe-from-first-principles?promoCode=LLM10)
    1 hr 4 min
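    To make "learning novel skills at test time" concrete, here is the shape of an ARC-AGI task as published in the public dataset: a few train input/output grid pairs (integers are colors) and a test input whose transformation must be inferred from the examples alone. This tiny task and its rule are invented; real tasks are at https://arcprize.org/arc-agi:

      task = {
          "train": [
              {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]},
              {"input": [[0, 2], [2, 0]], "output": [[2, 0], [0, 2]]},
          ],
          "test": [
              {"input": [[0, 3], [3, 0]]}  # expected output: [[3, 0], [0, 3]]
          ],
      }

      def solve(grid):
          """The rule a solver must induce for this toy task: reverse each row."""
          return [row[::-1] for row in grid]

      assert solve(task["test"][0]["input"]) == [[3, 0], [0, 3]]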