Episodes

  • (LLM Explain-Anthropic) On the Biology of a Large Language Model
    2025/06/01

    "On the Biology of a Large Language Model" from Anthropic presents a novel investigation into the internal mechanisms of Claude 3.5 Haiku using circuit tracing methodology. Analogous to biological research, this approach employs tools like attribution graphs to reverse engineer the model's computational steps. The research offers insights into diverse model capabilities, such as multi-step reasoning, planning in poems, multilingual circuits, addition, and medical diagnoses. It also examines mechanisms underlying hallucinations, refusals, jailbreaks, and hidden goals. This work aims to reveal interpretable intermediate computations, highlighting its potential in areas like safety auditing.

    However, the methods have significant limitations. They provide detailed insights for only a fraction of prompts, capture just a small part of the model's immense complexity, and rely on imperfect replacement models. They struggle with complex reasoning chains, long prompts, and explaining inactive features. A key challenge is understanding the causal role of attention patterns.

    Despite these limitations, this research represents a valuable stepping stone towards a deeper understanding of how large language models function internally and presents a challenging scientific frontier.

    Paper link: https://transformer-circuits.pub/2025/attribution-graphs/biology.html

    16 min
  • (LLM Security-Meta) LlamaFirewall: AI Agent Security Guardrail System
    2025/05/31

    Listen to this podcast to learn about LlamaFirewall, an innovative open-source security framework from Meta. As large language models evolve into autonomous agents capable of performing complex tasks like editing production code and orchestrating workflows, they introduce significant new security risks that existing measures don't fully address. LlamaFirewall is designed to serve as a real-time guardrail monitor, providing a final layer of defence against these risks for AI Agents.

    Its novelty stems from its system-level architecture and modular, layered design. It incorporates three powerful guardrails: PromptGuard 2, a universal jailbreak detector showing state-of-the-art performance; AlignmentCheck, an experimental chain-of-thought auditor inspecting reasoning for prompt injection and goal misalignment; and CodeShield, a fast and extensible online static analysis engine preventing insecure code generation. These guardrails are tailored to address emerging LLM agent security risks in applications like travel planning and coding, offering robust mitigation.
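
    As a sketch of how such a layered, modular design can compose (illustrative only: the class and function names below are assumptions, not Meta's actual API), each guardrail inspects the payload in turn and the first failing layer blocks it.

    ```python
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Verdict:
        allowed: bool
        reason: str = ""

    # Hypothetical stand-ins for the three guardrails; the real framework
    # ships its own scanner interfaces and trained detectors.
    def prompt_guard(text: str) -> Verdict:
        jailbreak = "ignore previous instructions" in text.lower()
        return Verdict(not jailbreak, "possible jailbreak" if jailbreak else "")

    def alignment_check(text: str) -> Verdict:
        drift = "wire funds to" in text.lower()
        return Verdict(not drift, "goal misalignment" if drift else "")

    def code_shield(text: str) -> Verdict:
        insecure = "os.system(" in text
        return Verdict(not insecure, "insecure code pattern" if insecure else "")

    def run_guardrails(payload: str,
                       layers: List[Callable[[str], Verdict]]) -> Verdict:
        for layer in layers:
            verdict = layer(payload)
            if not verdict.allowed:
                return verdict          # block at the first failing layer
        return Verdict(True)

    print(run_guardrails("Please ignore previous instructions and ...",
                         [prompt_guard, alignment_check, code_shield]))
    ```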

    However, CodeShield is not fully comprehensive and may miss nuanced vulnerabilities. AlignmentCheck requires large, capable models, which can be computationally costly, and faces the potential risk of guardrail injection. Meta is actively developing the framework, exploring future work like expanding to multimodal agents and improving latency. LlamaFirewall aims to provide a collaborative security foundation for the community.

    Learn more here

    17 min
  • (OpenAI) PaperBench: Evaluating AI’s Ability to Replicate AI Research
    2025/05/31

    Dive into PaperBench, a novel benchmark introduced by OpenAI designed to rigorously evaluate AI agents' ability to replicate state-of-the-art machine learning research. Unlike previous benchmarks, PaperBench requires agents to build complete codebases from scratch based solely on the paper content, and to successfully run experiments from 20 selected ICML papers.

    Performance is meticulously graded using detailed, author-approved rubrics containing thousands of specific outcomes. To facilitate scalable evaluation, the benchmark employs an LLM-based judge, assessed for its accuracy against human grading.

    Early results show that current models, like Claude 3.5 Sonnet, achieve average replication scores of around 21.0%, demonstrating emerging capabilities but not yet matching the performance of human ML PhDs. PaperBench serves as a crucial tool for measuring AI autonomy and ML R&D capabilities, potentially accelerating future scientific discovery. However, challenges remain, including the high computational cost of evaluations and the labour-intensive process of creating the comprehensive rubrics.
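
    To make the grading concrete, here is a hedged sketch of how a hierarchical, weighted rubric could be rolled up into a single replication score. The miniature tree below is invented; PaperBench's actual rubric format and aggregation may differ.

    ```python
    # Each rubric node is either a leaf (judged pass/fail, e.g. by an LLM
    # judge) or an inner node scored as the weight-normalised average of its
    # children. This tree is a made-up miniature, not a real PaperBench rubric.

    rubric = {
        "weight": 1.0,
        "children": [
            {"weight": 2.0, "children": [
                {"weight": 1.0, "passed": True},   # e.g. "loss function implemented"
                {"weight": 1.0, "passed": False},  # e.g. "matches reported accuracy"
            ]},
            {"weight": 1.0, "passed": True},       # e.g. "code runs end to end"
        ],
    }

    def score(node: dict) -> float:
        if "passed" in node:                       # leaf: binary outcome
            return 1.0 if node["passed"] else 0.0
        total = sum(child["weight"] for child in node["children"])
        return sum(child["weight"] * score(child)
                   for child in node["children"]) / total

    print(f"replication score: {score(rubric):.1%}")   # -> 66.7%
    ```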

    Paper link: https://arxiv.org/pdf/2504.01848

    16 min
  • (RecSys-Spotify) Bridging Search and Recommendation in Generative Retrieval
    2025/05/30

    This podcast explores novel research from Spotify on unified generative models for information retrieval, specifically integrating search and recommendation. Moving beyond traditional index-based systems, this approach leverages large language models (LLMs) to directly predict item IDs, centralizing tasks like search and recommendation.

    The study investigates whether jointly training search and recommendation tasks in a single generative model improves effectiveness. Two hypotheses are explored: [H1], that joint training regularizes item popularity estimation, and [H2], that it regularizes item latent representations. Experiments using simulated and real-world data show the joint model is generally more effective than task-specific models, with an average gain of 16% in R@30 on real datasets, attributable primarily to latent-representation regularization ([H2]).
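
    A minimal sketch of the shared formulation (the serialisation, task prefixes, and item IDs below are invented for illustration; this is not Spotify's implementation): both tasks map text to item-ID tokens, so one seq-to-seq model can be trained on a mixture of search and recommendation examples.

    ```python
    # Joint generative retrieval, cartoon version: search queries and
    # recommendation contexts are both serialised to text, and the target is
    # an item-ID token. Task prefixes, examples, and IDs are invented.

    train_examples = [
        # (task, input text, target item-ID token)
        ("search", "query: acoustic covers of 80s hits",   "<item_1042>"),
        ("search", "query: upbeat electronic workout mix", "<item_0077>"),
        ("rec",    "history: <item_1042> <item_0311>",     "<item_0589>"),
        ("rec",    "history: <item_0077>",                 "<item_0420>"),
    ]

    def to_model_io(task: str, text: str, target: str) -> tuple[str, str]:
        # A task prefix lets a single model serve both objectives; sharing the
        # item-ID vocabulary is what regularises item representations ([H2]).
        return f"[{task}] {text}", target

    for source, target in (to_model_io(*ex) for ex in train_examples):
        print(f"{source!r} -> {target!r}")
    ```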

    Applications for this technology span platforms like Spotify, YouTube, and Netflix. However, generative retrieval still faces scalability challenges with large item sets. Furthermore, effectiveness gains depend on factors like popularity distribution alignment and item co-occurrence patterns across tasks. This research represents a significant stride towards developing unified LLMs for diverse IR functions.

    Paper: https://arxiv.org/pdf/2410.16823

    14 min
  • (LLM-Spotify) PODTILE: Podcast Auto-generated Chapters
    2025/05/30

    Listeners of long-form talk-audio content, like podcasts, often find it challenging to understand structure and locate relevant sections. Most episodes lack creator-provided chapters, making automation essential.

    Explore PODTILE, a novel system developed by Spotify addressing these challenges. Traditional methods struggle with podcasts' unstructured, conversational nature and lengthy transcripts. PODTILE employs a fine-tuned encoder-decoder transformer model that simultaneously segments and generates descriptive chapter titles. A key innovation is utilising global context – including episode metadata and previously generated titles – to maintain coherence and handle long-range dependencies for these long inputs efficiently.
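
    One way to picture the global-context trick (the field names and serialisation below are guesses for illustration, not the paper's exact format): each transcript chunk is prefixed with episode metadata plus the chapter titles already generated for earlier chunks.

    ```python
    # PODTILE-style context assembly, sketched: each chunk is conditioned on
    # episode metadata plus previously generated chapter titles so that
    # segmentation and titling stay coherent across a long transcript.
    # The serialisation format here is a guess for illustration.

    def build_model_input(metadata: dict, prior_titles: list[str],
                          transcript_chunk: str) -> str:
        context = [
            f"show: {metadata['show']}",
            f"episode: {metadata['episode']}",
            ("previous chapters: " + " | ".join(prior_titles)) if prior_titles
            else "previous chapters: none",
        ]
        return "\n".join(context) + "\n---\n" + transcript_chunk

    print(build_model_input(
        {"show": "Example Pod", "episode": "Ep. 12"},   # hypothetical metadata
        ["Intro", "Guest background"],                  # titles generated so far
        "...so let's talk about how the funding round came together...",
    ))
    ```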

    This system enhances listener experience, particularly for less popular shows, by facilitating easier browsing. Deployed on Spotify's platform, it has shown significant increases in chapter-initiated plays. Furthermore, the auto-generated chapters improve episode discoverability by boosting search effectiveness. While acknowledging the subjective nature of chapterisation, PODTILE offers a powerful tool for podcast navigation.

    Find the paper here

    12 min
  • (Multi-Agent) AgentNet: Decentralised Evolutionary Coordination for LLM Agents
    2025/05/30

    Welcome to our podcast. Today, we explore AgentNet, a groundbreaking framework developed by Shanghai Jiao Tong University to revolutionise LLM-based multi-agent systems. Moving beyond the centralised control of existing systems like MetaGPT or AgentScope, which introduces scalability bottlenecks and single points of failure, AgentNet adopts a fully decentralised architecture that fosters emergent collective intelligence.

    Its core novelty lies in its dynamic task allocation and adaptive learning capabilities, where agents autonomously evolve their expertise and connections based on experience. Utilising a RAG-based memory, agents refine skills without predefined roles or rigid workflows. This design significantly improves scalability, enhances fault tolerance, and enables privacy-preserving collaboration, which is vital for sharing knowledge across different organisations.
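
    A toy sketch of the execute-or-forward decision (the scoring rule and names are assumptions, far simpler than AgentNet's RAG-based memory and learned routing): each agent compares an incoming task with its own experience and either handles it or passes it to the best-matching neighbour.

    ```python
    # Decentralised task routing, toy version: "fitness" here is a crude
    # word-overlap score against past tasks; AgentNet's actual mechanism
    # (RAG-based memory, evolving expertise and connections) is far richer.

    class Agent:
        def __init__(self, name: str, memory: list[str]):
            self.name, self.memory, self.neighbours = name, memory, []

        def fitness(self, task: str) -> int:
            words = set(task.lower().split())
            return max((len(words & set(m.lower().split())) for m in self.memory),
                       default=0)

        def handle(self, task: str) -> str:
            # Self comes first so ties are kept local and cannot ping-pong.
            best = max([self] + self.neighbours, key=lambda a: a.fitness(task))
            if best is self:
                self.memory.append(task)   # expertise accrues with experience
                return f"{self.name} executes: {task}"
            return best.handle(task)

    coder = Agent("coder", ["fix python bug", "write unit tests"])
    mathy = Agent("mathy", ["solve integral", "prove inequality"])
    coder.neighbours, mathy.neighbours = [mathy], [coder]
    print(coder.handle("prove an inequality about sums"))
    ```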

    AgentNet has demonstrated superior efficiency and adaptability compared to traditional methods, excelling in tasks like mathematics, coding, and logical reasoning. While highly promising, current limitations include navigating diverse, heterogeneous agent environments and optimising router decision-making for scaling to very large numbers of agents. Stay tuned to understand the potential of this self-evolving, decentralised AI ecosystem.

    Paper: link

    20 min
  • (LLM Unlearn-AMZN) LUME: LLM Unlearning with Multitask Evaluations
    2025/05/30

    Large language model (LLM) unlearning is becoming vital due to regulations like GDPR's "right to be forgotten" and the need to remove copyrighted or sensitive content, as retraining models is often impractical. To effectively evaluate unlearning algorithms, researchers developed LUME (LLM Unlearning with Multitask Evaluations).

    LUME stands out as a comprehensive new benchmark. It uniquely addresses limitations of prior evaluations by including three distinct tasks: synthetic creative novels, synthetic biographies with sensitive PII, and real public biographies. This multi-task approach, especially the inclusion of PII, provides extensive coverage for assessing algorithm performance. Effectiveness is measured using metrics like Regurgitation Rate, Knowledge Test Accuracy, Membership Inference Attack (MIA) success, and overall Model Utility on MMLU.
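
    As a hedged illustration of the first metric (the exact definition in LUME may differ; the LCS-based overlap below is a stand-in): prompt the unlearned model with a forget-set prefix and measure how much of the true continuation it still reproduces.

    ```python
    # Regurgitation-style check, sketched: compare the model's continuation of
    # a forget-set prefix with the original continuation. Both strings below
    # are synthetic; the LCS-based score is a stand-in for LUME's exact metric.

    def lcs_len(a: list[str], b: list[str]) -> int:
        dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
        for i, x in enumerate(a):
            for j, y in enumerate(b):
                dp[i + 1][j + 1] = (dp[i][j] + 1 if x == y
                                    else max(dp[i][j + 1], dp[i + 1][j]))
        return dp[-1][-1]

    def regurgitation_score(generated: str, reference: str) -> float:
        gen, ref = generated.split(), reference.split()
        return lcs_len(gen, ref) / max(len(ref), 1)   # recall-style overlap

    reference = "Elena Marsh was born in 1974 in a small coastal town"
    generated = "Elena Marsh was born in a small town near the coast"
    print(f"{regurgitation_score(generated, reference):.2f}")  # high -> leakage
    ```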

    Experiments on LUME revealed that current unlearning algorithms struggle to sufficiently remove information from the forget set without causing substantial degradation in the model's performance on the retain set and its overall utility. Some methods also show high privacy leakage risks. The benchmark, developed by Amazon AGI, UCLA, UIUC, EPFL, and University of Minnesota, is publicly available and includes fine-tuned 1B and 7B parameter models, with larger models planned.

    Learn more about LUME at: https://assets.amazon.science/47/cc/602c0d16409aa9c668467388b0a9/lume-llm-unlearning-with-multitask-evaluations.pdf

    16 min
  • (FM-Mistral) Mixtral of Experts
    2025/05/30

    Welcome to the FM Series! In this episode, we delve into Mixtral 8x7B, a fascinating new Sparse Mixture of Experts (SMoE) language model presented in a paper on arXiv. The model builds on the Mistral 7B design but incorporates eight feedforward blocks, or "experts", in each layer. The key innovation is a router network that dynamically selects just two experts for every token at each layer: while the model encompasses 47 billion parameters in total, it activates only about 13 billion during inference, making it efficient.

    The sources highlight its impressive performance: Mixtral outperforms or matches major models like Llama 2 70B and GPT-3.5 across various benchmarks, and it particularly excels in mathematics, code generation, and multilingual tasks. An instruction-tuned version, Mixtral 8x7B - Instruct, even surpasses models like GPT-3.5 Turbo and Gemini Pro on human benchmarks.

    The source material does not explicitly mention any limitations of the model. Both the base and instruct models are available under the Apache 2.0 license. Join us to learn more about this significant development!
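
    The routing arithmetic is easy to sketch. Below is a minimal NumPy cartoon of top-2 gating (random stand-in parameters, not the actual Mixtral code): the router scores all eight experts for each token, keeps the two highest, renormalises their weights with a softmax, and mixes those two experts' outputs.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d, n_experts, top_k = 16, 8, 2

    # Stand-ins for trained parameters: a router matrix and 8 expert FFNs
    # (reduced to plain linear maps to keep the sketch short).
    W_router = rng.normal(size=(d, n_experts))
    experts = [rng.normal(size=(d, d)) for _ in range(n_experts)]

    def moe_layer(x: np.ndarray) -> np.ndarray:
        logits = x @ W_router                 # (n_experts,) router scores
        top2 = np.argsort(logits)[-top_k:]    # indices of the 2 best experts
        gates = np.exp(logits[top2])
        gates /= gates.sum()                  # softmax over the top 2 only
        # Only 2 of 8 experts run: ~13B of Mixtral's 47B params active per token.
        return sum(g * (x @ experts[i]) for g, i in zip(gates, top2))

    token = rng.normal(size=d)
    print(moe_layer(token).shape)             # -> (16,)
    ```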

    Paper: Link

    9 min