ThursdAI - The top AI news from the past week

著者: From Weights & Biases Join AI Evangelist Alex Volkov and a panel of experts to cover everything important that happened in the world of AI from the past week
  • サマリー

  • Every ThursdAI, Alex Volkov hosts a panel of experts, ai engineers, data scientists and prompt spellcasters on twitter spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, Open source, New capabilities, OpenAI, competitors in AI space, new LLM models, AI art and diffusion aspects and much more.

    sub.thursdai.news
    Alex Volkov
    続きを読む 一部表示

あらすじ・解説

Every ThursdAI, Alex Volkov hosts a panel of experts, ai engineers, data scientists and prompt spellcasters on twitter spaces, as we discuss everything major and important that happened in the world of AI for the past week. Topics include LLMs, Open source, New capabilities, OpenAI, competitors in AI space, new LLM models, AI art and diffusion aspects and much more.

sub.thursdai.news
Alex Volkov
エピソード
  • 📆 ThursdAI - Jan 2 - is 25' the year of AI agents?
    2025/01/02
    Hey folks, Alex here 👋 Happy new year!On our first episode of this year, and the second quarter of this century, there wasn't a lot of AI news to report on (most AI labs were on a well deserved break). So this week, I'm very happy to present a special ThursdAI episode, an interview with Joāo Moura, CEO of Crew.ai all about AI agents!We first chatted with Joāo a year ago, back in January of 2024, as CrewAI was blowing up but still just an open source project, it got to be the number 1 trending project on Github, and #1 project on Product Hunt. (You can either listen to the podcast or watch it in the embedded Youtube above)00:36 Introduction and New Year Greetings02:23 Updates on Open Source and LLMs03:25 Deep Dive: AI Agents and Reasoning03:55 Quick TLDR and Recent Developments04:04 Medical LLMs and Modern BERT09:55 Enterprise AI and Crew AI Introduction10:17 Interview with João Moura: Crew AI25:43 Human-in-the-Loop and Agent Evaluation33:17 Evaluating AI Agents and LLMs44:48 Open Source Models and Fin to OpenAI45:21 Performance of Claude's Sonnet 3.548:01 Different parts of an agent topology, brain, memory, tools, caching53:48 Tool Use and Integrations01:04:20 Removing LangChain from Crew01:07:51 The Year of Agents and Reasoning01:18:43 Addressing Concerns About AI01:24:31 Future of AI and Agents01:28:46 Conclusion and Farewell---Is 2025 "the year of AI agents"?AI agents as I remember them as a concept started for me a few month after I started ThursdAI ,when AutoGPT exploded. Was such a novel idea at the time, run LLM requests in a loop,(In fact, back then, I came up with a retry with AI concept and called it TrAI/Catch, where upon an error, I would feed that error back into the GPT api and ask it to correct itself. it feels so long ago!)AutoGPT became the fastest ever Github project to reach 100K stars, and while exciting, it did not work.Since then we saw multiple attempts at agentic frameworks, like babyAGI, autoGen. Crew AI was one of them that keeps being the favorite among many folks.So, what is an AI agent? Simon Willison, friend of the pod, has a mission, to ask everyone who announces a new agent, what they mean when they say it because it seems that everyone "shares" a common understanding of AI agents, but it's different for everyone.We'll start with Joāo's explanation and go from there. But let's assume the basic, it's a set of LLM calls, running in a self correcting loop, with access to planning, external tools (via function calling) and a memory or sorts that make decisions.Though, as we go into detail, you'll see that since the very basic "run LLM in the loop" days, the agents in 2025 have evolved and have a lot of complexity.My takeaways from the conversationI encourage you to listen / watch the whole interview, Joāo is deeply knowledgable about the field and we go into a lot of topics, but here are my main takeaways from our chat* Enterprises are adopting agents, starting with internal use-cases* Crews have 4 different kinds of memory, Long Term (across runs), short term (each run), Entity term (company names, entities), pre-existing knowledge (DNA?)* TIL about a "do all links respond with 200" guardrail* Some of the agent tools we mentioned* Stripe Agent API - for agent payments and access to payment data (blog)* Okta Auth for Gen AI - agent authentication and role management (blog)* E2B - code execution platform for agents (e2b.dev)* BrowserBase - programmatic web-browser for your AI agent* Exa - search grounding for agents for real time understanding* Crew has 13 crews that run 24/7 to automate their company* Crews like Onboarding User Enrichment Crew, Meetings Prep, Taking Phone Calls, Generate Use Cases for Leads* GPT-4o mini is the most used model for 2024 for CrewAI with main factors being speed / cost* Speed of AI development makes it hard to standardize and solidify common integrations.* Reasoning models like o1 still haven't seen a lot of success, partly due to speed, partly due to different way of prompting required.This weeks BuzzWe've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI (previously Open Devin). We've distilled a lot of what we learned about evaluating LLM applications while building Weave, our LLM Observability and Evaluation tooling, and are excited to share this with you all! Get on the listAlso, 2 workshops (also about Evals) from us are upcoming, one in SF on Jan 11th and one in Seattle on Jan 13th (which I'm going to lead!) so if you're in those cities at those times, would love to see you!And that's it for this week, there wasn't a LOT of news as I said. The interesting thing is, even in the very short week, the news that we did get were all about agents and reasoning, so it looks like 2025 is agents and reasoning, agents and reasoning!See you all next week 🫡TL;DR with links:* Open Source LLMs* HuatuoGPT-o1 - medical LLM designed for medical ...
    続きを読む 一部表示
    1 時間 31 分
  • 📆 ThursdAI - Dec 26 - OpenAI o3 & o3 mini, DeepSeek v3 658B beating Claude, Qwen Visual Reasoning, Hume OCTAVE & more AI news
    2024/12/27
    Hey everyone, Alex here 👋I was hoping for a quiet holiday week, but whoa, while the last newsletter was only a week ago, what a looong week it has been, just Friday after the last newsletter, it felt like OpenAI has changed the world of AI once again with o3 and left everyone asking "was this AGI?" over the X-mas break (Hope Santa brought you some great gifts!) and then not to be outdone, DeepSeek open sourced basically a Claude 2.5 level behemoth DeepSeek v3 just this morning!Since the breaking news from DeepSeek took us by surprise, the show went a bit longer (3 hours today!) than expected, so as a Bonus, I'm going to release a separate episode with a yearly recap + our predictions from last year and for next year in a few days (soon in your inbox!) TL;DR* Open Source LLMs* CogAgent-9B (Project, Github)* Qwen QvQ 72B - open weights visual reasoning (X, HF, Demo, Project)* GoodFire Ember - MechInterp API - GoldenGate LLama 70B* 🔥 DeepSeek v3 658B MoE - Open Source Claude level model at $6M (X, Paper, HF, Chat)* Big CO LLMs + APIs* 🔥 OpenAI reveals o3 and o3 mini (Blog, X)* X.ai raises ANOTHER 6B dollars - on their way to 200K H200s (X)* This weeks Buzz* Two W&B workshops upcoming in January* SF - January 11* Seattle - January 13 (workshop by yours truly!)* New Evals course with Paige Bailey and Graham Neubig - pre-sign up for free* Vision & Video* Kling 1.6 update (Tweet)* Voice & Audio* Hume OCTAVE - 3B speech-language model (X, Blog)* Tools* OpenRouter added Web Search Grounding to 300+ models (X)Open Source LLMsDeepSeek v3 658B - frontier level open weights model for ~$6M (X, Paper, HF, Chat )This was absolutely the top of the open source / open weights news for the past week, and honestly maybe for the past month. DeepSeek, the previous quant firm from China, has dropped a behemoth model, a 658B parameter MoE (37B active), that you'd need 8xH200 to even run, that beats Llama 405, GPT-4o on most benchmarks and even Claude Sonnet 3.5 on several evals! The vibes seem to be very good with this one, and while it's not all the way beating Claude yet, it's nearly up there already, but the kicker is, they trained it with a very restricted compute, per the paper, with ~2K h800 (which is like H100 but with less bandwidth) for 14.8T tokens. (that's 15x cheaper than LLama 405 for comparison) For evaluations, this model excels on Coding and Math, which is not surprising given how excellent DeepSeek coder has been, but still, very very impressive! On the architecture front, the very interesting thing is, this feels like Mixture of Experts v2, with a LOT of experts (256) and 8+1 active at the same time, multi token prediction, and a lot optimization tricks outlined in the impressive paper (here's a great recap of the technical details)The highlight for me was, that DeepSeek is distilling their recent R1 version into this version, which likely increases the performance of this model on Math and Code in which it absolutely crushes (51.6 on CodeForces and 90.2 on MATH-500) The additional aspect of this is the API costs, and while they are going to raise the prices come February (they literally just swapped v2.5 for v3 in their APIs without telling a soul lol), the price performance for this model is just absurd. Just a massive massive release from the WhaleBros, now I just need a quick 8xH200 to run this and I'm good 😅 Other OpenSource news - Qwen QvQ, CogAgent-9B and GoldenGate LLamaIn other open source news this week, our friends from Qwen have released a very interesting preview, called Qwen QvQ, a visual reasoning model. It uses the same reasoning techniques that we got from them in QwQ 32B, but built with the excellent Qwen VL, to reason about images, and frankly, it's really fun to see it think about an image. You can try it hereand a new update to CogAgent-9B (page), an agent that claims to understand and control your computer, claims to beat Claude 3.5 Sonnet Computer Use with just a 9B model! This is very impressive though I haven't tried it just yet, I'm excited to see those very impressive numbers from open source VLMs driving your computer and doing tasks for you!A super quick word from ... Weights & Biases! We've just opened up pre-registration for our upcoming FREE evaluations course, featuring Paige Bailey from Google and Graham Neubig from All Hands AI. We've distilled a lot of what we learned about evaluating LLM applications while building Weave, our LLM Observability and Evaluation tooling, and are excited to share this with you all! Get on the listAlso, 2 workshops (also about Evals) from us are upcoming, one in SF on Jan 11th and one in Seattle on Jan 13th (which I'm going to lead!) so if you're in those cities at those times, would love to see you!Big Companies - APIs & LLMsOpenAI - introduces o3 and o3-mini - breaking Arc-AGI challenge, GQPA and teasing AGI? On the last day of the 12 days of OpenAI, we've got the evals of their upcoming o3 reasoning model (and o3-mini) and whoah. ...
    続きを読む 一部表示
    1 時間 36 分
  • 🎄ThursdAI - Dec19 - o1 vs gemini reasoning, VEO vs SORA, and holiday season full of AI surprises
    2024/12/20
    For the full show notes and links visit https://sub.thursdai.news🔗 Subscribe to our show on Spotify: https://thursdai.news/spotify🔗 Apple: https://thursdai.news/appleHo, ho, holy moly, folks! Alex here, coming to you live from a world where AI updates are dropping faster than Santa down a chimney! 🎅 It's been another absolutely BANANAS week in the AI world, and if you thought last week was wild, and we're due for a break, buckle up, because this one's a freakin' rollercoaster! 🎢In this episode of ThursdAI, we dive deep into the recent innovations from OpenAI, including their 1-800 ChatGPT phone service and new advancements in voice mode and API functionalities. We discuss the latest updates on O1 model capabilities, including Reasoning Effort settings, and highlight the introduction of WebRTC support by OpenAI. Additionally, we explore the groundbreaking VEO2 model from Google, the generative physics engine Genesis, and new developments in open source models like Cohere's Command R7b. We also provide practical insights on using tools like Weights & Biases for evaluating AI models and share tips on leveraging GitHub Gigi. Tune in for a comprehensive overview of the latest in AI technology and innovation.00:00 Introduction and OpenAI's 12 Days of Releases00:48 Advanced Voice Mode and Public Reactions01:57 Celebrating Tech Innovations02:24 Exciting New Features in AVMs03:08 TLDR - ThursdAI December 1912:58 Voice and Audio Innovations14:29 AI Art, Diffusion, and 3D16:51 Breaking News: Google Gemini 2.023:10 Meta Apollo 7b Revisited33:44 Google's Sora and Veo234:12 Introduction to Veo2 and Sora34:59 First Impressions of Veo235:49 Comparing Veo2 and Sora37:09 Sora's Unique Features38:03 Google's MVP Approach43:07 OpenAI's Latest Releases44:48 Exploring OpenAI's 1-800 CHAT GPT47:18 OpenAI's Fine-Tuning with DPO48:15 OpenAI's Mini Dev Day Announcements49:08 Evaluating OpenAI's O1 Model54:39 Weights & Biases Evaluation Tool - Weave01:03:52 ArcAGI and O1 Performance01:06:47 Introduction and Technical Issues01:06:51 Efforts on Desktop Apps01:07:16 ChatGPT Desktop App Features01:07:25 Working with Apps and Warp Integration01:08:38 Programming with ChatGPT in IDEs01:08:44 Discussion on Warp and Other Tools01:10:37 GitHub GG Project01:14:47 OpenAI Announcements and WebRTC01:24:45 Modern BERT and Smaller Models01:27:37 Genesis: Generative Physics Engine01:33:12 Closing Remarks and Holiday WishesHere’s a talking podcast host speaking excitedly about his showTL;DR - Show notes and Links* Open Source LLMs* Meta Apollo 7B – LMM w/ SOTA video understanding (Page, HF)* Microsoft Phi-4 – 14B SLM (Blog, Paper)* Cohere Command R 7B – (Blog)* Falcon 3 – series of models (X, HF, web)* IBM updates Granite 3.1 + embedding models (HF, Embedding)* Big CO LLMs + APIs* OpenAI releases new o1 + API access (X)* Microsoft makes CoPilot Free! (X)* Google - Gemini Flash 2 Thinking experimental reasoning model (X, Studio)* This weeks Buzz* W&B weave Playground now has Trials (and o1 compatibility) (try it)* Alex Evaluation of o1 and Gemini Thinking experimental (X, Colab, Dashboard)* Vision & Video* Google releases Veo 2 – SOTA text2video modal - beating SORA by most vibes (X)* HunyuanVideo distilled with FastHunyuan down to 6 steps (HF)* Kling 1.6 (X)* Voice & Audio* OpenAI realtime audio improvements (docs)* 11labs new Flash 2.5 model – 75ms generation (X)* Nexa OmniAudio – 2.6B – multimodal local LLM (Blog)* Moonshine Web – real time speech recognition in the browser (X)* Sony MMAudio - open source video 2 audio model (Blog, Demo)* AI Art & Diffusion & 3D* Genesys – open source generative 3D physics engine (X, Site, Github)* Tools* CerebrasCoder – extremely fast apps creation (Try It)* RepoPrompt to chat with o1 Pro – (download) This is a public episode. If you’d like to discuss this with other subscribers or get access to bonus episodes, visit sub.thursdai.news/subscribe
    続きを読む 一部表示
    1 時間 36 分

ThursdAI - The top AI news from the past weekに寄せられたリスナーの声

カスタマーレビュー:以下のタブを選択することで、他のサイトのレビューをご覧になれます。