📆 Feb 27, 2025 - GPT-4.5 Drops TODAY?!, Claude 3.7 Coding BEAST, Grok's Unhinged Voice, Humanlike AI voices & more AI news
- 2025/02/28
- Duration: 1 hour 41 minutes
- Podcast
Summary
Hey all, Alex here 👋

What can I say, the weeks are getting busier, and this is one of those "crazy full" weeks in AI. As we were about to start recording, OpenAI teased the GPT-4.5 live stream, and we already had a very busy show lined up (Claude 3.7 vibes are immaculate, Grok got an unhinged voice mode), and I had an interview with Kevin Hou from Windsurf scheduled! Let's dive in!

🔥 GPT 4.5 (ORION) is here - world's largest LLM (10x GPT-4o)

OpenAI has finally shipped their next .5 model, at 10x the scale of the previous model. We didn't cover this on the podcast but did watch the OpenAI live stream together after the podcast concluded. A very interesting .5 release from OpenAI: even Sam Altman says "this model won't crush on benchmarks," and it is not the most frontier model, but it is OpenAI's LARGEST model by far (folks are speculating 10+ trillion parameters). After two years of smaller models and distillations, we finally got a new BIG model that shows scaling laws proper, and while on some benchmarks it won't compete against reasoning models, this model will absolutely fuel a huge increase in capabilities even for reasoners, once o-series models are trained on top of it.
Here's a summary of the announcement and a quick vibes recap (from folks who had access to it before):

* OpenAI's largest, most knowledgeable model.
* Increased world knowledge: 62.5% on SimpleQA, 71.4% on GPQA.
* Better at creative writing, programming, and problem-solving (no native step-by-step reasoning).
* Text and image input, text output.
* Available in ChatGPT Pro and via API (the API supports Function Calling and Structured Output).
* Knowledge cutoff is October 2023.
* Context window is 128,000 tokens.
* Max output is 16,384 tokens.
* Pricing (per 1M tokens): Input: $75, Output: $150, Cached Input: $37.50.
* Foundation for future reasoning models.

4.5 Vibes Recap

Tons of folks who had access are pointing to the same thing: while this model is not beating others on evals, it's much better at multiple other things, namely creative writing, recommending songs, improved vision capability, and improved medical diagnosis. Karpathy said "Everything is a little bit better and it's awesome, but also not exactly in ways that are trivial to point to" and posted a thread of pairwise comparisons of tone on his X thread.

Though the reaction is bifurcated, as many are upset with the high price of this model (10x more costly on outputs) and the fact that it's just marginally better at coding tasks. Compared to the newer Sonnet (Sonnet 3.7) and DeepSeek, folks are looking at OpenAI and asking: why isn't this way better?

Anthropic's Claude 3.7 Sonnet: A Coding Powerhouse

Anthropic released Claude 3.7 Sonnet, and the immediate reaction from the community was overwhelmingly positive. With 8x more output capability (64K tokens) and reasoning built in, this model is an absolute coding powerhouse.
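As a quick aside on those GPT-4.5 pricing figures: here is a minimal back-of-the-envelope sketch of what a call would cost at the announced per-1M-token rates. The helper function and example token counts are illustrative (not from any SDK); only the three dollar rates come from the announcement.

```python
# Rough cost estimate for GPT-4.5 at the announced rates.
# PRICES holds the per-1M-token USD figures from the announcement;
# gpt45_cost is a hypothetical helper, not an OpenAI SDK function.

PRICES = {
    "input": 75.00,         # USD per 1M input tokens
    "output": 150.00,       # USD per 1M output tokens
    "cached_input": 37.50,  # USD per 1M cached input tokens
}

def gpt45_cost(input_tokens: int, output_tokens: int,
               cached_input_tokens: int = 0) -> float:
    """Estimate the USD cost of one GPT-4.5 call from token counts."""
    return (
        input_tokens / 1_000_000 * PRICES["input"]
        + output_tokens / 1_000_000 * PRICES["output"]
        + cached_input_tokens / 1_000_000 * PRICES["cached_input"]
    )

# Example: a long-context call with 100K tokens in, 10K tokens out
print(f"${gpt45_cost(100_000, 10_000):.2f}")  # -> $9.00
```

At these rates even modest usage adds up fast, which is where the "10x more costly on outputs" complaints below come from.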
Claude 3.7 Sonnet is the new king of coding models, achieving a remarkable 70% on the challenging SWE-Bench benchmark, and the initial user feedback is stellar, though vibes started to shift a bit towards Thursday.

Ranking #1 on WebDev Arena, and seemingly trained on UX and websites, Claude Sonnet 3.7 (AKA NewerSonnet) has been blowing our collective minds since it was released on Monday, especially due to introducing thinking and reasoning in a combined model. Now, since the start of the week, the community has actually had time to play with it, and some are returning to Sonnet 3.5, saying that while the model is generally much more capable, it tends to generate tons of things that are unnecessary. I wonder if the shift is due to Cursor/Windsurf-specific prompts or the model's larger output context, and we'll keep you updated if the vibes shift again.

Open Source LLMs

This week was HUGE for open source, folks. We saw releases pushing the boundaries of speed, multimodality, and even the very way LLMs generate text!

DeepSeek's Open Source Spree

DeepSeek went on an absolute tear, open-sourcing a treasure trove of advanced tools and techniques. This isn't your average open-source dump, folks. We're talking FlashMLA (efficient decoding on Hopper GPUs), DeepEP (an optimized communication library for MoE models), DeepGEMM (an FP8 GEMM library that's apparently ridiculously fast), and even parallelism strategies like DualPipe and EPLB. They are releasing some advanced stuff for training and optimization of LLMs; you can follow all their releases on their X account. DualPipe seems to be the one that got the most attention from the community, an incredible feat of pipeline parallelism that even got the cofounder of HuggingFace super excited.

Microsoft's Phi-4: Multimodal and Mini (Blog, HuggingFace)

Microsoft joined the party with Phi-4-multimodal (5.6B parameters) and Phi-4-mini (3.8B parameters), showing that small models can pack a serious punch. These models are a big deal.
Phi-4-multimodal can process text, images, and audio, and it actually beats WhisperV3 on transcription! As Nisten ...