muckrAIkers

By Jacob Haimes and Igor Krawczuk

Synopsis

Join us as we dig a tiny bit deeper into the hype surrounding "AI" press releases, research papers, and more. Each episode, we'll highlight ongoing research and investigations, providing some much needed contextualization, constructive critique, and even a smidge of occasional good will teasing to the conversation, trying to find the meaning under all of this muck.
© Kairos.fm
Episodes
  • NeurIPS 2024 Wrapped 🌯
    2024/12/30
    What happens when you bring over 15,000 machine learning nerds to one city? If your guess didn't include racism, sabotage and scandal, belated epiphanies, a spicy SoLaR panel, and many fantastic research papers, you wouldn't have captured my experience. In this episode we discuss the drama and takeaways from NeurIPS 2024. Posters available at time of episode preparation can be found on the episode webpage.
    EPISODE RECORDED 2024.12.08
    Chapters:
    (00:00) - Recording date
    (00:05) - Intro
    (00:44) - Obligatory mentions
    (01:54) - SoLaR panel
    (18:43) - Test of Time
    (24:17) - And now: science!
    (28:53) - Downsides of benchmarks
    (41:39) - Improving the science of ML
    (53:07) - Performativity
    (57:33) - NopenAI and Nanthropic
    (01:09:35) - Fun/interesting papers
    (01:13:12) - Initial takes on o3
    (01:18:12) - WorkArena
    (01:25:00) - Outro
    Links
    Note: many workshop papers had not yet been published to arXiv as of preparing this episode; the OpenReview submission page is provided in these cases.
    • NeurIPS statement on inclusivity
    • CTOL Digital Solutions article - NeurIPS 2024 Sparks Controversy: MIT Professor's Remarks Ignite "Racism" Backlash Amid Chinese Researchers' Triumphs
    • (1/2) NeurIPS Best Paper - Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
    • Visual Autoregressive Model report (this link now provides a 404 error; don't worry, here it is on archive.is)
    • Reuters article - ByteDance seeks $1.1 mln damages from intern in AI breach case, report says
    • CTOL Digital Solutions article - NeurIPS Award Winner Entangled in ByteDance's AI Sabotage Accusations: The Two Tales of an AI Genius
    • Reddit post on Ilya's talk
    • SoLaR workshop page
    Referenced Sources
    • Harvard Data Science Review article - Data Science at the Singularity
    • Paper - Reward Reports for Reinforcement Learning
    • Paper - It's Not What Machines Can Learn, It's What We Cannot Teach
    • Paper - NeurIPS Reproducibility Program
    • Paper - A Metric Learning Reality Check
    Improving Datasets, Benchmarks, and Measurements
    • Tutorial video + slides - Experimental Design and Analysis for AI Researchers (I think you need to have attended NeurIPS to access the recording, but I couldn't find a different version)
    • Paper - BetterBench: Assessing AI Benchmarks, Uncovering Issues, and Establishing Best Practices
    • Paper - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
    • Paper - A Systematic Review of NeurIPS Dataset Management Practices
    • Paper - The State of Data Curation at NeurIPS: An Assessment of Dataset Development Practices in the Datasets and Benchmarks Track
    • Paper - Benchmark Repositories for Better Benchmarking
    • Paper - Croissant: A Metadata Format for ML-Ready Datasets
    • Paper - Rethinking the Evaluation of Out-of-Distribution Detection: A Sorites Paradox
    • Paper - Evaluating Generative AI Systems is a Social Science Measurement Challenge
    • Paper - Report Cards: Qualitative Evaluation of LLMs
    Governance Related
    • Paper - Towards Data Governance of Frontier AI Models
    • Paper - Ways Forward for Global AI Benefit Sharing
    • Paper - How do we warn downstream model providers of upstream risks?
    • Unified Model Records tool
    • Paper - Policy Dreamer: Diverse Public Policy Creation via Elicitation and Simulation of Human Preferences
    • Paper - Monitoring Human Dependence on AI Systems with Reliance Drills
    • Paper - On the Ethical Considerations of Generative Agents
    • Paper - GPAI Evaluation Standards Taskforce: Towards Effective AI Governance
    • Paper - Levels of Autonomy: Liability in the age of AI Agents
    Certified Bangers + Useful Tools
    • Paper - Model Collapse Demystified: The Case of Regression
    • Paper - Preference Learning Algorithms Do Not Learn Preference Rankings
    • LLM Dataset Inference paper + repo
    • dattri paper + repo
    • DeTikZify paper + repo
    Fun Benchmarks/Datasets
    • Paloma paper + dataset
    • RedPajama paper + dataset
    • Assemblage webpage
    • WikiDBs webpage
    • WhodunitBench repo
    • ApeBench paper + repo
    • WorkArena++ paper
    Other Sources
    • Paper - The Mirage of Artificial Intelligence Terms of Use Restrictions
    1 hr 27 min
  • OpenAI's o1 System Card, Literally Migraine Inducing
    2024/12/23
    The idea of model cards, which was introduced as a measure to increase transparency and understanding of LLMs, has been perverted into a marketing gimmick, exemplified by OpenAI's o1 system card. To demonstrate the adversarial stance we believe is necessary to draw meaning from these press-releases-in-disguise, we conduct a close read of the system card. Be warned, there's a lot of muck in this one.
    Note: All figures/tables discussed in the podcast can be found on the podcast website at https://kairos.fm/muckraikers/e009/
    Chapters:
    (00:00) - Recorded 2024.12.08
    (00:54) - Actual intro
    (03:00) - System cards vs. academic papers
    (05:36) - Starting off sus
    (08:28) - o1.continued
    (12:23) - Rant #1: figure 1
    (18:27) - A diamond in the rough
    (19:41) - Hiding copyright violations
    (21:29) - Rant #2: Jacob on "hallucinations"
    (25:55) - More ranting and "hallucination" rate comparison
    (31:54) - Fairness, bias, and bad science comms
    (35:41) - System, dev, and user prompt jailbreaking
    (39:28) - Chain-of-thought and Rao-Blackwellization
    (44:43) - "Red-teaming"
    (49:00) - Apollo's bit
    (51:28) - METR's bit
    (59:51) - Pass@???
    (01:04:45) - SWE Verified
    (01:05:44) - Appendix bias metrics
    (01:10:17) - The muck and the meaning
    Links
    • o1 system card
    • OpenAI press release collection - 12 Days of OpenAI
    Additional o1 Coverage
    • NIST + AISI report - US AISI and UK AISI Joint Pre-Deployment Test
    • Apollo Research's paper - Frontier Models are Capable of In-context Scheming
    • VentureBeat article - OpenAI launches full o1 model with image uploads and analysis, debuts ChatGPT Pro
    • The Atlantic article - The GPT Era Is Already Ending
    On Data Labelers
    • 60 Minutes article + video - Labelers training AI say they're overworked, underpaid and exploited by big American tech companies
    • Reflections article - The hidden health dangers of data labeling in AI development
    • Privacy International article - Humans in the AI loop: the data labelers behind some of the most powerful LLMs' training datasets
    Chain-of-Thought Papers Cited
    • Paper - Measuring Faithfulness in Chain-of-Thought Reasoning
    • Paper - Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting
    • Paper - On the Hardness of Faithful Chain-of-Thought Reasoning in Large Language Models
    • Paper - Faithfulness vs. Plausibility: On the (Un)Reliability of Explanations from Large Language Models
    Other Mentioned/Relevant Sources
    • Andy Jones blogpost - Rao-Blackwellization
    • Paper - Training on the Test Task Confounds Evaluation and Emergence
    • Paper - Best-of-N Jailbreaking
    • Research landing page - SWE Bench
    • Code Competition - Konwinski Prize
    • Lakera game - Gandalf
    • Kate Crawford's Atlas of AI
    • BlueDot Impact's course - Intro to Transformative AI
    Unrelated Developments
    • Cruz's letter to Merrick Garland
    • AWS News Blog article - Introducing Amazon Nova foundation models: Frontier intelligence and industry leading price performance
    • BleepingComputer article - Ultralytics AI model hijacked to infect thousands with cryptominer
    • The Register article - Microsoft teases Copilot Vision, the AI sidekick that judges your tabs
    • Fox Business article - OpenAI CEO Sam Altman looking forward to working with Trump admin, says US must build best AI infrastructure
    1 hr 17 min
  • How to Safely Handle Your AGI
    2024/12/02
    While on the campaign trail, Trump made claims about repealing Biden's Executive Order on AI, but what will actually change when he gets into office? We take this opportunity to examine policies being discussed or implemented by leading governments around the world.
    Chapters:
    (00:00) - Intro
    (00:29) - Hot off the press
    (02:59) - Repealing the AI executive order?
    (11:16) - "Manhattan" for AI
    (24:33) - EU
    (30:47) - UK
    (39:27) - Bengio
    (44:39) - Comparing EU/UK to USA
    (45:23) - China
    (51:12) - Taxes
    (55:29) - The muck
    Links
    • SFChronicle article - US gathers allies to talk AI safety as Trump's vow to undo Biden's AI policy overshadows their work
    • Trump's Executive Order on AI (the AI governance executive order at home)
    • Biden's Executive Order on AI
    • Congressional report brief which advises a "Manhattan Project for AI"
    Non-USA
    • CAIRNE resource collection on CERN for AI
    • UK Frontier AI Taskforce report (2023)
    • International interim report (2024)
    • Bengio's paper - AI and Catastrophic Risk
    • Davidad's Safeguarded AI program at ARIA
    • MIT Technology Review article - Four things to know about China's new AI rules in 2024
    • GovInsider article - Australia's national policy for ethical use of AI starts to take shape
    • Future of Privacy Forum article - The African Union's Continental AI Strategy: Data Protection and Governance Laws Set to Play a Key Role in AI Regulation
    Taxes
    • Macroeconomic Dynamics paper - Automation, Stagnation, and the Implications of a Robot Tax
    • CESifo paper - AI, Automation, and Taxation
    • GavTax article - Taxation of Artificial Intelligence and Automation
    Perplexity Pages
    • CERN for AI page
    • China's AI policy page
    • Singapore's AI policy page
    • AI policy in Africa, India, Australia page
    Other Sources
    • Artificial Intelligence Made Simple article - NYT's "AI Outperforms Doctors" Story Is Wrong
    • Intel report - Reclaim Your Day: The Impact of AI PCs on Productivity
    • Heise Online article - Users on AI PCs slower, Intel sees problem in unenlightened users
    • The Hacker News article - North Korean Hackers Steal $10M with AI-Driven Scams and Malware on LinkedIn
    • Futurism article - Character.AI Is Hosting Pedophile Chatbots That Groom Users Who Say They're Underage
    • Vice article - 'AI Jesus' Is Now Taking Confessions at a Church in Switzerland
    • Politico article - Ted Cruz: Congress 'doesn't know what the hell it's doing' with AI regulation
    • US Senate Committee on Commerce, Science, and Transportation press release - Sen. Cruz Sounds Alarm Over Industry Role in AI Czar Harris's Censorship Agenda
    58 min
