Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)


1,753 subscribers

Artificial Intelligence

Added seven years ago

Content provided by TWIML and Sam Charrington. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by TWIML and Sam Charrington or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.


A year ago · 51:45

Today, we're joined by Maohao Shen, PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori uses reinforcement learning to improve language model reasoning, enabling the model to self-reflect, self-correct, and explore alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens (continue, reflect, and explore) to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori’s two-stage training process: format tuning, which teaches the model to understand and use the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori’s performance compared with other models, along with the reward design, the benchmarks used, and the surprising observations made during the research.
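The COAT idea can be illustrated with a small trace parser. This is a minimal sketch, not Satori's code: it assumes the meta-action tokens are spelled `<|continue|>`, `<|reflect|>`, and `<|explore|>`, and that each token governs the text that follows it until the next token. Splitting a generated trace this way is one simple way to inspect how often a model reflects or explores; the helper names `split_coat_trace` and `count_actions` are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed spellings of the three COAT meta-action tokens; the exact strings
# used in Satori may differ.
META_ACTION_RE = re.compile(r"<\|(continue|reflect|explore)\|>")


def split_coat_trace(trace: str) -> List[Tuple[str, str]]:
    """Split a reasoning trace into (meta_action, text) segments.

    Text before the first meta-action token is treated as a 'continue' step.
    """
    segments: List[Tuple[str, str]] = []
    action = "continue"
    pos = 0
    for match in META_ACTION_RE.finditer(trace):
        text = trace[pos:match.start()].strip()
        if text:
            segments.append((action, text))
        action = match.group(1)  # this token labels the text that follows it
        pos = match.end()
    tail = trace[pos:].strip()
    if tail:
        segments.append((action, tail))
    return segments


def count_actions(segments: List[Tuple[str, str]]) -> Dict[str, int]:
    """Count how many segments carry each meta-action."""
    counts = {"continue": 0, "reflect": 0, "explore": 0}
    for action, _ in segments:
        counts[action] += 1
    return counts


if __name__ == "__main__":
    trace = (
        "12 * 7 = 74. "
        "<|reflect|> Check: 12 * 7 = 84, so the previous step is wrong. "
        "<|explore|> Alternative: 10 * 7 + 2 * 7 = 70 + 14 = 84."
    )
    for action, text in split_coat_trace(trace):
        print(f"[{action}] {text}")
```

Running the script prints one line per segment, tagged with its meta-action. A regex split keeps the sketch dependency-free; a real training pipeline would more likely work on token IDs from the model's tokenizer.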

The complete show notes for this episode can be found at https://twimlai.com/go/726.

762 episodes

#Artificial Intelligence #Tech News #Artificialintelligence #Machinelearning #Samcharrington #Technology #Thisweekinmachinelearning #Sam Charrington #Thetwimlaipocast #Twimlaipodcast #Tech #News #China #TWIML #Datascience #Science


All episodes

Closing the Loop Between AI Training and Inference with Lin Qiao - #742
4 days ago · 1:01:11

In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for a seamless, fast-moving production pipeline, preventing the friction that often stalls deployment. We explore the strategic shift from treating models as commodities to viewing them as core product assets. Lin details how post-training methods like reinforcement fine-tuning (RFT) let teams use their own proprietary data to continuously improve these assets. Lin also breaks down what she calls "3D optimization," the complex challenge of balancing cost, latency, and quality, and emphasizes the role of clear evaluation criteria in guiding this process, moving beyond unreliable methods like "vibe checking." Finally, we discuss the path toward the future of AI development: designing a closed-loop system for automated model improvement, a vision made more attainable by the exciting convergence of open and closed-source model capabilities. The complete show notes for this episode can be found at https://twimlai.com/go/742.

Context Engineering for Productive AI Agents with Filip Kozera - #741
17 days ago · 46:01

In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows in which natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool calling to execute complex tasks. He discusses the current limitations of agent protocols like MCP and how developers can extend them to handle the required context and authority. The conversation challenges the idea that more powerful models lead to more autonomous agents, arguing instead for "graceful recovery" systems that proactively bring humans into the loop when the agent "knows what it doesn't know." We also get into the "application layer" fight, exploring how SaaS platforms are creating data silos and what this means for the future of interoperable AI agents. Filip also shares his vision for the "word artisan," the non-technical user who can now build and manage a fleet of AI agents, fundamentally changing the nature of knowledge work. The complete show notes for this episode can be found at https://twimlai.com/go/741.

Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740
25 days ago · 1:13:02

In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which let users create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneously faster, more accurate, and even cheaper than single-model approaches. Using examples like "laconic decoding," Jared explains practical techniques for building these systems and the underlying principles of inference-time scaling. The conversation also delves into the critical role of co-design, in which the evolution of AI algorithms and of the underlying cloud infrastructure are deeply intertwined, shaping the future of agentic AI and the compute landscape. The complete show notes for this episode can be found at https://twimlai.com/go/740.

Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739
4 weeks ago · 1:13:02

In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open-source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents, from the models and APIs to the critical orchestration layer that manages the complexities of multi-turn conversations. We explore why many production systems favor a modular, multi-model approach over the end-to-end models demonstrated by large AI labs, and how this choice affects everything from latency and cost to observability and evaluation. Kwin also digs into the core challenges of interruption handling, turn-taking, and creating truly natural conversational dynamics, and how to overcome them. We also discuss use cases, where the technology is headed, the move toward hybrid edge-cloud pipelines, the exciting future of real-time video avatars, and much more. The complete show notes for this episode can be found at https://twimlai.com/go/739.

Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738
5 weeks ago · 1:00:29

Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research, for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that distills large language models for structured scene understanding and safe motion planning in critical "long-tail" scenarios. We explore how DiMA uses LLMs' world knowledge and efficient transformer-based models to significantly reduce collision rates and trajectory errors. We then discuss “SharpDepth: Sharpening Metric Depth Predictions Using Diffusion Distillation,” a diffusion-distilled approach that combines generative models with metric depth estimation to produce sharp, accurate monocular depth maps. Fatih also shares a look at Qualcomm’s on-device demos, including text-to-3D mesh generation, real-time image-to-video and video-to-video generation, and a multi-modal visual question-answering assistant. The complete show notes for this episode can be found at https://twimlai.com/go/738.

Building the Internet of Agents with Vijoy Pandey - #737
8 weeks ago · 56:13

Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco, to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, probabilistic, and noisy environment, a stark contrast to the deterministic APIs of the past. Vijoy introduces Cisco's vision for an "Internet of Agents," a platform to manage this new reality, and its open-source implementation, AGNTCY. We explore the four phases of agent collaboration (discovery, composition, deployment, and evaluation) and dive deep into the communication stack, from syntactic protocols like A2A, ACP, and MCP to the deeper semantic challenges of creating a shared understanding between agents. Vijoy also unveils SLIM (Secure Low-Latency Interactive Messaging), a novel transport layer designed to make agent-to-agent communication quantum-safe, real-time, and efficient for multi-modal workloads. The complete show notes for this episode can be found at https://twimlai.com/go/737.

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736
8 weeks ago · 59:31

Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to applying AI to equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for trading and investment. We explore the firm's platform-centric approach to managing an extensive portfolio of features and models, how multimodal LLMs accelerate the extraction of novel features, the importance of strict data timestamping to prevent temporal leakage, and how the team approaches build-vs.-buy decisions in a rapidly evolving landscape. Lastly, Ben shares insights on leveraging open-source models and the future of agentic AI in quantitative finance. The complete show notes for this episode can be found at https://twimlai.com/go/736.
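Strict timestamping usually comes down to point-in-time ("as-of") lookups. The sketch below is a generic illustration, not Two Sigma's system: it assumes each feature observation carries an `available_at` timestamp recording when the value became known, and that a training row stamped at time `t` may only use observations with `available_at <= t`. The function names are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime
from typing import List, Optional, Tuple

# Each observation is (available_at, value): the time the value became known,
# not the period it describes. Keying on availability time keeps future
# information out of historical training rows.
Observation = Tuple[datetime, float]


def as_of_value(observations: List[Observation], as_of: datetime) -> Optional[float]:
    """Return the most recent value that was already available at `as_of`.

    `observations` must be sorted by `available_at` ascending.
    Returns None if nothing was available yet.
    """
    times = [available_at for available_at, _ in observations]
    # Number of observations with available_at <= as_of.
    idx = bisect_right(times, as_of)
    if idx == 0:
        return None
    return observations[idx - 1][1]


def build_training_rows(
    observations: List[Observation], prediction_times: List[datetime]
) -> List[Tuple[datetime, Optional[float]]]:
    """Pair each prediction time with the feature value known at that time."""
    return [(t, as_of_value(observations, t)) for t in prediction_times]
```

For example, a figure published after the close on a given day should not appear in a row stamped that morning, even though it describes an earlier period; looking up by availability time rather than reporting period is what prevents that leak.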

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735
10 weeks ago · 56:45

Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling rivals human performance,” which demonstrates how zero-shot auto-labeling with foundation models can yield significant cost and time savings compared to traditional human annotation. Jason explains how auto-labels, despite being "noisier" at lower confidence thresholds, can lead to better downstream model performance. We also cover Voxel51's "verified auto-labeling" approach, which uses a "stoplight" QA workflow (green, yellow, and red lights) to minimize human review. Finally, we discuss the challenges of handling decision-boundary uncertainty and out-of-domain classes, the differences between synthetic data generation in the vision and language domains, and the potential of agentic labeling. The complete show notes for this episode can be found at https://twimlai.com/go/735.
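A confidence-based stoplight triage can be sketched in a few lines. This is a hypothetical illustration, not Voxel51's implementation: the thresholds (0.9 and 0.5), the `AutoLabel` record, the `triage` function, and the meaning of each color (green accepted automatically, yellow sent to a human reviewer, red discarded or relabeled) are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class AutoLabel:
    """Hypothetical record for one auto-generated label."""
    sample_id: str
    label: str
    confidence: float  # model confidence in [0, 1]


def triage(
    labels: List[AutoLabel], green_min: float = 0.9, red_below: float = 0.5
) -> Dict[str, List[AutoLabel]]:
    """Route auto-labels into green/yellow/red buckets by confidence.

    Assumed semantics: green is accepted as-is, yellow goes to a human
    reviewer, red is discarded or relabeled from scratch.
    """
    if not 0.0 <= red_below <= green_min <= 1.0:
        raise ValueError("expected 0 <= red_below <= green_min <= 1")
    buckets: Dict[str, List[AutoLabel]] = {"green": [], "yellow": [], "red": []}
    for item in labels:
        if item.confidence >= green_min:
            buckets["green"].append(item)
        elif item.confidence < red_below:
            buckets["red"].append(item)
        else:
            buckets["yellow"].append(item)
    return buckets


if __name__ == "__main__":
    labels = [
        AutoLabel("img1", "car", 0.97),
        AutoLabel("img2", "truck", 0.72),
        AutoLabel("img3", "bus", 0.31),
    ]
    for color, items in triage(labels).items():
        print(color, [x.sample_id for x in items])
```

Under these assumptions only the yellow bucket needs a human, so raising `green_min` buys higher precision in the auto-accepted set at the cost of more review work.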

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734
10 weeks ago · 1:25:21

Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving deep neural networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher’s ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and how its signature “layer quality” metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities of fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and the lessons he has learned from working in the field. The complete show notes for this episode can be found at https://twimlai.com/go/734.

Google I/O 2025 Special Edition - #733
11 weeks ago · 26:21

Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang, aka Swyx, from the Latent Space Podcast to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and creator of the Pipecat open-source project. We cover all the highlights from the event, including enhancements to the Gemini models like thinking budgets and thought summaries, native audio output for expressive voice AI, and the new URL Context tool for research agents. The discussion also digs into the Gemini Live API, covering its architecture, the challenges of building real-time voice applications (such as latency and voice activity detection), and new features like proactive audio and asynchronous function calling. Finally, don’t miss our guests’ wish lists for next year’s I/O! The complete show notes for this episode can be found at https://twimlai.com/go/733.

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732
12 weeks ago · 57:09

Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples of unsafe outputs that can emerge from these systems, different approaches to evaluating these safety risks, and the potential reasons behind this counterintuitive behavior. Shifting to generative AI in financial services, Sebastian outlines a domain-specific safety taxonomy designed for the industry's unique needs. We also explore the critical role of governance and regulatory frameworks in addressing these concerns, the role of prompt engineering in bolstering safety, Bloomberg’s multi-layered mitigation strategies, and vital areas for further work on improving AI safety within specialized domains. The complete show notes for this episode can be found at https://twimlai.com/go/732.

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
13 weeks ago · 1:01:25

Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting and how it can improve multi-step tool use. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies they’ve used, and Bespoke Labs’ open-source libraries like Curator. Finally, we touch on the MiniCheck model for hallucination detection and the MiniChart model for chart-based QA. The complete show notes for this episode can be found at https://twimlai.com/go/731.

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730
14 weeks ago · 1:07:27

Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings: Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models trained specifically for multi-step tasks through reinforcement learning, and how that lets agents recover more easily from failures while executing complex processes. Josh shares insights on the practical applications of these agents, including some unexpected use cases. We also discuss the future of human-AI collaboration in software development, such as "vibe coding," the integration of tools through the Model Context Protocol (MCP), and the significance of context management in AI-enabled IDEs. Additionally, we highlight the challenges of ensuring trust and safety as AI agents become more powerful and autonomous. The complete show notes for this episode can be found at https://twimlai.com/go/730.

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729
15 weeks ago · 56:18

Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology, to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench, a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and defense. We dig into the advantages and challenges of using LLMs in CTI, how techniques like Retrieval-Augmented Generation (RAG) are essential for keeping LLMs up to date with emerging threats, and how CTIBench measures LLMs’ ability to perform a set of real-world cybersecurity analyst tasks. We unpack the process of building the benchmark, the tasks it covers, and key findings from benchmarking various LLMs. Finally, Nidhi discusses the importance of benchmarks in exposing model limitations and blind spots, the challenges of large-scale benchmarking, and the future directions of her AI4Sec Research Lab, including developing reliable mitigation techniques, monitoring "concept drift" in threat detection models, improving explainability in cybersecurity, and more. The complete show notes for this episode can be found at https://twimlai.com/go/729.

Generative Benchmarking with Kelly Hong - #728
16 weeks ago · 54:17

In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, such as RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The conversation explores the two-step process of Generative Benchmarking: filtering documents to focus on relevant content and generating queries that mimic actual user behavior. Kelly shares insights from applying this approach to Weights & Biases' technical support bot, revealing how domain-specific evaluation provides more accurate assessments of embedding model performance. We also discuss the importance of aligning LLM judges with human preferences, the impact of chunking strategies on retrieval effectiveness, and how production queries differ from benchmark queries in ambiguity and style. Throughout the episode, Kelly emphasizes the need for systematic evaluation approaches that go beyond "vibe checks" to help developers build more effective RAG applications. The complete show notes for this episode can be found at https://twimlai.com/go/728.
