From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731
Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, explains why RL offers a more robust alternative to prompting, and describes how it can improve multi-step tool-use capabilities. We also explore the limitations of supervised fine-tuning (SFT) for tool-augmented reasoning tasks, the reward-shaping strategies Bespoke Labs has used, and the company's open-source libraries such as Curator. Finally, we touch on two of its models: MiniCheck for hallucination detection and MiniChart for chart-based QA.
The complete show notes for this episode can be found at https://twimlai.com/go/731.
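As a rough illustration of what "reward shaping" can mean for multi-step tool use, here is a minimal Python sketch. It is not Bespoke Labs' method; the episode summary does not describe their reward design. The `ToolCall` and `Trajectory` types, the `shaped_reward` function, and every weight and budget below are hypothetical. The sketch combines a sparse outcome term (exact-match final answer) with small per-call bonuses for well-formed calls to allowed tools and penalties for malformed calls, unknown tools, or calls beyond a budget.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    name: str
    valid: bool  # hypothetical flag: did the call parse and match the tool's schema?


@dataclass
class Trajectory:
    calls: list[ToolCall] = field(default_factory=list)
    final_answer: str = ""


def shaped_reward(traj: Trajectory, gold_answer: str, allowed_tools: set[str],
                  correct_bonus: float = 1.0, valid_call_bonus: float = 0.1,
                  invalid_call_penalty: float = 0.2, max_calls: int = 5) -> float:
    """Hypothetical shaped reward: outcome term plus per-step shaping terms."""
    # Sparse outcome term: case-insensitive exact match on the final answer.
    answer_ok = traj.final_answer.strip().lower() == gold_answer.strip().lower()
    reward = correct_bonus if answer_ok else 0.0

    # Dense shaping term, applied only to calls within the budget.
    for call in traj.calls[:max_calls]:
        if call.valid and call.name in allowed_tools:
            reward += valid_call_bonus
        else:
            reward -= invalid_call_penalty  # malformed call or unknown tool

    # Every call beyond the budget is penalized, whether or not it was valid.
    extra_calls = max(0, len(traj.calls) - max_calls)
    reward -= invalid_call_penalty * extra_calls
    return reward


if __name__ == "__main__":
    traj = Trajectory(calls=[ToolCall("search", True), ToolCall("calculator", True)],
                      final_answer="Paris")
    print(round(shaped_reward(traj, "paris", {"search", "calculator"}), 3))
```

The cap on rewarded calls matters in a design like this: without it, a policy could inflate its reward by issuing many cheap valid calls instead of solving the task.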
764 episodes
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
All episodes
Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744 (1:10:20)
Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743 (1:01:01)
Closing the Loop Between AI Training and Inference with Lin Qiao - #742 (1:01:11)
Context Engineering for Productive AI Agents with Filip Kozera - #741 (46:01)
Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740 (1:13:02)
Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739 (1:13:02)
Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738 (1:00:29)
Building the Internet of Agents with Vijoy Pandey - #737 (56:13)
LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736 (59:31)
Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735 (56:45)
Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734 (1:25:21)
Google I/O 2025 Special Edition - #733 (26:21)
RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732 (57:09)
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (1:01:25)
How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 (1:07:27)