Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734
Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncover deep insights about model training dynamics. Charles walks us through WeightWatcher's ability to detect three distinct learning phases (underfitting, grokking, and generalization collapse) and how its signature "layer quality" metric reveals whether individual layers are underfit, overfit, or optimally tuned. Additionally, we dig into the complexities involved in fine-tuning models, the surprising correlation between model optimality and hallucination, the often-underestimated challenges of search relevance, and their implications for RAG. Finally, Charles shares his insights into real-world applications of generative AI and his lessons learned from working in the field.
The complete show notes for this episode can be found at https://twimlai.com/go/734.
755 episodes
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
All episodes
LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736 (59:31)
Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735 (56:45)
Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734 (1:25:21)
Google I/O 2025 Special Edition - #733 (26:21)
RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732 (57:09)
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (1:01:25)
How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 (1:07:27)
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 (56:18)
Generative Benchmarking with Kelly Hong - #728 (54:17)
Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 (1:34:06)
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 (51:45)
Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 (1:09:07)
Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 (50:32)
Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 (58:38)
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 (42:11)