Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Julie Kallini, a PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and shortcomings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, how impossible languages are defined and constructed, the creation of impossible-language training datasets, and the bias of language model architectures toward natural language.
The complete show notes for this episode can be found at https://twimlai.com/go/724.
All episodes
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (1:01:25)
How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 (1:07:27)
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 (56:18)
Generative Benchmarking with Kelly Hong - #728 (54:17)
Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 (1:34:06)
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 (51:45)
Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 (1:09:07)
Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 (50:32)
Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 (58:38)
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 (42:11)
Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 (49:29)
Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 (1:07:05)
π0: A Foundation Model for Robotics with Sergey Levine - #719 (52:30)
AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 (1:44:59)
Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 (1:16:30)