Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724
The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)
Today, we're joined by Julie Kallini, a PhD student at Stanford University, to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and shortcomings of tokenization in large language models, including inefficient compression rates for under-resourced languages, and dig into byte-level modeling as an alternative. We discuss the architecture of MrT5, its ability to learn language-specific compression rates, its performance on multilingual benchmarks and character-level manipulation tasks, and its efficiency. For the “Mission: Impossible Language Models” paper, we review the core idea behind the research, how impossible languages are defined and constructed, the creation of impossible-language training datasets, and the bias of language model architectures toward natural language.
The complete show notes for this episode can be found at https://twimlai.com/go/724.
All episodes
From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 (1:01:25)
How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 (1:07:27)
CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 (56:18)
Generative Benchmarking with Kelly Hong - #728 (54:17)
Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 (1:34:06)
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 (51:45)
Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 (1:09:07)
Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 (50:32)
Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 (58:38)
Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 (42:11)
Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 (49:29)
Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 (1:07:05)
π0: A Foundation Model for Robotics with Sergey Levine - #719 (52:30)
AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 (1:44:59)
Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 (1:16:30)