14 subscribers


LLM Data Frontiers
Curtis Northcutt is the cofounder and CEO of Cleanlab, a data curation platform for LLMs. Cleanlab has raised $30M in funding from Bain Capital Ventures, Menlo, Databricks, and TQ. Northcutt was previously the cofounder and CTO of ChipBrain and holds a PhD in Computer Science from MIT.
(00:07) Data Curation in the Context of LLMs
(01:14) Connection between Language Models and Computer Science
(03:14) Importance of Data Curation for LLMs
(04:06) Challenges in Data Curation for LLMs
(06:09) Confident Learning and its Concept
(09:42) Cleanlab and its Role
(12:42) Role of Open Source Datasets and Tooling
(15:08) Balancing Data and Privacy in Regulated Industries
(17:25) Feasibility of Federated Learning
(20:35) Decentralized Compute and Aggregating Compute Clusters
(25:19) Determining Model Size for Data Representation
(27:09) Advice for ML Engineers in Handling Data Curation
(30:20) Rapid Fire Round
Curtis's favorite book: The Bible (in the context of marketing)
--------
Where to find Prateek Joshi:
Newsletter: https://prateekjoshi.substack.com
Website: https://prateekj.com
LinkedIn: https://www.linkedin.com/in/prateek-joshi-91047b19
Twitter: https://twitter.com/prateekvjoshi
174 episodes
All episodes
AI Infra for Long Context Model Training | Anna Patterson, founder of Ceramic AI 39:31
Building an AI+Data Startup Studio | Tom Chavez, cofounder of super{set} 52:57
Decentralized Data Foundry for AI | Rowan Stone, CEO of Sapien 38:22
Converting Cameras into Autonomous AI Agents | Rish Gupta, CEO of Spot AI 38:50
Are AI Phone Agents Ready for Prime Time? | Alex Levin, CEO of Regal 45:21
What it Takes to Build a BI Platform | Colin Zima, CEO of Omni 40:07
Building Billing Infrastructure for AI Companies | Alvaro Morales, CEO of Orb 38:21
Turning Legal Services to APIs | Jay Madheswaran, CEO of Eve 41:02
Is LLM the New Operating System? | Anant Bhardwaj, CEO of Instabase 45:37
Building AI Agents That Actually Work | Malte Kosub, CEO of Parloa 33:54
3000 Customers, One Bold Pivot: Building the First Generative AI Copilot for Lawyers | Scott Stevenson, CEO of Spellbook 44:07
The Outer Loop of AI-Powered Coding | Merrill Lutsky, CEO of Graphite 41:26
Behind the Scenes of AI Video | Amit Jain, founder of Luma AI 48:19
Building an AI-Powered Terminal | Zach Lloyd 38:06
When Robots Go Haywire, Who Picks Up The Tab? | Amias Gerety 48:54
Building MotherDuck to a $400M Company 49:18
AI Agents Have Brains, But Where Are Their Wallets? 47:27
Building Autonomous Greenhouses with AI and Robotics 37:45
Developing Battery Materials with AI 33:27
Voice-to-Voice Foundation Models 39:08
Digital Replicas That Can Have Real Conversations 37:40
Breaking New Ground With Collaborative Robots 49:22
How to extract intelligence from speech data with AI 44:56
The Long Tail of AI: Understanding and Resolving Edge Cases 37:53