Unstructured Data and LLMs with Crag Wolfe and Matt Robinson
The majority of enterprise data exists in heterogeneous formats such as HTML, PDF, PNG, and PowerPoint. However, large language models do best when trained with clean, curated data. This presents a major data cleaning challenge.
Unstructured is focused on extracting and transforming complex data to prepare it for vector databases and LLM frameworks.
Crag Wolfe is Head of Engineering and Matt Robinson is Head of Product at Unstructured. They join the podcast to talk about data cleaning in the LLM age.
Sean's been an academic, startup founder, and Googler. He has published works covering a wide range of topics from information visualization to quantum computing. Currently, Sean is Head of Marketing and Developer Relations at Skyflow and host of the podcast Partially Redacted, a podcast about privacy and security engineering. You can connect with Sean on Twitter @seanfalconer.
Please click here to see the transcript of this episode.
Sponsorship inquiries: sponsor@softwareengineeringdaily.com
2108 episodes
All episodes
MCP Security at Wiz with Rami McCarthy 56:07
SED News: Data Land Grabs, Copyright Fights, and the Great AI Talent War 47:14
AI at Anaconda with Greg Jennings 49:47
ByteDance’s Container Networking Stack with Chen Tang 47:57
WayForward Games with Tomm Hulett and Voldi Way 46:02
CodeRabbit and RAG for Code Review with Harjot Gill 48:42
Emulating Retro Games on Modern Consoles with Robin Lavallée and Bill Litshauer 1:01:34
SED News: Corporate Spies, Postgres, and the Weird Life of Devs Right Now 44:38
TanStack and the Future of Frontend with Tanner Linsley 55:13
The Challenge of AI Model Evaluations with Ankur Goyal 45:22
Modern Distributed Applications with Stephan Ewen 41:20
Chip Design in the AI Era with Thomas Andersen 50:33
OpenTofu with Cory O’Daniel and Malcolm Matalka 48:58
Mojo and Building a CUDA Replacement with Chris Lattner 56:14