124. Alex Watson - Synthetic data could change everything
There’s a website called thispersondoesnotexist.com. When you visit it, you’re confronted by a high-resolution, photorealistic AI-generated picture of a human face. As the website’s name suggests, there’s no human being on the face of the earth who looks quite like the person staring back at you on the page.
Each of those generated pictures is a piece of data that captures so much of the essence of what it means to look like a human being. And yet it does so without telling you anything whatsoever about any particular person. In that sense, it’s fully anonymous human face data.
That’s impressive enough, and it speaks to how far generative image models have come over the last decade. But what if we could do the same for any kind of data?
What if I could generate an anonymized set of medical records or financial transaction data that captures all of the latent relationships buried in a private dataset, without the risk of leaking sensitive information about real people? That’s the mission of Alex Watson, the Chief Product Officer and co-founder of Gretel AI, where he works on unlocking value hidden in sensitive datasets in ways that preserve privacy.
What I realized talking to Alex was that synthetic data is about much more than ensuring privacy. As you’ll see over the course of the conversation, we may well be heading for a world where most data can benefit from augmentation via data synthesis — where synthetic data brings privacy value almost as a side-effect of enriching ground truth data with context imported from the wider world.
Alex joined me to talk about data privacy, data synthesis, and what could be the very strange future of the data lifecycle on this episode of the TDS podcast.
***
Intro music:
- Artist: Ron Gelinas
- Track Title: Daybreak Chill Blend (original mix)
- Link to Track: https://youtu.be/d8Y2sKIgFWc
***
Chapters:
- 2:40 What is synthetic data?
- 6:45 Large language models
- 11:30 Preventing data leakage
- 18:00 Generative versus downstream models
- 24:10 De-biasing and fairness
- 30:45 Using synthetic data
- 35:00 People consuming the data
- 41:00 Spotting correlations in the data
- 47:45 Generalization of different ML algorithms
- 51:15 Wrap-up
132 episodes
All episodes
- 130. Edouard Harris - New Research: Advanced AI may tend to seek power *by default* (58:22)
- 129. Amber Teng - Building apps with a new generation of language models (51:21)
- 128. David Hirko - AI observability and data as a cybersecurity weakness (49:02)
- 127. Matthew Stewart - The emerging world of ML sensors (41:34)
- 126. JR King - Does the brain run on deep learning? (55:43)
- 125. Ryan Fedasiuk - Can the U.S. and China collaborate on AI safety? (48:19)
- 124. Alex Watson - Synthetic data could change everything (51:47)
- 123. Ala Shaabana and Jacob Steeves - AI on the blockchain (it actually might just make sense) (54:43)
- 122. Sadie St. Lawrence - Trends in data science (43:02)
- 121. Alexei Baevski - data2vec and the future of multimodal learning (49:31)
- 120. Liam Fedus and Barrett Zoph - AI scaling with mixture of expert models (40:47)
- 119. Jaime Sevilla - Projecting AI progress from compute trends (48:34)
- 118. Angela Fan - Generating Wikipedia articles with AI (51:44)
- 117. Beena Ammanath - Defining trustworthy AI (46:46)
- 116. Katya Sedova - AI-powered disinformation, present and future (54:24)