התחל במצב לא מקוון עם האפליקציה Player FM !
How a Moonshot Led to Google DeepMind's Veo 3
Manage episode 513868902 series 3624003
Dumi Erhan, co-lead of the Veo project at Google DeepMind, joins host Logan Kilpatrick for a deep dive into the evolution of generative video models. They discuss the journey from early research in 2018 to the launch of state-of-the-art Veo 3 model with native audio generation. Learn about the technical hurdles in evaluating and scaling video models, the challenges of long-duration video coherence and how user feedback is shaping the future of AI-powered video creation.
Chapter:
0:00 - Intro
0:47 - Veo project's beginnings
3:02 - Veo's origins in Google Brain
5:07 - Video prediction and robotics applications
7:45 - Early progress and evaluation challenges
10:30 - Physics-based evaluations and their limitations
12:18 - The launch of the original Veo model
14:06 - Scaling challenges for video models
16:02 - The leap from Veo1 to Veo2
19:40 - Veo 3’s viral audio moment
21:17 - User trends shaping Veo's roadmap
23:49 - Image-to-video vs. text-to-video complexity
26:00 - New prompting methods and user control
27:55 - Coherence in long video generation
31:03 - Genie 3 and world models
35:54 - The steerability challenge
41:59 - Capability transfer and image data's role
47:25 - Closing
16 פרקים
Manage episode 513868902 series 3624003
Dumi Erhan, co-lead of the Veo project at Google DeepMind, joins host Logan Kilpatrick for a deep dive into the evolution of generative video models. They discuss the journey from early research in 2018 to the launch of state-of-the-art Veo 3 model with native audio generation. Learn about the technical hurdles in evaluating and scaling video models, the challenges of long-duration video coherence and how user feedback is shaping the future of AI-powered video creation.
Chapter:
0:00 - Intro
0:47 - Veo project's beginnings
3:02 - Veo's origins in Google Brain
5:07 - Video prediction and robotics applications
7:45 - Early progress and evaluation challenges
10:30 - Physics-based evaluations and their limitations
12:18 - The launch of the original Veo model
14:06 - Scaling challenges for video models
16:02 - The leap from Veo1 to Veo2
19:40 - Veo 3’s viral audio moment
21:17 - User trends shaping Veo's roadmap
23:49 - Image-to-video vs. text-to-video complexity
26:00 - New prompting methods and user control
27:55 - Coherence in long video generation
31:03 - Genie 3 and world models
35:54 - The steerability challenge
41:59 - Capability transfer and image data's role
47:25 - Closing
16 פרקים
All episodes
×ברוכים הבאים אל Player FM!
Player FM סורק את האינטרנט עבור פודקאסטים באיכות גבוהה בשבילכם כדי שתהנו מהם כרגע. זה יישום הפודקאסט הטוב ביותר והוא עובד על אנדרואיד, iPhone ואינטרנט. הירשמו לסנכרון מנויים במכשירים שונים.