38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
In this episode, I chat with David Duvenaud about two topics he's been thinking about: first, a paper he wrote on evaluating whether frontier models can sabotage human decision-making or the monitoring of those same models; and second, the difficult situation humans would face in a post-AGI future, even if AI is aligned with human intentions.
Patreon: https://www.patreon.com/axrpodcast
Ko-fi: https://ko-fi.com/axrpodcast
Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html
FAR.AI: https://far.ai/
FAR.AI on X (aka Twitter): https://x.com/farairesearch
FAR.AI on YouTube: @FARAIResearch
The Alignment Workshop: https://www.alignment-workshop.com/
Topics we discuss, and timestamps:
01:42 - The difficulty of sabotage evaluations
05:23 - Types of sabotage evaluation
08:45 - The state of sabotage evaluations
12:26 - What happens after AGI?
Links:
Sabotage Evaluations for Frontier Models: https://arxiv.org/abs/2410.21514
Gradual Disempowerment: https://gradual-disempowerment.ai/
Episode art by Hamish Doodles: hamishdoodles.com
53 episodes