“METR: Measuring AI Ability to Complete Long Tasks” by Zach Stein-Perlman, LessWrong (Curated & Popular) podcast


Content provided by LessWrong. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by LessWrong or its podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.


Published 9 months ago · 11:09


Summary: We propose measuring AI performance in terms of the length of tasks AI agents can complete. We show that this metric has been consistently exponentially increasing over the past 6 years, with a doubling time of around 7 months. Extrapolating this trend predicts that, in under five years, we will see AI agents that can independently complete a large fraction of software tasks that currently take humans days or weeks.
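The extrapolation is plain compounding: if the horizon doubles every 7 months, then after t months it has grown by a factor of 2^(t/7), so five years (60 months) is roughly 8.6 doublings. The Python sketch below assumes the doubling time stays fixed at the summary's 7-month estimate; the function names and the 1-hour starting horizon in the usage lines are hypothetical, chosen only for illustration.

```python
import math

# Assumption: a constant doubling time (the summary's ~7-month estimate).
DOUBLING_MONTHS = 7.0


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Multiplicative change in the task-length horizon after `months`."""
    return 2.0 ** (months / doubling_months)


def months_until(target_minutes: float, current_minutes: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months until the horizon grows from `current_minutes` to `target_minutes`."""
    return doubling_months * math.log2(target_minutes / current_minutes)


# Five years = 60 months, about 8.6 doublings.
print(f"{growth_factor(60):.0f}x")  # prints "380x"
# Hypothetical starting point: a 1-hour horizon growing to a 40-hour work week.
print(f"{months_until(40 * 60, 60):.1f} months")  # prints "37.3 months"
```

The point of the sketch is how sensitive the forecast is to the doubling time: changing `doubling_months` by a month or two shifts the "days or weeks" milestone by years.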
The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years. The shaded region represents the 95% confidence interval, calculated by hierarchical bootstrap over task families, tasks, and task attempts.
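The caption bundles two procedures, sketched below in Python under stated assumptions. The 50% horizon is computed here by fitting a logistic curve of success against log2 of human task length and solving for the length at which the fitted probability is 0.5. The confidence interval comes from resampling task families, then tasks within each family, then attempts within each task, all with replacement. The names `fifty_percent_horizon`, `hierarchical_bootstrap_ci`, and `success_rate` are hypothetical; the full paper specifies METR's actual fitting and bootstrap details. To keep the sketch short, the bootstrap uses the overall success rate as its statistic rather than refitting the trend on every replicate.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fifty_percent_horizon(minutes: Sequence[float], successes: Sequence[int],
                          iters: int = 100) -> float:
    """Hypothetical helper. Fits P(success) = sigmoid(a + b*log2(minutes)) by
    Newton's method and returns the length (minutes) where P = 0.5.
    Assumes outcomes are not perfectly separable by task length."""
    xs = [math.log2(m) for m in minutes]
    a = b = 0.0
    for _ in range(iters):
        ga = gb = haa = hab = hbb = 0.0
        for x, y in zip(xs, successes):
            p = _sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p            # gradient of the log-likelihood
            gb += (y - p) * x
            haa += w               # Fisher information (negated Hessian)
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        da = (hbb * ga - hab * gb) / det   # Newton step: I^-1 * gradient
        db = (haa * gb - hab * ga) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    if b >= 0:
        raise ValueError("success rate does not fall with task length")
    return 2.0 ** (-a / b)         # a + b*x = 0 gives p = 0.5


# Hypothetical layout: family -> task -> list of 0/1 attempt outcomes.
Data = Dict[str, Dict[str, List[int]]]


def hierarchical_bootstrap_ci(data: Data,
                              statistic: Callable[[List[List[int]]], float],
                              n_boot: int = 1000, alpha: float = 0.05,
                              seed: int = 0) -> Tuple[float, float]:
    """Resample families, then tasks within each family, then attempts within
    each task (all with replacement); return a percentile interval."""
    rng = random.Random(seed)
    families = list(data.values())
    stats = []
    for _ in range(n_boot):
        sample: List[List[int]] = []
        for fam in rng.choices(families, k=len(families)):
            tasks = list(fam.values())
            for attempts in rng.choices(tasks, k=len(tasks)):
                sample.append(rng.choices(attempts, k=len(attempts)))
        stats.append(statistic(sample))
    stats.sort()
    lo = stats[round(alpha / 2 * (n_boot - 1))]
    hi = stats[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def success_rate(sample: List[List[int]]) -> float:
    # Simplified statistic: pooled success rate over all resampled attempts.
    outcomes = [y for attempts in sample for y in attempts]
    return sum(outcomes) / len(outcomes)
```

For example, four attempts at 4-minute tasks (three successes) and four at 64-minute tasks (one success) are symmetric in log2 length around 16 minutes, so `fifty_percent_horizon` returns 16.0 up to floating-point error. Refitting the horizon inside each bootstrap replicate is possible, but a replicate whose outcomes are perfectly separable by length has no finite logistic fit and would need special handling.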
Full paper | GitHub repo
We think that forecasting the capabilities of future AI systems is important for understanding and preparing for the impact of [...]
---
Outline:
(08:58) Conclusion
(09:59) Want to contribute?
---
First published:
March 19th, 2025
Source:
https://www.lesswrong.com/posts/deesrjitvXM4xYGZd/metr-measuring-ai-ability-to-complete-long-tasks
---
Narrated by TYPE III AUDIO.
---

Images from the article:

Graph showing AI task complexity doubling every 7 months through 2026.

Graph showing AI task completion lengths doubling every 7 months.

Graph showing AI model task lengths doubling every 7 months from 2020-2024.

Graph showing [...]
