The Easy Goal Inference Problem Is Still Hard

AI Safety Fundamentals: Alignment podcast

Published 2+ years ago. Duration: 7:36

One approach to the AI control problem goes like this:

1. Observe what the user of the system says and does.
2. Infer the user’s preferences.
3. Try to make the world better according to the user’s preference, perhaps while working alongside the user and asking clarifying questions.
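
As a rough illustration only, the three steps above can be sketched as a toy Bayesian inference loop. Everything in this sketch is an assumption made for the example and does not come from the article: the candidate goals, the drink actions, the `beta` parameter, and above all the model of the user as a noisily rational (softmax) chooser.

```python
import math

# Hypothetical candidate goals: each maps an action to how much the user values it.
CANDIDATE_GOALS = {
    "likes_tea":    {"tea": 1.0, "coffee": 0.0, "water": 0.2},
    "likes_coffee": {"tea": 0.0, "coffee": 1.0, "water": 0.2},
}

def choice_likelihood(action, utilities, beta=3.0):
    """P(action | goal), assuming the user picks actions via a softmax over utility."""
    weights = {a: math.exp(beta * u) for a, u in utilities.items()}
    return weights[action] / sum(weights.values())

def infer_goal(observed_actions, goals=CANDIDATE_GOALS, beta=3.0):
    """Step 2: Bayesian posterior over candidate goals, starting from a uniform prior."""
    posterior = {g: 1.0 / len(goals) for g in goals}
    for action in observed_actions:
        for g, utilities in goals.items():
            posterior[g] *= choice_likelihood(action, utilities, beta)
        total = sum(posterior.values())
        posterior = {g: p / total for g, p in posterior.items()}
    return posterior

def best_action(posterior, goals=CANDIDATE_GOALS):
    """Step 3: choose the action with the highest expected utility under the posterior."""
    actions = list(next(iter(goals.values())))
    return max(actions, key=lambda a: sum(posterior[g] * goals[g][a] for g in goals))

observations = ["tea", "tea", "coffee"]   # Step 1: what the user was seen to do
posterior = infer_goal(observations)
print(posterior)
print(best_action(posterior))
```

The inference step is only as good as the assumed model of how the user's behavior relates to their preferences. Here that model is a single softmax with a fixed `beta`, which is a deliberate simplification.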

This approach has the major advantage that we can begin empirical work today — we can actually build systems which observe user behavior, try to figure out what the user wants, and then help with that. There are many applications that people care about already, and we can set to work on making rich toy models.

It seems great to develop these capabilities in parallel with other AI progress, and to address whatever difficulties actually arise, as they arise. That is, in each domain where AI can act effectively, we’d like to ensure that AI can also act effectively in the service of goals inferred from users (and that this inference is good enough to support foreseeable applications).

This approach gives us a nice, concrete model of each difficulty we are trying to address. It also provides a relatively clear indicator of whether our ability to control AI lags behind our ability to build it. And by being technically interesting and economically meaningful now, it can help actually integrate AI control with AI practice.

Overall I think that this is a particularly promising angle on the AI safety problem.

Original article: https://www.alignmentforum.org/posts/h9DesGT3WT9u2k7Hr/the-easy-goal-inference-problem-is-still-hard
Author: Paul Christiano

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.
