Intro to Brain-Like-AGI Safety

AI Safety Fundamentals: Alignment

Published one year ago · 1:02:10

Archived series ("Inactive feed" status)

When? This feed was archived on February 21, 2025 21:08 (3M ago). Last successful fetch was on January 02, 2025 12:05 (4M ago)

Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5)

Suppose we someday build an Artificial General Intelligence algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely?

I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowledge all the way up to the front-line of unsolved problems as I see them.

If this whole thing seems weird or stupid, you should start right in on Post #1, which contains definitions, background, and motivation. Then Posts #2–#7 are mainly neuroscience, and Posts #8–#15 are more directly about AGI safety, ending with a list of open questions and advice for getting involved in the field.

Source:

https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

Chapters

1. Intro to Brain-Like-AGI Safety (00:00:00)

2. 3. Two subsystems: Learning & Steering (00:00:14)

3. 3.1 Post summary / Table of contents (00:00:19)

4. 3.2 Big picture (00:04:07)

5. 3.2.1 Each subsystem generally needs its own sensory processor (00:09:35)

6. 3.3 “Triune Brain Theory” is wrong, but let’s not throw out the baby with the bathwater (00:12:27)

7. 3.4 Three types of ingredients in a Steering Subsystem (00:16:35)

8. 3.4.1 Summary table (00:16:44)

9. 3.4.2 Aside: what do I mean by “drives”? (00:18:37)

10. 3.4.3 Category A: Things the Steering Subsystem needs to do in order to get general intelligence (e.g. curiosity drive) (00:20:46)

11. 3.4.4 Category B: Everything else in the human Steering Subsystem (e.g. altruism-related drives) (00:24:15)

12. 3.4.5 Category C: Every other possibility (e.g. drive to increase my bank account balance) (00:28:26)

13. 6. Big picture of motivation, decision-making, and RL (00:31:19)

14. 6.1 Post summary / Table of contents (00:31:30)

15. 6.2 Big picture (00:35:54)

16. 6.2.1 Relation to “two subsystems” (00:37:43)

17. 6.2.2 Quick run-through (00:38:41)

18. 7. From hardcoded drives to foresighted plans: A worked example (00:42:30)

19. 7.1 Post summary / Table of contents (00:42:43)

20. 7.2 Reminder from the previous post: big picture of motivation and decision-making (00:45:24)

21. 7.3 Building a probabilistic generative world-model in the cortex (00:46:21)

22. 7.4 Credit assignment when I first bite into the cake (00:48:40)

23. 7.5 Planning towards goals via reward-shaping (00:53:53)

24. 7.5.1 The other Thought Assessors. Or: The heroic feat of ordering a cake for next week, when you’re feeling nauseous right now (00:59:09)

85 episodes

#Tech #Society #Philosophy #Blue Dot Impact


All episodes

Introduction to Mechanistic Interpretability 11:45

18 weeks ago

This introduction covers common mechanistic interpretability concepts, to prepare you for the rest of this session's resources. Original text: https://aisafetyfundamentals.com/blog/introduction-to-mechanistic-interpretability/ Author(s): Sarah Hastings-Woodhouse. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

We Need a Science of Evals 20:12

18 weeks ago

This lays out a number of open questions in what the author calls a 'Science of Evals'. Original text: https://www.apolloresearch.ai/blog/we-need-a-science-of-evals Author(s): Apollo Research blog. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.

Illustrating Reinforcement Learning from Human Feedback (RLHF) 22:32

42 weeks ago

This more technical article explains the motivations for a system like RLHF, and adds concrete details on how the RLHF approach is applied to neural networks. While reading, consider which parts of the technical implementation correspond to the 'values coach' and 'coherence coach' from the previous video. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback 32:19

42 weeks ago

Constitutional AI Harmlessness from AI Feedback 1:01:49

42 weeks ago

This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF but with AIs replacing human demonstrators and human evaluators. Everything in this paper is relevant to this week's learning objectives, and we recommend you read it in its entirety. It summarises limitations with conventional RLHF, explains the constitutional AI approach, shows how it performs, and where future research might be directed. If you are in a rush, focus on sections 1.2, 3.1, 3.4, 4.1, 6.1, 6.2. A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Intro to Brain-Like-AGI Safety 1:02:10

46 weeks ago

(Sections 3.1-3.4, 6.1-6.2, and 7.1-7.5) Suppose we someday build an Artificial General Intelligence algorithm using similar principles of learning and cognition as the human brain. How would we use such an algorithm safely? I will argue that this is an open technical problem, and my goal in this post series is to bring readers with no prior knowledge all the way up to the front-line of unsolved problems as I see them. If this whole thing seems weird or stupid, you should start right in on Post #1, which contains definitions, background, and motivation. Then Posts #2–#7 are mainly neuroscience, and Posts #8–#15 are more directly about AGI safety, ending with a list of open questions and advice for getting involved in the field. Source: https://www.lesswrong.com/s/HzcM2dkCq7fwXBej8 Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Chinchilla’s Wild Implications 24:57

46 weeks ago

This post is about language model scaling laws, specifically the laws derived in the DeepMind paper that introduced Chinchilla. The paper came out a few months ago, and has been discussed a lot, but some of its implications deserve more explicit notice in my opinion. In particular: Data, not size, is the currently active constraint on language modeling performance. Current returns to additional data are immense, and current returns to additional model size are minuscule; indeed, most recent landmark models are wastefully big. If we can leverage enough data, there is no reason to train ~500B param models, much less 1T or larger models. If we have to train models at these large sizes, it will mean we have encountered a barrier to exploitation of data scaling, which would be a great loss relative to what would otherwise be possible. The literature is extremely unclear on how much text data is actually available for training. We may be "running out" of general-domain data, but the literature is too vague to know one way or the other. The entire available quantity of data in highly specialized domains like code is woefully tiny, compared to the gains that would be possible if much more such data were available. Some things to note at the outset: This post assumes you have some familiarity with LM scaling laws. As in the paper, I'll assume here that models never see repeated data in training. Original text: https://www.alignmentforum.org/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Deep Double Descent 8:27

46 weeks ago

We show that the double descent phenomenon occurs in CNNs, ResNets, and transformers: performance first improves, then gets worse, and then improves again with increasing model size, data size, or training time. This effect is often avoided through careful regularization. While this behavior appears to be fairly universal, we don’t yet fully understand why it happens, and view further study of this phenomenon as an important research direction. Source: https://openai.com/research/deep-double-descent Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Eliciting Latent Knowledge 1:00:27

46 weeks ago

In this post, we’ll present ARC’s approach to an open problem we think is central to aligning powerful machine learning (ML) systems: Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us. But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad. In these cases, the prediction model “knows” facts (like “the camera was tampered with”) that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events? We’ll call this problem eliciting latent knowledge (ELK). In this report we’ll focus on detecting sensor tampering as a motivating example, but we believe ELK is central to many aspects of alignment. Source: https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit# Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO . --- A podcast by BlueDot Impact . Learn more on the AI Safety Fundamentals website.…

Empirical Findings Generalize Surprisingly Far 11:32

46 weeks ago

Previously, I argued that emergent phenomena in machine learning mean that we can’t rely on current trends to predict what the future of ML will be like. In this post, I will argue that despite this, empirical findings often do generalize very far, including across “phase transitions” caused by emergent behavior. This might seem like a contradiction, but actually I think divergence from current trends and empirical generalization are consistent. Findings do often generalize, but you need to think to determine the right generalization, and also about what might stop any given generalization from holding. I don’t think many people would contest the claim that empirical investigation can uncover deep and generalizable truths. This is one of the big lessons of physics, and while some might attribute physics’ success to math instead of empiricism, I think it’s clear that you need empirical data to point to the right mathematics. However, just invoking physics isn’t a good argument, because physical laws have fundamental symmetries that we shouldn’t expect in machine learning. Moreover, we care specifically about findings that continue to hold up after some sort of emergent behavior (such as few-shot learning in the case of ML). So, to make my case, I’ll start by considering examples in deep learning that have held up in this way. Since “modern” deep learning hasn’t been around that long, I’ll also look at examples from biology, a field that has been around for a relatively long time and where More Is Different is ubiquitous (see Appendix: More Is Different In Other Domains). Source: https://bounded-regret.ghost.io/empirical-findings-generalize-surprisingly-far/ Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO . --- A podcast by BlueDot Impact . Learn more on the AI Safety Fundamentals website.…

Low-Stakes Alignment 13:56

46 weeks ago

Right now I’m working on finding a good objective to optimize with ML, rather than trying to make sure our models are robustly optimizing that objective. (This is roughly “outer alignment.”) That’s pretty vague, and it’s not obvious whether “find a good objective” is a meaningful goal rather than being inherently confused or sweeping key distinctions under the rug. So I like to focus on a more precise special case of alignment: solve alignment when decisions are “low stakes.” I think this case effectively isolates the problem of “find a good objective” from the problem of ensuring robustness and is precise enough to focus on productively. In this post I’ll describe what I mean by the low-stakes setting, why I think it isolates this subproblem, why I want to isolate this subproblem, and why I think that it’s valuable to work on crisp subproblems. Source: https://www.alignmentforum.org/posts/TPan9sQFuPP6jgEJo/low-stakes-alignment Narrated for AI Safety Fundamentals by TYPE III AUDIO . --- A podcast by BlueDot Impact . Learn more on the AI Safety Fundamentals website.…

Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions 16:39

46 weeks ago

Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able to increase our justified trust in language-model-based systems by asking them to produce these arguments where needed. Previous research has shown that just a single turn of arguments in this format is not helpful to humans. However, as debate settings are characterized by a back-and-forth dialogue, we follow up on previous results to test whether adding a second round of counter-arguments is helpful to humans. We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task. These findings suggest that, in the case of answering reading comprehension questions, debate is not a helpful format. Source: https://arxiv.org/abs/2210.10860 Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO . --- A podcast by BlueDot Impact . Learn more on the AI Safety Fundamentals website.…

Least-To-Most Prompting Enables Complex Reasoning in Large Language Models 16:08

46 weeks ago

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix. Source: https://arxiv.org/abs/2205.10625 Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

ABS: Scanning Neural Networks for Back-Doors by Artificial Brain Stimulation 16:08

46 weeks ago

This paper presents a technique to scan neural network based AI models to determine if they are trojaned. Pre-trained AI models may contain back-doors that are injected through training or by transforming inner neuron weights. These trojaned models operate normally when regular inputs are provided, and mis-classify to a specific output label when the input is stamped with some special pattern called a trojan trigger. We develop a novel technique that analyzes inner neuron behaviors by determining how output activations change when we introduce different levels of stimulation to a neuron. Neurons that substantially elevate the activation of a particular output label regardless of the provided input are considered potentially compromised. The trojan trigger is then reverse-engineered through an optimization procedure using the stimulation analysis results, to confirm that a neuron is truly compromised. We evaluate our system ABS on 177 trojaned models that are trojaned with various attack methods that target both the input space and the feature space, and have various trojan trigger sizes and shapes, together with 144 benign models that are trained with different data and initial weight values. These models belong to 7 different model structures and 6 different datasets, including some complex ones such as ImageNet, VGG-Face and ResNet110. Our results show that ABS is highly effective and can achieve over 90% detection rate for most cases (and many 100%), when only one input sample is provided for each output label. It substantially out-performs the state-of-the-art technique Neural Cleanse that requires a lot of input samples and small trojan triggers to achieve good performance. Source: https://www.cs.purdue.edu/homes/taog/docs/CCS19.pdf Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…

Imitative Generalisation (AKA ‘Learning the Prior’) 18:14

46 weeks ago

This post tries to explain a simplified version of Paul Christiano’s mechanism introduced here (referred to there as ‘Learning the Prior’) and to explain why a mechanism like this potentially addresses some of the safety problems with naïve approaches. First we’ll go through a simple example in a familiar domain, then explain the problems with the example. Then I’ll discuss the open questions for making Imitative Generalization actually work, and the connection with the Microscope AI idea. A more detailed explanation of exactly what the training objective is (with diagrams), and the correspondence with Bayesian inference, are in the appendix. Source: https://www.alignmentforum.org/posts/JKj5Krff5oKMb8TjT/imitative-generalisation-aka-learning-the-prior-1 Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO. --- A podcast by BlueDot Impact. Learn more on the AI Safety Fundamentals website.…
