Least-To-Most Prompting Enables Complex Reasoning in Large Language Models

AI Safety Fundamentals: Alignment

Added two years ago

Content provided by BlueDot Impact. All podcast content, including episodes, graphics, and podcast descriptions, is uploaded and provided directly by BlueDot Impact or their podcast platform partner. If you believe someone is using your copyrighted work without your permission, you can follow the process outlined here: https://he.player.fm/legal.

1 year ago 16:08

Archived series ("Inactive feed" status)

When? This feed was archived on February 21, 2025 21:08 (9M ago). Last successful fetch was on January 02, 2025 12:05 (11M ago)

Why? Inactive feed status. Our servers were unable to retrieve a valid podcast feed for a sustained period.

What now? You might be able to find a more up-to-date version using the search function. This series will no longer be checked for updates. If you believe this to be in error, please check if the publisher's feed link below is valid and contact support to request the feed be restored or if you have any other concerns about this.

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks that require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.
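
The decompose-then-solve-in-sequence idea in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: `least_to_most`, `toy_decompose`, and `toy_solve` are hypothetical names, and the two toy functions stand in for what would be language model calls. Here they hard-code a last-letter concatenation task (a simple symbolic manipulation example) so the control flow can be run without a model.

```python
from typing import Callable, List, Tuple

# (subproblem, answer) pairs, ordered from easiest to hardest.
Solved = List[Tuple[str, str]]


def least_to_most(
    problem: str,
    decompose: Callable[[str], List[str]],
    solve: Callable[[str, Solved], str],
) -> Solved:
    """Solve subproblems in order, passing earlier answers to later steps."""
    solved: Solved = []
    for subproblem in decompose(problem):
        # Each step sees every previously solved (subproblem, answer) pair.
        answer = solve(subproblem, solved)
        solved.append((subproblem, answer))
    return solved


def toy_decompose(problem: str) -> List[str]:
    """Hypothetical stand-in for a decomposition prompt: growing word prefixes."""
    words = problem.split()
    return [" ".join(words[: i + 1]) for i in range(len(words))]


def toy_solve(subproblem: str, solved: Solved) -> str:
    """Hypothetical stand-in for a solving prompt: extend the previous answer."""
    previous_answer = solved[-1][1] if solved else ""
    last_word = subproblem.split()[-1]
    return previous_answer + last_word[-1]


steps = least_to_most("think machine learning", toy_decompose, toy_solve)
for subproblem, answer in steps:
    print(f"{subproblem!r} -> {answer!r}")
```

In a real setting, both callables would presumably wrap few-shot prompts to a language model, with the `solved` pairs appended to the prompt for each later subproblem.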

Source:

https://arxiv.org/abs/2205.10625

Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

---

A podcast by BlueDot Impact.
Learn more on the AI Safety Fundamentals website.

Chapters

1. Least-To-Most Prompting Enables Complex Reasoning in Large Language Models (00:00:00)

2. ABSTRACT (00:00:17)

3. 1 INTRODUCTION (00:01:37)

4. 2 LEAST-TO-MOST PROMPTING (00:05:38)

5. 3 RESULTS (00:07:41)

85 episodes

#Tech #Society #Philosophy #Blue Dot Impact

All episodes

Introduction to Mechanistic Interpretability 11:45

47 weeks ago

This introduction covers common mechanistic interpretability concepts to prepare you for the rest of this session's resources.

Original text: https://aisafetyfundamentals.com/blog/introduction-to-mechanistic-interpretability/
Author(s): Sarah Hastings-Woodhouse

We Need a Science of Evals 20:12

47 weeks ago

This article lays out a number of open questions in what the author calls a 'Science of Evals'.

Original text: https://www.apolloresearch.ai/blog/we-need-a-science-of-evals
Author(s): Apollo Research blog

Illustrating Reinforcement Learning from Human Feedback (RLHF) 22:32

1 year ago

This more technical article explains the motivations for a system like RLHF and adds concrete details on how the RLHF approach is applied to neural networks. While reading, consider which parts of the technical implementation correspond to the 'values coach' and 'coherence coach' from the previous video.

Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback 32:19

1 year ago

This paper explains Anthropic’s constitutional AI approach, which is largely an extension of RLHF but with AIs replacing human demonstrators and human evaluators. Everything in this paper is relevant to this week's learning objectives, and we recommend you read it in its entirety. It summarises the limitations of conventional RLHF, explains the constitutional AI approach, shows how it performs, and indicates where future research might be directed. If you are in a rush, focus on sections 1.2, 3.1, 3.4, 4.1, 6.1, 6.2.

Low-Stakes Alignment 13:56

1 year ago

Right now I’m working on finding a good objective to optimize with ML, rather than trying to make sure our models are robustly optimizing that objective. (This is roughly “outer alignment.”) That’s pretty vague, and it’s not obvious whether “find a good objective” is a meaningful goal rather than being inherently confused or sweeping key distinctions under the rug. So I like to focus on a more precise special case of alignment: solve alignment when decisions are “low stakes.” I think this case effectively isolates the problem of “find a good objective” from the problem of ensuring robustness and is precise enough to focus on productively. In this post I’ll describe what I mean by the low-stakes setting, why I think it isolates this subproblem, why I want to isolate this subproblem, and why I think that it’s valuable to work on crisp subproblems.

Source: https://www.alignmentforum.org/posts/TPan9sQFuPP6jgEJo/low-stakes-alignment
Narrated for AI Safety Fundamentals by TYPE III AUDIO.

Two-Turn Debate Doesn’t Help Humans Answer Hard Reading Comprehension Questions 16:39

1 year ago

Using hard multiple-choice reading comprehension questions as a testbed, we assess whether presenting humans with arguments for two competing answer options, where one is correct and the other is incorrect, allows human judges to perform more accurately, even when one of the arguments is unreliable and deceptive. If this is helpful, we may be able to increase our justified trust in language-model-based systems by asking them to produce these arguments where needed. Previous research has shown that just a single turn of arguments in this format is not helpful to humans. However, as debate settings are characterized by a back-and-forth dialogue, we follow up on previous results to test whether adding a second round of counter-arguments is helpful to humans. We find that, regardless of whether they have access to arguments or not, humans perform similarly on our task. These findings suggest that, in the case of answering reading comprehension questions, debate is not a helpful format.

Source: https://arxiv.org/abs/2210.10860
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

Least-To-Most Prompting Enables Complex Reasoning in Large Language Models 16:08

1 year ago

3 years ago 16:08

Introduction to Logical Decision Theory for Computer Scientists 14:28

3 years ago

Decision theories differ on exactly how to calculate the expectation--the probability of an outcome, conditional on an action. This foundational difference bubbles up to real-life questions about whether to vote in elections, or accept a lowball offer at the negotiating table. When you're thinking about what happens if you don't vote in an election, should you calculate the expected outcome as if only your vote changes, or as if all the people sufficiently similar to you would also decide not to vote? Questions like these belong to a larger class of problems, Newcomblike decision problems, in which some other agent is similar to us or reasoning about what we will do in the future. The central principle of 'logical decision theories', several families of which will be introduced, is that we ought to choose as if we are controlling the logical output of our abstract decision algorithm. Newcomblike considerations--which might initially seem like unusual special cases--become more prominent as agents can get higher-quality information about what algorithms or policies other agents use: Public commitments, machine agents with known code, smart contracts running on Ethereum. Newcomblike considerations also become more important as we deal with agents that are very similar to one another; or with large groups of agents that are likely to contain high-similarity subgroups; or with problems where even small correlations are enough to swing the decision. In philosophy, the debate over decision theories is seen as a debate over the principle of rational choice. Do 'rational' agents refrain from voting in elections, because their one vote is very unlikely to change anything? Do we need to go beyond 'rationality', into 'social rationality' or 'superrationality' or something along those lines, in order to describe agents that could possibly make up a functional society? 
Original text: https://arbital.com/p/logical_dt/?l=5d6
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

AI Safety via Debate 39:49

3 years ago

Abstract: To make AI systems broadly useful for challenging real-world tasks, we need them to learn complex human goals and preferences. One approach to specifying complex goals asks humans to judge during training which agent behaviors are safe and useful, but this approach can fail if the task is too complicated for a human to directly judge. To help address this concern, we propose training agents via self play on a zero sum debate game. Given a question or proposed action, two agents take turns making short statements up to a limit, then a human judges which of the agents gave the most true, useful information. In an analogy to complexity theory, debate with optimal play can answer any question in PSPACE given polynomial time judges (direct judging answers only NP questions). In practice, whether debate works involves empirical questions about humans and the tasks we want AIs to perform, plus theoretical questions about the meaning of AI alignment. We report results on an initial MNIST experiment where agents compete to convince a sparse classifier, boosting the classifier's accuracy from 59.4% to 88.9% given 6 pixels and from 48.2% to 85.2% given 4 pixels. Finally, we discuss theoretical and practical aspects of the debate model, focusing on potential weaknesses as the model scales up, and we propose future human and computer experiments to test these properties.

Original text: https://arxiv.org/abs/1805.00899
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

Red Teaming Language Models With Language Models 6:47

3 years ago

Abstract: Language Models (LMs) often cannot be deployed because of their potential to harm users in ways that are hard to predict in advance. Prior work identifies harmful behaviors before deployment by using human annotators to hand-write test cases. However, human annotation is expensive, limiting the number and diversity of test cases. In this work, we automatically find cases where a target LM behaves in a harmful way, by generating test cases (“red teaming”) using another LM. We evaluate the target LM’s replies to generated test questions using a classifier trained to detect offensive content, uncovering tens of thousands of offensive replies in a 280B parameter LM chatbot. We explore several methods, from zero-shot generation to reinforcement learning, for generating test cases with varying levels of diversity and difficulty. Furthermore, we use prompt engineering to control LM-generated test cases to uncover a variety of other harms, automatically finding groups of people that the chatbot discusses in offensive ways, personal and hospital phone numbers generated as the chatbot’s own contact info, leakage of private training data in generated text, and harms that occur over the course of a conversation. Overall, LM-based red teaming is one promising tool (among many needed) for finding and fixing diverse, undesirable LM behaviors before impacting users.

Original text: https://www.deepmind.com/publications/red-teaming-language-models-with-language-models
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

Robust Feature-Level Adversaries Are Interpretability Tools 35:33

3 years ago

Abstract: The literature on adversarial attacks in computer vision typically focuses on pixel-level perturbations. These tend to be very difficult to interpret. Recent work that manipulates the latent representations of image generators to create "feature-level" adversarial perturbations gives us an opportunity to explore perceptible, interpretable adversarial attacks. We make three contributions. First, we observe that feature-level attacks provide useful classes of inputs for studying representations in models. Second, we show that these adversaries are uniquely versatile and highly robust. We demonstrate that they can be used to produce targeted, universal, disguised, physically-realizable, and black-box attacks at the ImageNet scale. Third, we show how these adversarial images can be used as a practical interpretability tool for identifying bugs in networks. We use these adversaries to make predictions about spurious associations between features and classes which we then test by designing "copy/paste" attacks in which one natural image is pasted into another to cause a targeted misclassification. Our results suggest that feature-level attacks are a promising approach for rigorous interpretability research.

Original text: https://arxiv.org/abs/2110.03605
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

Debate Update: Obfuscated Arguments Problem 28:30

3 years ago

This is an update on the work on AI Safety via Debate that we previously wrote about here.

What we did: We tested the debate protocol introduced in AI Safety via Debate with human judges and debaters. We found various problems and improved the mechanism to fix these issues (details of these are in the appendix). However, we discovered that a dishonest debater can often create arguments that have a fatal error, but where it is very hard to locate the error. We don’t have a fix for this “obfuscated argument” problem, and believe it might be an important quantitative limitation for both IDA and Debate.

Key takeaways and relevance for alignment: Our ultimate goal is to find a mechanism that allows us to learn anything that a machine learning model knows: if the model can efficiently find the correct answer to some problem, our mechanism should favor the correct answer while only requiring a tractable number of human judgements and a reasonable number of computation steps for the model. We’re working under a hypothesis that there are broadly two ways to know things: via step-by-step reasoning about implications (logic, computation…), and by learning and generalizing from data (pattern matching, bayesian updating…).

Original text: https://www.alignmentforum.org/posts/PJLABqQ962hZEqhdB/debate-update-obfuscated-arguments-problem
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

High-Stakes Alignment via Adversarial Training [Redwood Research Report] 19:15

3 years ago

(Update: We think the tone of this post was overly positive considering our somewhat weak results. You can read our latest post with more takeaways and followup results here.)

This post motivates and summarizes this paper from Redwood Research, which presents results from the project first introduced here. We used adversarial training to improve high-stakes reliability in a task (“filter all injurious continuations of a story”) that we think is analogous to work that future AI safety engineers will need to do to reduce the risk of AI takeover. We experimented with three classes of adversaries – unaugmented humans, automatic paraphrasing, and humans augmented with a rewriting tool – and found that adversarial training was able to improve robustness to these three adversaries without affecting in-distribution performance. We think this work constitutes progress towards techniques that may substantially reduce the likelihood of deceptive alignment.

Motivation

Here are two dimensions along which you could simplify the alignment problem (similar to the decomposition at the top of this post):

1. Low-stakes (but difficult to oversee): Only consider domains where each decision that an AI makes is low-stakes, so no single action can have catastrophic consequences. In this setting, the key challenge is to correctly oversee the actions that AIs take, such that humans remain in control over time.
2. Easy oversight (but high-stakes): Only consider domains where overseeing AI behavior is easy, meaning that it is straightforward to run an oversight process that can assess the goodness of any particular action.

Source: https://www.alignmentforum.org/posts/A9tJFJY7DsGTFKKkh/high-stakes-alignment-via-adversarial-training-redwood
Narrated for AI Safety Fundamentals by Perrin Walker of TYPE III AUDIO.

Takeaways From Our Robust Injury Classifier Project [Redwood Research] 12:01

3 years ago

With the benefit of hindsight, we have a better sense of our takeaways from our first adversarial training project (paper). Our original aim was to use adversarial training to make a system that (as far as we could tell) never produced injurious completions. If we had accomplished that, we think it would have been the first demonstration of a deep learning system avoiding a difficult-to-formalize catastrophe with an ultra-high level of reliability. Presumably, we would have needed to invent novel robustness techniques that could have informed techniques useful for aligning TAI. With a successful system, we also could have performed ablations to get a clear sense of which building blocks were most important.

Alas, we fell well short of that target. We still saw failures when just randomly sampling prompts and completions. Our adversarial training didn’t reduce the random failure rate, nor did it eliminate highly egregious failures (example below). We also don’t think we've successfully demonstrated a negative result, given that our results could be explained by suboptimal choices in our training process. Overall, we’d say this project had value as a learning experience but produced much less alignment progress than we hoped.

Source: https://www.alignmentforum.org/posts/n3LAgnHg6ashQK3fF/takeaways-from-our-robust-injury-classifier-project-redwood
Narrated for AI Safety Fundamentals by TYPE III AUDIO.
