Arash Ahmadian on Rethinking RLHF

TalkRL: The Reinforcement Learning Podcast

תוכן מסופק על ידי Robin Ranjit Singh Chauhan. כל תוכן הפודקאסטים כולל פרקים, גרפיקה ותיאורי פודקאסטים מועלים ומסופקים ישירות על ידי Robin Ranjit Singh Chauhan או שותף פלטפורמת הפודקאסט שלהם. אם אתה מאמין שמישהו משתמש ביצירה שלך המוגנת בזכויות יוצרים ללא רשותך, אתה יכול לעקוב אחר התהליך המתואר כאן https://he.player.fm/legal.

1+ y ago 33:30

MP3•בית הפרקים

Arash Ahmadian is a Researcher at Cohere and Cohere For AI focussed on Preference Training of large language models. He’s also a researcher at the Vector Institute of AI.

Featured Reference

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker

Additional References

Self-Rewarding Language Models, Yuan et al 2024
Reinforcement Learning: An Introduction, Sutton and Barto 1992
Learning from Delayed Rewards, Chris Watkins 1989
Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Williams 1992

73 פרקים

#Reinforcement Learning #Machine Learning #Robin Ranjit Singh Chauhan #Artificial Intelligence #Tech