The TalkRL podcast is All Reinforcement Learning, All the Time.
In-depth interviews with brilliant people at the forefront of RL research and practice.
Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute.
Hosted by Robin Ranjit Singh Chauhan.
Posters and Hallway episodes are short interviews and poster summaries, recorded at RLC 2024 in Amherst, MA.
Finale Doshi-Velez is a Professor at the Harvard Paulson School of Engineering and Applied Sciences.
This off-the-cuff interview was recorded at UMass Amherst during the workshop day of the RL Conference on August 9, 2024.
Host notes: I've been a fan of some of Prof. Doshi-Velez's past work on clinical RL and have hoped to feature her for some time now, so I jumped at the chance to get a few minutes of her thoughts -- even though, as you can tell, I was not prepared and a bit flustered, to be honest. Thanks to Prof. Doshi-Velez for taking a moment for this, and I hope our paths cross in future for a more in-depth interview.
References
Thanks to Professor Silver for permission to record this discussion after his RLC 2024 keynote lecture.
Recorded at UMass Amherst during RLC 2024.
Due to the live recording environment, audio quality varies. We publish this audio in its raw form to preserve the authenticity and immediacy of the discussion.
References
David Silver is a principal research scientist at DeepMind and a professor at University College London.
This interview was recorded at UMass Amherst during RLC 2024.
References
Dr. Vincent Moens is an Applied Machine Learning Research Scientist at Meta, and an author of TorchRL and TensorDict in PyTorch.
Featured References
TorchRL: A data-driven decision-making library for PyTorch
Albert Bou, Matteo Bettini, Sebastian Dittert, Vikash Kumar, Shagun Sodhani, Xiaomeng Yang, Gianni De Fabritiis, Vincent Moens
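For context on what TorchRL's TensorDict provides, here is a minimal sketch using the public tensordict package; the keys and shapes are illustrative only, not from the episode:

```python
import torch
from tensordict import TensorDict

# One keyed, batched container holding a batch of 4 transitions
td = TensorDict(
    {"obs": torch.randn(4, 3), "action": torch.randn(4, 1)},
    batch_size=[4],
)
print(td["obs"].shape)  # torch.Size([4, 3])

# TensorDicts index and slice across all keys at once
first = td[0]   # a TensorDict with batch_size torch.Size([])
half = td[:2]   # a TensorDict with batch_size torch.Size([2])
```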
Additional References
Arash Ahmadian is a Researcher at Cohere and Cohere For AI focused on preference training of large language models. He's also a researcher at the Vector Institute.
Featured Reference
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
Arash Ahmadian, Chris Cremer, Matthias Gallé, Marzieh Fadaee, Julia Kreutzer, Olivier Pietquin, Ahmet Üstün, Sara Hooker
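As background, here is a minimal sketch of a REINFORCE-style update with a baseline, the family of estimators the paper revisits. This is a generic PyTorch illustration, not the authors' implementation (their RLOO variant instead uses a leave-one-out baseline over multiple samples per prompt):

```python
import torch

def reinforce_loss(logprobs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """logprobs: (batch,) summed token log-probs of each sampled completion.
    rewards:  (batch,) scalar reward per completion, e.g. from a reward model."""
    baseline = rewards.mean()            # simple batch-mean baseline
    advantage = rewards - baseline       # variance reduction
    # Minimizing this ascends E[(r - b) * grad log pi(y | x)]
    return -(advantage.detach() * logprobs).mean()
```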
Additional References
Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL).
Featured Links
Reinforcement Learning Conference
Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach
Ian Osband is a research scientist at OpenAI (previously DeepMind and Stanford) working on decision making under uncertainty.
We spoke about:
- Information theory and RL
- Exploration, epistemic uncertainty and joint predictions
- Epistemic Neural Networks and scaling to LLMs
Featured References
Reinforcement Learning, Bit by Bit
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen
From Predictions to Decisions: The Importance of Joint Predictive Distributions
Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy
Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
Approximate Thompson Sampling via Epistemic Neural Networks
Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy
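As background on the Thompson sampling idea these papers build on: with an ensemble standing in for an epistemic model, one samples a single hypothesis and acts greedily under it. A minimal sketch, where `ensemble` and `obs` are assumed placeholders (a list of trained Q-functions and an observation):

```python
import numpy as np

def thompson_action(ensemble, obs):
    """ensemble: list of trained Q-functions mapping obs -> action-value array.
    Sampling one member and acting greedily under it approximates
    Thompson sampling when the ensemble approximates the posterior."""
    q = ensemble[np.random.randint(len(ensemble))]  # sample a hypothesis
    return int(np.argmax(q(obs)))                   # act greedily under it
```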
Additional References
Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!
Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.
Featured Reference
Generalization to New Sequential Decision Making Tasks with In-Context Learning
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu
Additional References
Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!
Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.
Martin Klissarov is a PhD student at Mila and McGill and research scientist intern at Meta.
Featured References
Motif: Intrinsic Motivation from Artificial Intelligence Feedback
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff
Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare
To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro
Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!
Martin Riedmiller is a research scientist and team lead at DeepMind.
Featured References
Magnetic control of tokamak plasmas through deep reinforcement learning
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis
Neural fitted Q iteration–first experiences with a data efficient neural reinforcement learning method
Martin Riedmiller
Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.
Featured References
Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro
Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville
The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville
Additional References
Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and a cofounder and research director at modl.ai.
Featured References
Choose Your Weapon: Survival Strategies for Depressed AI Academics
Julian Togelius, Georgios N. Yannakakis
Learning Controllable 3D Level Generators
Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius
PCGRL: Procedural Content Generation via Reinforcement Learning
Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation
Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi
Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.
Jakob Foerster is an Associate Professor at University of Oxford.
Featured References
Learning with Opponent-Learning Awareness
Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch
Model-Free Opponent Shaping
Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster
Off-Belief Learning
Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster
Learning to Communicate with Deep Multi-Agent Reinforcement Learning
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson
Adversarial Cheap Talk
Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster
Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning
Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson
Additional References
Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design!
Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind. He has been our guest before back on episode 11.
Featured References
Mastering Diverse Domains through World Models (DreamerV3) [ blog ]
Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
DayDreamer: World Models for Physical Robot Learning [ blog ]
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel
Deep Hierarchical Planning from Pixels [ blog ]
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel
Action and Perception as Divergence Minimization [ blog ]
Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess
Additional References
AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!
Professor Jeff Clune is Associate Professor of Computer Science at University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at Vector Institute, and Senior Research Advisor at DeepMind.
Featured References
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune
Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret
Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune
Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune
Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more!
Dr Natasha Jaques is a Senior Research Scientist at Google Brain.
Featured References
Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard
Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck
PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar
Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine
Additional References
Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning. Jacob and Risto are Ph.D. students at the Whiteson Research Lab at the University of Oxford.
Featured Reference
A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson
Additional References
John Schulman is a cofounder of OpenAI, where he currently works as a researcher and engineer.
Featured References
WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe
Additional References
Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University.
Featured References
RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning
Ray: Documentation
RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
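For a sense of RLlib's API, here is a minimal usage sketch assuming a Ray 2.x-era installation (method names follow the RLlib docs of that period; the env and worker count are illustrative):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Build a PPO algorithm on CartPole with two rollout workers
config = (
    PPOConfig()
    .environment("CartPole-v1")
    .rollouts(num_rollout_workers=2)
)
algo = config.build()

for _ in range(3):
    result = algo.train()
    print(result["episode_reward_mean"])
```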
Episode sponsor: Anyscale
Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.
Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.
Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.
Fei Xia is a Research Scientist at Google Research. He is mostly interested in robot learning in complex and unstructured environments. Previously, he approached this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring the use of foundation models for these challenges.
Featured References
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan
Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter
Additional References
Saikrishna Gottipati is an RL Researcher at AI Redefined, working on RL, MARL, and human-in-the-loop learning.
Featured References
Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations
AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot
Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning
Currently under review
Learning to navigate the synthetically accessible chemical space using reinforcement learning
Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio
Additional References
Aravind Srinivas is back! He is now a Research Scientist at OpenAI.
Featured References
Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch
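As background on the Decision Transformer's sequence framing: trajectories are modeled as (return-to-go, state, action) tokens, with actions predicted autoregressively. A tiny sketch of the return-to-go computation, as a generic illustration rather than the authors' code:

```python
import numpy as np

def returns_to_go(rewards: np.ndarray) -> np.ndarray:
    """Suffix sums R_t = sum of r_t' for t' >= t, used as conditioning tokens."""
    return np.cumsum(rewards[::-1])[::-1]

print(returns_to_go(np.array([1.0, 0.0, 2.0])))  # [3. 2. 2.]
```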
VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas
Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.
Featured References
The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan
Preferences Implicit in the State of the World
Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan
Benefits of Assistance over Reward Learning
Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell
On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan
Evaluating the Robustness of Collaborative Agents
Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah
Additional References
Jordan Terry is a PhD candidate at the University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo, and the founder of Swarm Labs.
Featured References
PettingZoo: Gym for Multi-Agent Reinforcement Learning
J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi
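For a feel of the PettingZoo API from the paper, here is a minimal sketch of its agent-iteration (AEC) loop. This follows recent PettingZoo releases (older versions returned a single done flag from env.last()); the pistonball environment is just one example:

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset()
for agent in env.agent_iter():
    obs, reward, termination, truncation, info = env.last()
    # Finished agents must step with a None action
    action = None if termination or truncation else env.action_space(agent).sample()
    env.step(action)
env.close()
```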
Additional References
Robert Tjarko Lange is a PhD student at the Technical University of Berlin.
Featured References
Learning not to learn: Nature versus nurture in silico
Lange, R. T., & Sprekeler, H. (2020).
On Lottery Tickets and Minimal Task Representations in Deep Reinforcement Learning
Vischer, M. A., Lange, R. T., & Sprekeler, H. (2021).
Semantic RL with Action Grammars: Data-Efficient Learning of Hierarchical Task Abstractions
Lange, R. T., & Faisal, A. (2019).
MLE-Infrastructure on Github
Additional References
We hear about the idea of PERLS and why it's important to talk about it.
Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023.
Featured References
Invariant Causal Prediction for Block MDPs
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup
Multi-Task Reinforcement Learning with Context-based Representations
Shagun Sodhani, Amy Zhang, Joelle Pineau
MBRL-Lib: A Modular Library for Model-based Reinforcement Learning
Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra
Additional References
Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University. He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology. At JD Technology, he led research using offline RL to optimize real-world industrial systems.
Featured References
DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng
Eugene Vinitsky is a PhD student at UC Berkeley, advised by Alexandre Bayen. He has interned at Tesla and DeepMind.
Featured References
A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings
Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo
Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL
Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen
Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion
Eugene Vinitsky, Kanaad Parvate, Aboudy Kreidieh, Cathy Wu, Alexandre Bayen (2018)
The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu
Additional References
Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge.
Featured References
The Societal Implications of Deep Reinforcement Learning
Jess Whittlestone, Kai Arulkumaran, Matthew Crosby
Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI
Carla Zoe Cremer, Jess Whittlestone
Additional References
Dr Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.
Featured References
Reinforcement Learning and Planning for Preference Balancing Tasks
Faust 2014
Learning Navigation Behaviors End-to-End with AutoRL
Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis
Evolving Rewards to Automate Reinforcement Learning
Aleksandra Faust, Anthony Francis, Dar Mehta
Evolving Reinforcement Learning Algorithms
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust
Adversarial Environment Generation for Learning to Navigate the Web
Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust
Additional References
Sam Ritter is a Research Scientist on the neuroscience team at DeepMind.
Featured References
Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN)
Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap
Meta-RL without forgetting: Been There, Done That: Meta-Learning with Episodic Recall
Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick
Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning
Samuel Ritter 2019
Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments
Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo
Synthetic Returns for Long-Term Credit Assignment
David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song
Additional References
Thomas Krendl Gilbert is a PhD student at UC Berkeley’s Center for Human-Compatible AI, specializing in Machine Ethics and Epistemology.
Featured References
Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments
Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz
Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles
Thomas Krendl Gilbert
AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks
McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert and Tom Zick
Additional References
Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair.
Featured References
The Arcade Learning Environment: An Evaluation Platform for General Agents
Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling
Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis
Autonomous navigation of stratospheric balloons using reinforcement learning
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang
Additional References
Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI. He holds a PhD in statistics. He studied at Johns Hopkins SAIS and then Purdue University.
References
Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil.
Featured References
Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling
Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [ video ]
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare
Efficient Exploration in Reinforcement Learning through Time-Based Representations
Marlos C. Machado
A Laplacian Framework for Option Discovery in Reinforcement Learning [ video ]
Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling
Eigenoption Discovery through the Deep Successor Representation
Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell
Exploration in Reinforcement Learning with Deep Covering Options
Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris
Autonomous navigation of stratospheric balloons using reinforcement learning
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang
Generalization and Regularization in DQN
Jesse Farebrother, Marlos C. Machado, Michael Bowling
Additional References
Nathan Lambert is a PhD Candidate at UC Berkeley.
Featured References
Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning
Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra
Objective Mismatch in Model-based Reinforcement Learning
Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra
Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning
Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister
On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra
Additional References
Kai Arulkumaran is a researcher at Araya in Tokyo.
Featured References
AlphaStar: An Evolutionary Computation Perspective
Kai Arulkumaran, Antoine Cully, Julian Togelius
Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation
Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath
Training Agents using Upside-Down Reinforcement Learning
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber
Additional References
Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell.
"I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large more beneficial." -- Michael Dennis
Featured References
Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Videos
Adversarial Policies: Attacking Deep Reinforcement Learning
Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Homepage and Videos
Accumulating Risk Capital Through Investing in Cooperation
Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell
Quantifying Differences in Reward Functions [EPIC]
Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike
Additional References
Roman Ring is a Research Engineer at DeepMind.
Featured References
Grandmaster level in StarCraft II using multi-agent reinforcement learning
Vinyals et al., 2019
Replicating DeepMind StarCraft II Reinforcement Learning Benchmark with Actor-Critic Methods
Roman Ring, 2018
Additional References
Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK.
Featured References
VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson
Additional References
Aravind Srinivas is a third-year PhD student at UC Berkeley, advised by Prof. Pieter Abbeel.
He co-created and co-taught a graduate course on Deep Unsupervised Learning at Berkeley.
Featured References
Data-Efficient Image Recognition with Contrastive Predictive Coding
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord
Contrastive Unsupervised Representations for Reinforcement Learning
Aravind Srinivas, Michael Laskin, Pieter Abbeel
Reinforcement Learning with Augmented Data
Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas
SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel
Additional References
Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.
Featured References
Direct Policy Transfer with Hidden Parameter Markov Decision Processes
Yao, Killian, Konidaris, Doshi-Velez
Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Daulton, Konidaris, Doshi-Velez
Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
Killian, Konidaris, Doshi-Velez
Counterfactually Guided Policy Transfer in Clinical Settings
Killian, Ghassemi, Joshi
Additional References
Nan Jiang is an Assistant Professor of Computer Science at the University of Illinois. He was a postdoc at Microsoft Research, and did his PhD at the University of Michigan under Professor Satinder Singh.
Featured References
Additional References
Errata
Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research (Brain Team) and the Vector Institute. He holds a Master of Research from University College London.
Featured References
Additional References
Errata
Csaba Szepesvari is Head of the Foundations Team at DeepMind, a Professor of Computer Science at the University of Alberta, a Canada CIFAR AI Chair, and co-author of the book Bandit Algorithms.
References
Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University. He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop.
Featured References
Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine
Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine
Additional References
Thank you to all the presenters who participated. I covered as many as I could given the time and crowds. If you were not included and wish to be, please email [email protected]
More details on the official NeurIPS Deep RL Workshop site.
Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning.
Featured References
Addressing Function Approximation Error in Actor-Critic Methods
Scott Fujimoto, Herke van Hoof, David Meger
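As background on TD3's two core tricks from this paper, here is a hedged PyTorch sketch of the critic target with target-policy smoothing and clipped double-Q. The networks and tensors are assumed placeholders, not the author's code; default hyperparameters follow the paper:

```python
import torch

def td3_target(next_obs, reward, not_done, actor_targ, q1_targ, q2_targ,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, max_action=1.0):
    with torch.no_grad():
        # Target-policy smoothing: perturb the target action with clipped noise
        a = actor_targ(next_obs)
        noise = (torch.randn_like(a) * noise_std).clamp(-noise_clip, noise_clip)
        next_a = (a + noise).clamp(-max_action, max_action)
        # Clipped double-Q: the minimum of two target critics curbs overestimation
        q = torch.min(q1_targ(next_obs, next_a), q2_targ(next_obs, next_a))
        return reward + not_done * gamma * q
```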
Off-Policy Deep Reinforcement Learning without Exploration
Scott Fujimoto, David Meger, Doina Precup
Benchmarking Batch Deep Reinforcement Learning Algorithms
Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau
Additional References
Dr. Jessica Hamrick is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley.
Featured References
Structured agents for physical construction
Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick
Analogues of mental simulation and imagination in deep learning
Jessica Hamrick
Additional References
Dr Pablo Samuel Castro is a Staff Research Software Engineer at Google Brain. He is the main author of the Dopamine RL framework.
Featured References
A Comparative Analysis of Expected and Distributional Reinforcement Learning
Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare
A Geometric Perspective on Optimal Representations for Reinforcement Learning
Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle
Dopamine: A Research Framework for Deep Reinforcement Learning
Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare
Dopamine RL framework on github
Tensorflow Agents on github
Additional References
Dr. Kamyar Azizzadenesheli is a postdoctoral scholar at Caltech. His research interest is mainly in the area of machine learning, from theory to practice, with a main focus on reinforcement learning. He will be joining Purdue University as an Assistant Professor of Computer Science in Fall 2020.
Featured References
Efficient Exploration through Bayesian Deep Q-Networks
Kamyar Azizzadenesheli, Animashree Anandkumar
Surprising Negative Results for Generative Adversarial Tree Search
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar
Maybe a few considerations in Reinforcement Learning Research?
Kamyar Azizzadenesheli
Additional References
Antonin Raffin is a researcher at the German Aerospace Center (DLR) in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest in reinforcement learning.
Ashley Hill is doing his thesis on improving control algorithms using machine learning for real-time gain tuning.
He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots. He holds a master's degree in machine learning and a bachelor's in computer science from the Université Paris-Saclay.
Featured References
stable-baselines on github
Ashley Hill and Antonin Raffin, primary authors.
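For a sense of the library's interface, here is a minimal usage sketch per the stable-baselines README (the TF1-era library discussed here); the env id and hyperparameters are illustrative:

```python
from stable_baselines import PPO2

# Train PPO on CartPole using the library's built-in MLP policy
model = PPO2("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10000)
model.save("ppo2_cartpole")
```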
S-RL Toolbox
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat
Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat
Additional References
Michael L Littman is a professor of Computer Science at Brown University. He was elected ACM Fellow in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence".
Featured References
Convergent Actor Critic by Humans
James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor
People teach with rewards and punishments as communication, not reinforcements
Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil
Theory of Minds: Understanding Behavior in Groups Through Inverse Planning
Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum
Personalized education at scale
Saarinen, Cater, Littman
Additional References
Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence. She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor. Her paper “Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning” received an honourable mention for best paper at ICML 2019.
Featured References
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas
Tackling climate change with Machine Learning
David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio
Additional References
August 2, 2019
Transcript
The idea with the TalkRL Podcast is to hear from brilliant folks from across the world of Reinforcement Learning, both research and applications. As much as possible, I want to hear from them in their own language. I try to get to know as much as I can about their work beforehand.
And I'm not here to convert anyone; I want to reach people who are already into RL. So we won't stop to explain what a value function is, for example, though we also won't assume everyone has read the very latest papers.
Why am I doing this? Because it’s a great way to learn from the most inspiring people in the field! There’s so much happening in the universe of RL, and there’s tons of interesting angles and so many fascinating minds to learn from.
Now I know there is no shortage of books, papers, and lectures, but so much goes unsaid.
I mean I guess if you work at MILA or AMII or Vector Institute, you might be having these conversations over coffee all the time, but I live in a little village in the woods in BC, so for me, these remote interviews are like a great way to have these conversations, and I hope sharing with the community makes it more worthwhile for everyone.
In terms of format, the first 2 episodes were interviews in longer form, around an hour long. Going forward, some may be a lot shorter; it depends on the guest.
If you want to be a guest or suggest a guest, go to talkrl.com/about, where you will find a link to a suggestion form.
Thanks for listening!