
TalkRL: The Reinforcement Learning Podcast


TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.

Subscribe

iTunes / Overcast / RSS

Website

talkrl.com

Episodes

Glen Berseth on RL Conference

Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL). 

Featured Links 

Reinforcement Learning Conference 

Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

2024-03-11

Ian Osband

Ian Osband is a research scientist at OpenAI (formerly DeepMind and Stanford) working on decision making under uncertainty.  

We spoke about: 

- Information theory and RL 

- Exploration, epistemic uncertainty and joint predictions 

- Epistemic Neural Networks and scaling to LLMs 


Featured References 

Reinforcement Learning, Bit by Bit 
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen 

From Predictions to Decisions: The Importance of Joint Predictive Distributions 

Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  

 

Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  


Approximate Thompson Sampling via Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy 

  


Additional References  

Thesis defence, Ian Osband
Ian Osband Homepage
Ian Osband: Epistemic Neural Networks at Stanford RL Forum
Behaviour Suite for Reinforcement Learning, Osband et al 2019
Efficient Exploration for LLMs, Dwaracherla et al 2024
2024-03-07

Sharath Chandra Raparthy

Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  

Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  


Featured Reference 

Generalization to New Sequential Decision Making Tasks with In-Context Learning   
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu 

Additional References  

Sharath Chandra Raparthy Homepage
Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023
Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022
Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021
2024-02-12

Pierluca D'Oro and Martin Klissarov

Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  

Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.


Martin Klissarov is a PhD student at Mila and McGill and research scientist intern at Meta.  


Featured References 

Motif: Intrinsic Motivation from Artificial Intelligence Feedback 
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff 

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control 
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare 

To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro 

2023-11-13

Martin Riedmiller

Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  


Martin Riedmiller is a research scientist and team lead at DeepMind.   


Featured References   


Magnetic control of tokamak plasmas through deep reinforcement learning 
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller


Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis 

Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method 
Martin Riedmiller  

2023-08-22

Max Schwarzer

Max Schwarzer is a PhD student at Mila, advised by Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency 
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro 

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville 

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville 


Additional References   

Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017
When to use parametric models in reinforcement learning?, Hasselt et al 2019
Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020
Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021



2023-08-08

Julian Togelius

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and co-founder and research director at modl.ai.


  

Featured References  
Choose Your Weapon: Survival Strategies for Depressed AI Academics

Julian Togelius, Georgios N. Yannakakis

Learning Controllable 3D Level Generators

Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius

PCGRL: Procedural Content Generation via Reinforcement Learning

Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius

Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation

Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi

2023-07-25

Jakob Foerster

Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  

Jakob Foerster is an Associate Professor at University of Oxford.  

Featured References  

Learning with Opponent-Learning Awareness 
Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  

Model-Free Opponent Shaping 
Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  

Off-Belief Learning 
Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  

Learning to Communicate with Deep Multi-Agent Reinforcement Learning 
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  

Adversarial Cheap Talk 
Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning 
Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  


Additional References  

Lectures by Jakob on youtube 
2023-05-08

Danijar Hafner 2

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design!

Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He has been our guest before back on episode 11.  


Featured References   

Mastering Diverse Domains through World Models (DreamerV3) [ blog ]

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  


DayDreamer: World Models for Physical Robot Learning [ blog ]
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel 

Deep Hierarchical Planning from Pixels [ blog ]
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   

Action and Perception as Divergence Minimization [ blog ]
Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess 


Additional References  

Mastering Atari with Discrete World Models (DreamerV2) [ blog ]; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) [ blog ]; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Planning to Explore via Self-Supervised World Models; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
2023-04-12

Jeff Clune

AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  

Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.  


Featured References 

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune 

Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret 

Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune 

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley 

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley 

First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune

2023-03-27

Natasha Jaques 2

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! 

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard 

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck 

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar 

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine 


Additional References  

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
Learning to summarize from human feedback, Nisan Stiennon et al 2020
Training language models to follow instructions with human feedback, Long Ouyang et al 2022
2023-03-14

Jacob Beck and Risto Vuorio

Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford.   


Featured Reference   


A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   


Additional References  

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf et al
Mastering Diverse Domains through World Models (DreamerV3), Hafner et al
Unsupervised Meta-Learning for Reinforcement Learning (MAML), Gupta et al
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al
Learning to reinforcement learn, Wang et al
2023-03-07

John Schulman

John Schulman is a co-founder of OpenAI, where he currently works as a researcher and engineer.


Featured References

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Additional References

Our approach to alignment research, OpenAI 2022
Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
Proximal Policy Optimization Algorithms, Schulman 2017
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
2022-10-18

Sven Mika

Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. 


Featured References

RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning

Ray: Documentation

RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
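
For listeners who want to try RLlib after this episode, here is a minimal sketch of training PPO on CartPole with Ray/RLlib. It assumes a Ray 2.x-style API (ray.rllib.algorithms and PPOConfig); older Ray releases used ray.rllib.agents and a dict-based config, so treat the exact imports and option names as an approximation to check against the RLlib docs rather than code from the episode.

```python
# Minimal RLlib sketch: train PPO on CartPole-v1 (assumes Ray 2.x-style API).
import ray
from ray.rllib.algorithms.ppo import PPOConfig

ray.init(ignore_reinit_error=True)

config = (
    PPOConfig()
    .environment("CartPole-v1")          # any registered Gym environment id
    .rollouts(num_rollout_workers=2)     # parallel sampling workers
    .training(lr=5e-5, train_batch_size=4000)
)

algo = config.build()
for i in range(10):
    result = algo.train()                # one training iteration
    print(i, result["episode_reward_mean"])

algo.stop()
ray.shutdown()
```

The same config-object pattern extends to the distributed settings discussed in the episode, for example by raising num_rollout_workers on a Ray cluster.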


Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-19

Karol Hausman and Fei Xia

Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.

Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges.

Featured References

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Additional References

Large-scale simulation for embodied perception and robot learning, Xia 2021
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al 2018
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, Kalashnikov et al 2021
ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, Xia et al 2020
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al 2021
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, Zeng et al 2022


Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-16

Sai Krishna Gottipati

Sai Krishna Gottipati is an RL researcher at AI Redefined, working on RL, multi-agent RL, and human-in-the-loop learning.

Featured References

Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations
AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot

Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning
Currently under review

Learning to navigate the synthetically accessible chemical space using reinforcement learning
Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

Additional References

Asymmetric self-play for automatic goal discovery in robotic manipulation, OpenAI et al 2021
Continuous Coordination As a Realistic Scenario for Lifelong Learning, Nekoei et al 2021

Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-01

Aravind Srinivas 2

Aravind Srinivas is back!  He is now a research scientist at OpenAI.

Featured References

Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

2022-05-09

Rohin Shah

Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.

Featured References

The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Preferences Implicit in the State of the World
Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

Benefits of Assistance over Reward Learning
Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

Evaluating the Robustness of Collaborative Agents
Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah


Additional References

AGI Safety Fundamentals, EA Cambridge
2022-04-12

Jordan Terry

Jordan Terry is a PhD candidate at the University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo, and the founder of Swarm Labs.


Featured References

PettingZoo: Gym for Multi-Agent Reinforcement Learning
J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi

PettingZoo on Github

gym on Github
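
As a quick orientation to the PettingZoo API discussed here, below is a minimal sketch of the turn-based (AEC) agent-iteration loop with random actions. The pistonball environment name and the five values returned by env.last() follow recent PettingZoo releases and have changed across versions, so check them against the installed library.

```python
# Minimal PettingZoo AEC loop sketch with a random policy.
# Assumes a recent PettingZoo release (pistonball_v6, five-tuple from env.last()).
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)

# agent_iter() yields whichever agent is due to act next.
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None                              # finished agents step with None
    else:
        action = env.action_space(agent).sample()  # random action for illustration
    env.step(action)

env.close()
```

Simultaneous-move environments expose a separate parallel API, but the loop above is the usual entry point for PettingZoo's turn-based interface.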


Additional References

Time Limits in Reinforcement Learning, Pardo et al 2017
Deep Reinforcement Learning at the Edge of the Statistical Precipice, Agarwal et al 2021
2022-02-22

Robert Lange

NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop

We hear about the idea behind PERLS and why it's important to talk about.

Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021, on Tuesday, December 14th
2021-11-19

Amy Zhang

Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. 

Featured References 

Invariant Causal Prediction for Block MDPs 
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup 

Multi-Task Reinforcement Learning with Context-based Representations 
Shagun Sodhani, Amy Zhang, Joelle Pineau 

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning 
Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra 


Additional References 

Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK
ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs
Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute
2021-09-27

Xianyuan Zhan

Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. 

Featured References 

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng 

2021-08-30

Eugene Vinitsky

Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and DeepMind.  


Featured References 

A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings 
Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo 

Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL 
Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen 

Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion 
Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games 
Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu 


Additional References 

SUMO: Simulation of Urban MObility 
2021-08-18

Jess Whittlestone

Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. 


Featured References 

The Societal Implications of Deep Reinforcement Learning 
Jess Whittlestone, Kai Arulkumaran, Matthew Crosby 

Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI 
Carla Zoe Cremer, Jess Whittlestone 


Additional References 

CogX: Cutting Edge: Understanding AI systems for a better AI policy, featuring Jack Clark and Jess Whittlestone 
2021-07-20

Aleksandra Faust

Dr Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.

Featured References

Reinforcement Learning and Planning for Preference Balancing Tasks 
Faust 2014

Learning Navigation Behaviors End-to-End with AutoRL
Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis

Evolving Rewards to Automate Reinforcement Learning 
Aleksandra Faust, Anthony Francis, Dar Mehta 

Evolving Reinforcement Learning Algorithms 

John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust 


Adversarial Environment Generation for Learning to Navigate the Web 
Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust 

Additional References 

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, Esteban Real, Chen Liang, David R. So, Quoc V. Le 

 

2021-07-06

Sam Ritter

Sam Ritter is a Research Scientist on the neuroscience team at DeepMind.

Featured References

Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN)
Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall
Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick

Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning 
Samuel Ritter 2019 

Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments 
Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo 

Synthetic Returns for Long-Term Credit Assignment 
David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song 

Additional References 

Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data, North Star AI 2019
The Bitter Lesson, Rich Sutton 2019
2021-06-21

Marc G. Bellemare

Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair. 

Featured References 

The Arcade Learning Environment: An Evaluation Platform for General Agents 
Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling 

Human-level control through deep reinforcement learning 
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 


Additional References 

CAIDA Talk: A tour of distributional reinforcement learning, November 18, 2020 - Marc G. Bellemare
Amii AI Seminar Series: Autonomous nav of stratospheric balloons using RL, Marlos C. Machado
UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons
TalkRL: Marlos C. Machado - Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth
Hyperbolic discounting and learning over multiple horizons, Fedus et al 2019
Marc G. Bellemare on Twitter
2021-05-13

Robert Osazuwa Ness

Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI.  He holds a PhD in statistics.  He studied at Johns Hopkins SAIS and then Purdue University. 


References 

Altdeep School of AI; Altdeep on Twitch, Substack, Robert Ness
Altdeep Causal Generative Machine Learning Minicourse, free course
Robert Osazuwa Ness on Google Scholar
Gamalon Inc
Causal Reinforcement Learning talks, Elias Bareinboim
The Bitter Lesson, Rich Sutton 2019
The Need for Biases in Learning Generalizations, Tom Mitchell 1980
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics, Kansky et al 2017
2021-05-08

Marlos C. Machado

Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil. 


Featured References 

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents 
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling 

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [ video ]
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare 

Efficient Exploration in Reinforcement Learning through Time-Based Representations 
Marlos C. Machado 

A Laplacian Framework for Option Discovery in Reinforcement Learning [ video ]
Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling 

Eigenoption Discovery through the Deep Successor Representation 
Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell 

Exploration in Reinforcement Learning with Deep Covering Options 
Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 

Generalization and Regularization in DQN 
Jesse Farebrother, Marlos C. Machado, Michael Bowling 


Additional References 

Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL
State of the Art Control of Atari Games Using Shallow Reinforcement Learning, Liang et al
Introspective Agents: Confidence Measures for General Value Functions, Sherstan et al
2021-04-12

Nathan Lambert

Nathan Lambert is a PhD Candidate at UC Berkeley. 

Featured References 

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning 
Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra 

Objective Mismatch in Model-based Reinforcement Learning 
Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra 

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning 
Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister 

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning 
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra 


Additional References 

Nathan Lambert's blog
Nathan Lambert on Google Scholar
2021-03-22

Kai Arulkumaran

Kai Arulkumaran is a researcher at Araya in Tokyo. 

Featured References 

AlphaStar: An Evolutionary Computation Perspective 
Kai Arulkumaran, Antoine Cully, Julian Togelius 

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation 
Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath 

Training Agents using Upside-Down Reinforcement Learning 
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber 


Additional References 

Araya
NNAISENSE
Kai Arulkumaran on Google Scholar
https://github.com/Kaixhin/rlenvs
https://github.com/Kaixhin/Atari
https://github.com/Kaixhin/Rainbow
Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281.
Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop.
Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop.
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine.
Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
2021-03-16

Michael Dennis

Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell

I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large more beneficial.   

--Michael Dennis 


Featured References

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Videos

Adversarial Policies: Attacking Deep Reinforcement Learning 

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Homepage and Videos

Accumulating Risk Capital Through Investing in Cooperation
Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell 


Quantifying Differences in Reward Functions [EPIC]
Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike


Additional References 

Safe Opponent Exploitation, Sam Ganzfried and Tuomas Sandholm 2015
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al 2019
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al 2019
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al 2019
Consequences of Misaligned AI, Zhuang et al 2020
Conservative Agency via Attainable Utility Preservation, Turner et al 2019
2021-01-26

Shimon Whiteson

Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. 


Featured References 

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning 
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson 

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning 
Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson 


Additional References 

Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo
2020-12-06

Aravind Srinivas

Aravind Srinivas is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. 
He co-created and co-taught a grad course on Deep Unsupervised Learning at Berkeley. 


Featured References 

Data-Efficient Image Recognition with Contrastive Predictive Coding 
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord 

Contrastive Unsupervised Representations for Reinforcement Learning 
Aravind Srinivas, Michael Laskin, Pieter Abbeel 

Reinforcement Learning with Augmented Data 
Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas 

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning 
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel 


Additional References 

CS294-158-SP20 Deep Unsupervised Learning, Berkeley
Phasic Policy Gradient, Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al 2020
2020-09-21

Taylor Killian

Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.

Featured References 

Direct Policy Transfer with Hidden Parameter Markov Decision Processes
Yao, Killian, Konidaris, Doshi-Velez 

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Daulton, Konidaris, Doshi-Velez 

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
Killian, Konidaris, Doshi-Velez 

Counterfactually Guided Policy Transfer in Clinical Settings
Killian, Ghassemi, Joshi 


Additional References 

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
Mimic III, a freely accessible critical care database, Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al 

2020-08-17

Nan Jiang

Nan Jiang is an Assistant Professor of Computer Science at the University of Illinois.  He was a postdoc at Microsoft Research, and did his PhD at the University of Michigan under Professor Satinder Singh. 


Featured References 

Reinforcement Learning: Theory and Algorithms
Alekh Agarwal, Nan Jiang, Sham M. Kakade 

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches
Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford 

Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen, Nan Jiang 

 
Additional References 

Towards a Unified Theory of State Abstraction for MDPs, Lihong Li, Thomas J. Walsh, Michael L. Littman
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang, Lihong Li
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization, Nan Jiang, Jiawei Huang
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue 

Errata 

[Robin] I misspoke when I said in domain randomization we want the agent to "ignore" domain parameters.  What I should have said is, we want the agent to perform well within some range of domain parameters, it should be robust with respect to domain parameters. 
2020-07-06

Danijar Hafner

Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute.  He holds a Masters of Research from University College London. 

Featured References 

A deep learning framework for neuroscience
Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak 

Additional References

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al
Shaping Belief States with Generative Environment Models for RL, Gregor et al
Model-Based Active Exploration, Shyam et al 

 
Errata 

[Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models" 
2020-05-14

Csaba Szepesvari

Csaba Szepesvari is: 

Head of the Foundations Team at DeepMind
Professor of Computer Science at the University of Alberta
Canada CIFAR AI Chair
Fellow at the Alberta Machine Intelligence Institute
Co-author of the book Bandit Algorithms along with Tor Lattimore, and author of the book Algorithms for Reinforcement Learning 

References 

Bandit based monte-carlo planning, Levente Kocsis, Csaba Szepesvári
Bandit Algorithms, Tor Lattimore, Csaba Szepesvári
Algorithms for Reinforcement Learning, Csaba Szepesvári
The Predictron: End-To-End Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
A Bayesian framework for reinforcement learning, Strens
Solving Rubik's Cube with a Robot Hand; Paper, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
The Nonstochastic Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire
Deep Learning with Bayesian Principles, Mohammad Emtiyaz Khan
Tackling climate change with Machine Learning, David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio 
2020-04-05

Ben Eysenbach

Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University.  He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop.

Featured References

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Additional References 

Behaviour Suite for Reinforcement Learning, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt
Learning Latent Plans from Play, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet
Finale Doshi-Velez
Emma Brunskill
Closed-loop optimization of fast-charging protocols for batteries with machine learning, Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh
CMU 10-703 Deep Reinforcement Learning, Fall 2019, Carnegie Mellon University
ICML Exploration in Reinforcement Learning workshop 
2020-03-30

NeurIPS 2019 Deep RL Workshop

Thank you to all the presenters that participated.  I covered as many as I could given the time and crowds, if you were not included and wish to be, please email [email protected] 

More details on the official NeurIPS Deep RL Workshop site

0:23 Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms; Matthia Sabatelli (University of Liege); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen) [external pdf link]
4:16 Single Deep Counterfactual Regret Minimization; Eric Steinberger (University of Cambridge)
5:38 On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
9:33 Objective Mismatch in Model-based Reinforcement Learning; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook)
10:51 Option Discovery using Deep Skill Chaining; Akhil Bagaria (Brown University); George Konidaris (Brown University)
13:44 Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology)
14:52 LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games; Leonard Adolphs (ETHZ); Thomas Hofmann (ETH Zurich)
16:30 Accelerating Training in Pommerman with Imitation and Reinforcement Learning; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research)
17:27 Dream to Control: Learning Behaviors by Latent Imagination; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) [external pdf link]
20:48 Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning; Seungchan Kim (Brown University); George Konidaris (Brown)
22:05 Meta-learning curiosity algorithms; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT)
24:09 Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards; Xingyu Lu (Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley)
25:44 Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks)
26:35 Multiplayer AlphaZero; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Ga Tech) [external pdf link]
27:43 Prioritized Sequence Experience Replay; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University) [external pdf link]
29:14 Recurrent neural-linear posterior sampling for non-stationary bandits; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano)
29:36 Improving Evolutionary Strategies With Past Descent Directions; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich) [external pdf link]
31:40 ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations; Daniel Seita (University of California, Berkeley); David Chan (University of California, Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley) [external pdf link]
33:05 Bottom-Up Meta-Policy Search; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology) [external pdf link]
33:37 MERL: Multi-Head Reinforcement Learning; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (INRIA) [external pdf link]
35:30 Emergen...
2019-12-20

Scott Fujimoto

Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning.  

Featured References 

Addressing Function Approximation Error in Actor-Critic Methods 
Scott Fujimoto, Herke van Hoof, David Meger 

Off-Policy Deep Reinforcement Learning without Exploration 

Scott Fujimoto, David Meger, Doina Precup 

Benchmarking Batch Deep Reinforcement Learning Algorithms 

Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau 


Additional References 

Striving for Simplicity in Off-Policy Deep Reinforcement Learning
Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

Distributed Distributional Deterministic Policy Gradients
Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap 
2019-11-19

Jessica Hamrick

Dr. Jessica Hamrick is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. 


Featured References 

Structured agents for physical construction 
Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick 

Analogues of mental simulation and imagination in deep learning 

Jessica Hamrick 

Additional References 

Metacontrol for Adaptive Imagination-Based Optimization
Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia

Surprising Negative Results for Generative Adversarial Tree Search
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar

Metareasoning and Mental Simulation
Jessica B. Hamrick

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

Object-oriented state editing for HRL
Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick

FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc Peter Deisenroth, Carl Edward Rasmussen

Blueberry Earth
Anders Sandberg 
2019-11-12

Pablo Samuel Castro

Dr Pablo Samuel Castro is a Staff Research Software Engineer at Google Brain.  He is the main author of the Dopamine RL framework.


Featured References 

A Comparative Analysis of Expected and Distributional Reinforcement Learning 

Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare  


A Geometric Perspective on Optimal Representations for Reinforcement Learning 

Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle 


Dopamine: A Research Framework for Deep Reinforcement Learning 
Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare 

Dopamine RL framework on github 
 

Tensorflow Agents on github 
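
Since Dopamine comes up as the featured framework, here is a small sketch of how an experiment is typically launched with it: behaviour is specified in a gin config file and a Runner drives training. This mirrors the pattern in Dopamine's own training script, but the gin file path below is an assumption that depends on your checkout and on whether you use the TF or JAX agents.

```python
# Minimal Dopamine sketch: run a DQN experiment from a gin config.
from dopamine.discrete_domains import run_experiment

base_dir = "/tmp/dopamine_dqn"                        # logs and checkpoints
gin_files = ["dopamine/agents/dqn/configs/dqn.gin"]   # hypothetical path; adjust to your install

run_experiment.load_gin_configs(gin_files, gin_bindings=[])
runner = run_experiment.create_runner(base_dir)
runner.run_experiment()
```

Hyperparameters (replay size, epsilon schedule, network) live in the gin file rather than in code, which is part of what makes runs easy to reproduce and compare.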

Additional References 

Using Linear Programming for Bayesian Exploration in Markov Decision Processes
Pablo Samuel Castro, Doina Precup

Using bisimulation for policy transfer in MDPs
Pablo Samuel Castro, Doina Precup

Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

Implicit Quantile Networks for Distributional Reinforcement Learning
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

A Distributional Perspective on Reinforcement Learning
Marc G. Bellemare, Will Dabney, Rémi Munos 
2019-10-10

Kamyar Azizzadenesheli

Dr. Kamyar Azizzadenesheli is a postdoctoral scholar at Caltech.  His research interest is mainly in the area of Machine Learning, from theory to practice, with a main focus on Reinforcement Learning.  He will be joining Purdue University as an Assistant CS Professor in Fall 2020. 

Featured References 

Efficient Exploration through Bayesian Deep Q-Networks 
Kamyar Azizzadenesheli, Animashree Anandkumar 

Surprising Negative Results for Generative Adversarial Tree Search 
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar 

Maybe a few considerations in Reinforcement Learning Research? 
Kamyar Azizzadenesheli 
 

Additional References 

Model-Based Reinforcement Learning for Atari
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, Peter Auer

Curious Model-Building Control Systems
Jürgen Schmidhuber

Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics
Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis 
2019-09-21

Antonin Raffin and Ashley Hill

Antonin Raffin is a researcher at the German Aerospace Center (DLR) in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest in reinforcement learning. 


Ashley Hill is doing his thesis on improving control algorithms using machine learning for real-time gain tuning. 

He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots.  He holds a master's degree in machine learning and a bachelor's in computer science from the Université Paris-Saclay. 

Featured References 

stable-baselines on github 
Ashley Hill and Antonin Raffin, primary authors. 

S-RL Toolbox 
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat 

Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics 
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat 
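
Because this episode is about the stable-baselines library itself, a tiny usage sketch may help orient listeners. It follows the original TF1-based Stable Baselines API that the guests maintain (PPO2 with an MlpPolicy); the later Stable-Baselines3 rewrite renames things (e.g. PPO instead of PPO2), so adapt the imports to whichever version is installed.

```python
# Minimal Stable Baselines sketch (original TF1-based library): PPO2 on CartPole-v1.
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

model = PPO2(MlpPolicy, "CartPole-v1", verbose=1)  # env id string is wrapped automatically
model.learn(total_timesteps=10000)
model.save("ppo2_cartpole")

# Roll out the trained policy for a few steps.
env = model.get_env()
obs = env.reset()
for _ in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
```

Swapping PPO2 for another algorithm (A2C, SAC, TD3, ...) keeps essentially the same train/save/predict workflow.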


Additional References 

Learning to Drive Smoothly in Minutes, Antonin Raffin
Multimodal SRL (best paper at ICRA): Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Benchmarking Model-Based Reinforcement Learning, Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba
TossingBot: Learning to Throw Arbitrary Objects with Residual Physics, Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Stable Baselines roadmap
OpenAI baselines
stable-baselines github pull request 
2019-09-05

Michael Littman

Michael L Littman is a professor of Computer Science at Brown University.  He was elected ACM Fellow in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence". 

Featured References 

Convergent Actor Critic by Humans 
James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor 

People teach with rewards and punishments as communication, not reinforcements 
Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil 

Theory of Minds: Understanding Behavior in Groups Through Inverse Planning 
Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum 

Personalized education at scale 
Saarinen, Cater, Littman 

Additional References 

Michael Littman papers on Google Scholar, Semantic Scholar
Reinforcement Learning on Udacity, Charles Isbell, Michael Littman, Chris Pryby
Machine Learning on Udacity, Michael Littman, Charles Isbell, Pushkar Kolhe
Temporal Difference Learning and TD-Gammon, Gerald Tesauro
Playing Atari with Deep Reinforcement Learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Ask Me Anything about MOOCs, D Fisher, C Isbell, ML Littman, M Wollowski, et al
Reinforcement Learning and Decision Making (RLDM) Conference
Algorithms for Sequential Decision Making, Michael Littman's Thesis
Machine Learning A Cappella - Overfitting Thriller!, Michael Littman and Charles Isbell feat Infinite Harmony
Turbotax Ad 2016: Genius Anna/Michael Littman 
2019-08-24

Natasha Jaques

Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence.  She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor.  Her paper "Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning" received an honourable mention for best paper at ICML 2019. 

Featured References 

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Tackling climate change with Machine Learning
David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio 

Additional References 

MIT Media Lab Flight Offsets, Caroline Jaffe, Juliana Cherston, Natasha Jaques
Modeling Others using Oneself in Multi-Agent Reinforcement Learning, Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus
Inequity aversion improves cooperation in intertemporal social dilemmas, Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Sequential Social Dilemma Games on github, Eugene Vinitsky, Natasha Jaques
AI Alignment newsletter, Rohin Shah
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
The social function of intellect, Nicholas Humphrey
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel
A Recipe for Training Neural Networks, Andrej Karpathy
Emotionally Adaptive Intelligent Tutoring Systems using POMDPs, Natasha Jaques
Sapiens, Yuval Noah Harari 

2019-08-10