
TalkRL: The Reinforcement Learning Podcast


TalkRL podcast is All Reinforcement Learning, All the Time. In-depth interviews with brilliant people at the forefront of RL research and practice. Guests from places like MILA, OpenAI, MIT, DeepMind, Berkeley, Amii, Oxford, Google Research, Brown, Waymo, Caltech, and Vector Institute. Hosted by Robin Ranjit Singh Chauhan.

Subscribe

iTunes / Overcast / RSS

Website

talkrl.com

Episodes

Glen Berseth on RL Conference

Glen Berseth is an assistant professor at the Université de Montréal, a core academic member of the Mila - Quebec AI Institute, a Canada CIFAR AI Chair, a member of the Institut Courtois, and co-director of the Robotics and Embodied AI Lab (REAL). 

Featured Links 

Reinforcement Learning Conference 

Closing the Gap between TD Learning and Supervised Learning--A Generalisation Point of View
Raj Ghugare, Matthieu Geist, Glen Berseth, Benjamin Eysenbach

2024-03-11

Ian Osband

Ian Osband is a research scientist at OpenAI (formerly DeepMind and Stanford) working on decision making under uncertainty.  

We spoke about: 

- Information theory and RL 

- Exploration, epistemic uncertainty and joint predictions 

- Epistemic Neural Networks and scaling to LLMs 


Featured References 

Reinforcement Learning, Bit by Bit 
Xiuyuan Lu, Benjamin Van Roy, Vikranth Dwaracherla, Morteza Ibrahimi, Ian Osband, Zheng Wen 

From Predictions to Decisions: The Importance of Joint Predictive Distributions 

Zheng Wen, Ian Osband, Chao Qin, Xiuyuan Lu, Morteza Ibrahimi, Vikranth Dwaracherla, Mohammad Asghari, Benjamin Van Roy  

 

Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy  


Approximate Thompson Sampling via Epistemic Neural Networks 

Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Morteza Ibrahimi, Xiuyuan Lu, Benjamin Van Roy 

  


Additional References  

Thesis defence, Ian Osband
Ian Osband Homepage
Ian Osband: Epistemic Neural Networks at Stanford RL Forum
Behaviour Suite for Reinforcement Learning, Osband et al 2019
Efficient Exploration for LLMs, Dwaracherla et al 2024
2024-03-07

Sharath Chandra Raparthy

Sharath Chandra Raparthy on In-Context Learning for Sequential Decision Tasks, GFlowNets, and more!  

Sharath Chandra Raparthy is an AI Resident at FAIR at Meta, and did his Master's at Mila.  


Featured Reference 

Generalization to New Sequential Decision Making Tasks with In-Context Learning   
Sharath Chandra Raparthy, Eric Hambro, Robert Kirk, Mikael Henaff, Roberta Raileanu 

Additional References  

Sharath Chandra Raparthy Homepage
Human-Timescale Adaptation in an Open-Ended Task Space, Adaptive Agent Team 2023
Data Distributional Properties Drive Emergent In-Context Learning in Transformers, Chan et al 2022
Decision Transformer: Reinforcement Learning via Sequence Modeling, Chen et al 2021
2024-02-12

Pierluca D'Oro and Martin Klissarov

Pierluca D'Oro and Martin Klissarov on Motif and RLAIF, Noisy Neighborhoods and Return Landscapes, and more!  

Pierluca D'Oro is a PhD student at Mila and a visiting researcher at Meta.


Martin Klissarov is a PhD student at Mila and McGill and research scientist intern at Meta.  


Featured References 

Motif: Intrinsic Motivation from Artificial Intelligence Feedback 
Martin Klissarov*, Pierluca D'Oro*, Shagun Sodhani, Roberta Raileanu, Pierre-Luc Bacon, Pascal Vincent, Amy Zhang, Mikael Henaff 

Policy Optimization in a Noisy Neighborhood: On Return Landscapes in Continuous Control 
Nate Rahn*, Pierluca D'Oro*, Harley Wiltzer, Pierre-Luc Bacon, Marc G. Bellemare 

To keep doing RL research, stop calling yourself an RL researcher
Pierluca D'Oro 

2023-11-13

Martin Riedmiller

Martin Riedmiller of Google DeepMind on controlling nuclear fusion plasma in a tokamak with RL, the original Deep Q-Network, Neural Fitted Q-Iteration, Collect and Infer, AGI for control systems, and tons more!  


Martin Riedmiller is a research scientist and team lead at DeepMind.   


Featured References   


Magnetic control of tokamak plasmas through deep reinforcement learning 
Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de las Casas, Craig Donner, Leslie Fritz, Cristian Galperti, Andrea Huber, James Keeling, Maria Tsimpoukelli, Jackie Kay, Antoine Merle, Jean-Marc Moret, Seb Noury, Federico Pesamosca, David Pfau, Olivier Sauter, Cristian Sommariva, Stefano Coda, Basil Duval, Ambrogio Fasoli, Pushmeet Kohli, Koray Kavukcuoglu, Demis Hassabis & Martin Riedmiller


Human-level control through deep reinforcement learning
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, Demis Hassabis 

Neural fitted Q iteration - first experiences with a data efficient neural reinforcement learning method 
Martin Riedmiller  

2023-08-22

Max Schwarzer

Max Schwarzer is a PhD student at Mila, advised by Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science.  Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.   

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency 
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro 

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville 

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville 


Additional References   

Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017
When to use parametric models in reinforcement learning?, Hasselt et al 2019
Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020
Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021



2023-08-08

Julian Togelius

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and co-founder and research director at modl.ai.


  

Featured References  
Choose Your Weapon: Survival Strategies for Depressed AI Academics

Julian Togelius, Georgios N. Yannakakis

Learning Controllable 3D Level Generators

Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius

PCGRL: Procedural Content Generation via Reinforcement Learning

Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius

Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation

Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi

2023-07-25

Jakob Foerster

Jakob Foerster on Multi-Agent learning, Cooperation vs Competition, Emergent Communication, Zero-shot coordination, Opponent Shaping, agents for Hanabi and Prisoner's Dilemma, and more.  

Jakob Foerster is an Associate Professor at University of Oxford.  

Featured References  

Learning with Opponent-Learning Awareness 
Jakob N. Foerster, Richard Y. Chen, Maruan Al-Shedivat, Shimon Whiteson, Pieter Abbeel, Igor Mordatch  

Model-Free Opponent Shaping 
Chris Lu, Timon Willi, Christian Schroeder de Witt, Jakob Foerster  

Off-Belief Learning 
Hengyuan Hu, Adam Lerer, Brandon Cui, David Wu, Luis Pineda, Noam Brown, Jakob Foerster  

Learning to Communicate with Deep Multi-Agent Reinforcement Learning 
Jakob N. Foerster, Yannis M. Assael, Nando de Freitas, Shimon Whiteson  

Adversarial Cheap Talk 
Chris Lu, Timon Willi, Alistair Letcher, Jakob Foerster  

Cheap Talk Discovery and Utilization in Multi-Agent Reinforcement Learning 
Yat Long Lo, Christian Schroeder de Witt, Samuel Sokota, Jakob Nicolaus Foerster, Shimon Whiteson  


Additional References  

Lectures by Jakob on youtube 
2023-05-08

Danijar Hafner 2

Danijar Hafner on the DreamerV3 agent and world models, the Director agent and hierarchical RL, real-time RL on robots with DayDreamer, and his framework for unsupervised agent design!

Danijar Hafner is a PhD candidate at the University of Toronto with Jimmy Ba, a visiting student at UC Berkeley with Pieter Abbeel, and an intern at DeepMind.  He has been our guest before back on episode 11.  


Featured References   

Mastering Diverse Domains through World Models (DreamerV3) [ blog ]

Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap  


DayDreamer: World Models for Physical Robot Learning [ blog ]
Philipp Wu, Alejandro Escontrela, Danijar Hafner, Ken Goldberg, Pieter Abbeel 

Deep Hierarchical Planning from Pixels [ blog ]
Danijar Hafner, Kuang-Huei Lee, Ian Fischer, Pieter Abbeel   

Action and Perception as Divergence Minimization [ blog ]
Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess 


Additional References  

Mastering Atari with Discrete World Models (DreamerV2) [ blog ]; Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
Dream to Control: Learning Behaviors by Latent Imagination (Dreamer) [ blog ]; Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Planning to Explore via Self-Supervised World Models; Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak
2023-04-12

Jeff Clune

AI-Generating Algorithms, learning to play Minecraft with Video PreTraining (VPT), Go-Explore for hard exploration, POET and open-endedness, AI-GAs and ChatGPT, AGI predictions, and lots more!  

Jeff Clune is an Associate Professor of Computer Science at the University of British Columbia, a Canada CIFAR AI Chair and Faculty Member at the Vector Institute, and a Senior Research Advisor at DeepMind.  


Featured References 

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ Blog Post ]
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune 

Robots that can adapt like animals
Antoine Cully, Jeff Clune, Danesh Tarapore, Jean-Baptiste Mouret 

Illuminating search spaces by mapping elites
Jean-Baptiste Mouret, Jeff Clune 

Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley 

Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions
Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley 

First return, then explore
Adrien Ecoffet, Joost Huizinga, Joel Lehman, Kenneth O. Stanley, Jeff Clune

2023-03-27

Natasha Jaques 2

Hear about why OpenAI cites her work in RLHF and dialog models, approaches to rewards in RLHF, ChatGPT, Industry vs Academia, PsiPhi-Learning, AGI and more! 

Dr Natasha Jaques is a Senior Research Scientist at Google Brain.

Featured References

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard 

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck 

PsiPhi-Learning: Reinforcement Learning with Demonstrations using Successor Features and Inverse Temporal Difference Learning
Angelos Filos, Clare Lyle, Yarin Gal, Sergey Levine, Natasha Jaques, Gregory Farquhar 

Basis for Intentions: Efficient Inverse Reinforcement Learning using Past Experience
Marwa Abdulhai, Natasha Jaques, Sergey Levine 


Additional References  

Fine-Tuning Language Models from Human Preferences, Daniel M. Ziegler et al 2019
Learning to summarize from human feedback, Nisan Stiennon et al 2020
Training language models to follow instructions with human feedback, Long Ouyang et al 2022
2023-03-14

Jacob Beck and Risto Vuorio

Jacob Beck and Risto Vuorio on their recent Survey of Meta-Reinforcement Learning.  Jacob and Risto are Ph.D. students at Whiteson Research Lab at University of Oxford.   


Featured Reference   


A Survey of Meta-Reinforcement Learning
Jacob Beck, Risto Vuorio, Evan Zheran Liu, Zheng Xiong, Luisa Zintgraf, Chelsea Finn, Shimon Whiteson   


Additional References  

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning, Luisa Zintgraf et al
Mastering Diverse Domains through World Models (DreamerV3), Hafner et al
Unsupervised Meta-Learning for Reinforcement Learning (MAML), Gupta et al
Decoupling Exploration and Exploitation for Meta-Reinforcement Learning without Sacrifices (DREAM), Liu et al
RL²: Fast Reinforcement Learning via Slow Reinforcement Learning, Duan et al
Learning to reinforcement learn, Wang et al
2023-03-07

John Schulman

John Schulman is a co-founder of OpenAI, where he currently works as a researcher and engineer.


Featured References

WebGPT: Browser-assisted question-answering with human feedback
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe

Additional References

Our approach to alignment research, OpenAI 2022
Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
Proximal Policy Optimization Algorithms, Schulman 2017
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
2022-10-18

Sven Mika

Sven Mika is the Reinforcement Learning Team Lead at Anyscale, and lead committer of RLlib. He holds a PhD in biomathematics, bioinformatics, and computational biology from Witten/Herdecke University. 


Featured References

RLlib Documentation: RLlib: Industry-Grade Reinforcement Learning

Ray: Documentation

RLlib: Abstractions for Distributed Reinforcement Learning
Eric Liang, Richard Liaw, Philipp Moritz, Robert Nishihara, Roy Fox, Ken Goldberg, Joseph E. Gonzalez, Michael I. Jordan, Ion Stoica
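
For listeners who want to try RLlib after this episode, here is a minimal sketch of training PPO on CartPole with Ray/RLlib. It assumes a Ray 2.x-style API (ray.rllib.algorithms and PPOConfig); older Ray releases used ray.rllib.agents and a dict-based config, so treat the exact imports and option names as an approximation to check against the RLlib docs rather than code from the episode.

```python
# Minimal RLlib sketch: train PPO on CartPole-v1 (assumes Ray 2.x-style API).
import ray
from ray.rllib.algorithms.ppo import PPOConfig

ray.init(ignore_reinit_error=True)

config = (
    PPOConfig()
    .environment("CartPole-v1")          # any registered Gym environment id
    .rollouts(num_rollout_workers=2)     # parallel sampling workers
    .training(lr=5e-5, train_batch_size=4000)
)

algo = config.build()
for i in range(10):
    result = algo.train()                # one training iteration
    print(i, result["episode_reward_mean"])

algo.stop()
ray.shutdown()
```

The same config-object pattern extends to the distributed settings discussed in the episode, for example by raising num_rollout_workers on a Ray cluster.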


Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-19

Karol Hausman and Fei Xia

Karol Hausman is a Senior Research Scientist at Google Brain and an Adjunct Professor at Stanford working on robotics and machine learning. Karol is interested in enabling robots to acquire general-purpose skills with minimal supervision in real-world environments.

Fei Xia is a Research Scientist with Google Research. Fei Xia is mostly interested in robot learning in complex and unstructured environments. Previously he has been approaching this problem by learning in realistic and scalable simulation environments (GibsonEnv, iGibson). Most recently, he has been exploring using foundation models for those challenges.

Featured References

Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [ website ]
Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, Daniel Ho, Jasmine Hsu, Julian Ibarz, Brian Ichter, Alex Irpan, Eric Jang, Rosario Jauregui Ruano, Kyle Jeffrey, Sally Jesmonth, Nikhil J Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Kuang-Huei Lee, Sergey Levine, Yao Lu, Linda Luu, Carolina Parada, Peter Pastor, Jornell Quiambao, Kanishka Rao, Jarek Rettinghouse, Diego Reyes, Pierre Sermanet, Nicolas Sievers, Clayton Tan, Alexander Toshev, Vincent Vanhoucke, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Mengyuan Yan

Inner Monologue: Embodied Reasoning through Planning with Language Models
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter

Additional References

Large-scale simulation for embodied perception and robot learning, Xia 2021
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation, Kalashnikov et al 2018
MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale, Kalashnikov et al 2021
ReLMoGen: Leveraging Motion Generation in Reinforcement Learning for Mobile Manipulation, Xia et al 2020
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills, Chebotar et al 2021
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language, Zeng et al 2022


Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-16

Sai Krishna Gottipati

Sai Krishna Gottipati is an RL researcher at AI Redefined, working on RL, multi-agent RL, and human-in-the-loop learning.

Featured References

Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations
AI Redefined, Sai Krishna Gottipati, Sagar Kurandwad, Clodéric Mars, Gregory Szriftgiser, François Chabot

Do As You Teach: A Multi-Teacher Approach to Self-Play in Deep Reinforcement Learning
Currently under review

Learning to navigate the synthetically accessible chemical space using reinforcement learning
Sai Krishna Gottipati, Boris Sattarov, Sufeng Niu, Yashaswi Pathak, Haoran Wei, Shengchao Liu, Karam J. Thomas, Simon Blackburn, Connor W. Coley, Jian Tang, Sarath Chandar, Yoshua Bengio

Additional References

Asymmetric self-play for automatic goal discovery in robotic manipulation, OpenAI et al 2021
Continuous Coordination As a Realistic Scenario for Lifelong Learning, Nekoei et al 2021

Episode sponsor: Anyscale

Ray Summit 2022 is coming to San Francisco on August 23-24.
Hear how teams at Dow, Verizon, Riot Games, and more are solving their RL challenges with Ray's RLlib.

Register at raysummit.org and use code RAYSUMMIT22RL for a further 25% off the already reduced prices.

2022-08-01

Aravind Srinivas 2

Aravind Srinivas is back!  He is now a research scientist at OpenAI.

Featured References

Decision Transformer: Reinforcement Learning via Sequence Modeling
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, Igor Mordatch

VideoGPT: Video Generation using VQ-VAE and Transformers
Wilson Yan, Yunzhi Zhang, Pieter Abbeel, Aravind Srinivas

2022-05-09

Rohin Shah

Dr. Rohin Shah is a Research Scientist at DeepMind, and the editor and main contributor of the Alignment Newsletter.

Featured References

The MineRL BASALT Competition on Learning from Human Feedback
Rohin Shah, Cody Wild, Steven H. Wang, Neel Alex, Brandon Houghton, William Guss, Sharada Mohanty, Anssi Kanervisto, Stephanie Milani, Nicholay Topin, Pieter Abbeel, Stuart Russell, Anca Dragan

Preferences Implicit in the State of the World
Rohin Shah, Dmitrii Krasheninnikov, Jordan Alexander, Pieter Abbeel, Anca Dragan

Benefits of Assistance over Reward Learning
Rohin Shah, Pedro Freire, Neel Alex, Rachel Freedman, Dmitrii Krasheninnikov, Lawrence Chan, Michael D Dennis, Pieter Abbeel, Anca Dragan, Stuart Russell

On the Utility of Learning about Humans for Human-AI Coordination
Micah Carroll, Rohin Shah, Mark K. Ho, Thomas L. Griffiths, Sanjit A. Seshia, Pieter Abbeel, Anca Dragan

Evaluating the Robustness of Collaborative Agents
Paul Knott, Micah Carroll, Sam Devlin, Kamil Ciosek, Katja Hofmann, A. D. Dragan, Rohin Shah


Additional References

AGI Safety Fundamentals, EA Cambridge
2022-04-12

Jordan Terry

Jordan Terry is a PhD candidate at the University of Maryland, the maintainer of Gym, the maintainer and creator of PettingZoo, and the founder of Swarm Labs.


Featured References

PettingZoo: Gym for Multi-Agent Reinforcement Learning
J. K. Terry, Benjamin Black, Nathaniel Grammel, Mario Jayakumar, Ananth Hari, Ryan Sullivan, Luis Santos, Rodrigo Perez, Caroline Horsch, Clemens Dieffendahl, Niall L. Williams, Yashas Lokesh, Praveen Ravi

PettingZoo on Github

gym on Github
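
As a quick orientation to the PettingZoo API discussed here, below is a minimal sketch of the turn-based (AEC) agent-iteration loop with random actions. The pistonball environment name and the five values returned by env.last() follow recent PettingZoo releases and have changed across versions, so check them against the installed library.

```python
# Minimal PettingZoo AEC loop sketch with a random policy.
# Assumes a recent PettingZoo release (pistonball_v6, five-tuple from env.last()).
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.env()
env.reset(seed=42)

# agent_iter() yields whichever agent is due to act next.
for agent in env.agent_iter():
    observation, reward, termination, truncation, info = env.last()
    if termination or truncation:
        action = None                              # finished agents step with None
    else:
        action = env.action_space(agent).sample()  # random action for illustration
    env.step(action)

env.close()
```

Simultaneous-move environments expose a separate parallel API, but the loop above is the usual entry point for PettingZoo's turn-based interface.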


Additional References

Time Limits in Reinforcement Learning, Pardo et al 2017
Deep Reinforcement Learning at the Edge of the Statistical Precipice, Agarwal et al 2021
2022-02-22

Robert Lange

NeurIPS 2021 Political Economy of Reinforcement Learning Systems (PERLS) Workshop

We hear about the idea behind PERLS and why it's important to talk about.

Political Economy of Reinforcement Learning (PERLS) Workshop at NeurIPS 2021, on Tuesday, December 14th
2021-11-19

Amy Zhang

Amy Zhang is a postdoctoral scholar at UC Berkeley and a research scientist at Facebook AI Research. She will be starting as an assistant professor at UT Austin in Spring 2023. 

Featured References 

Invariant Causal Prediction for Block MDPs 
Amy Zhang, Clare Lyle, Shagun Sodhani, Angelos Filos, Marta Kwiatkowska, Joelle Pineau, Yarin Gal, Doina Precup 

Multi-Task Reinforcement Learning with Context-based Representations 
Shagun Sodhani, Amy Zhang, Joelle Pineau 

MBRL-Lib: A Modular Library for Model-based Reinforcement Learning 
Luis Pineda, Brandon Amos, Amy Zhang, Nathan O. Lambert, Roberto Calandra 


Additional References 

Amy Zhang - Exploring Context for Better Generalization in Reinforcement Learning @ UCL DARK
ICML 2020 Poster session: Invariant Causal Prediction for Block MDPs
Clare Lyle - Invariant Prediction for Generalization in Reinforcement Learning @ Simons Institute
2021-09-27

Xianyuan Zhan

Xianyuan Zhan is currently a research assistant professor at the Institute for AI Industry Research (AIR), Tsinghua University.  He received his Ph.D. degree at Purdue University. Before joining Tsinghua University, Dr. Zhan worked as a researcher at Microsoft Research Asia (MSRA) and a data scientist at JD Technology.  At JD Technology, he led the research that uses offline RL to optimize real-world industrial systems. 

Featured References 

DeepThermal: Combustion Optimization for Thermal Power Generating Units Using Offline Reinforcement Learning
Xianyuan Zhan, Haoran Xu, Yue Zhang, Yusen Huo, Xiangyu Zhu, Honglei Yin, Yu Zheng 

2021-08-30

Eugene Vinitsky

Eugene Vinitsky is a PhD student at UC Berkeley advised by Alexandre Bayen. He has interned at Tesla and DeepMind.  


Featured References 

A learning agent that acquires social norms from public sanctions in decentralized multi-agent settings 
Eugene Vinitsky, Raphael Köster, John P. Agapiou, Edgar Duéñez-Guzmán, Alexander Sasha Vezhnevets, Joel Z. Leibo 

Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL 
Eugene Vinitsky, Nathan Lichtle, Kanaad Parvate, Alexandre Bayen 

Lagrangian Control through Deep-RL: Applications to Bottleneck Decongestion 
Eugene Vinitsky; Kanaad Parvate; Aboudy Kreidieh; Cathy Wu; Alexandre Bayen 2018 

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games 
Chao Yu, Akash Velu, Eugene Vinitsky, Yu Wang, Alexandre Bayen, Yi Wu 


Additional References 

SUMO: Simulation of Urban MObility 
2021-08-18

Jess Whittlestone

Dr. Jess Whittlestone is a Senior Research Fellow at the Centre for the Study of Existential Risk and the Leverhulme Centre for the Future of Intelligence, both at the University of Cambridge. 


Featured References 

The Societal Implications of Deep Reinforcement Learning 
Jess Whittlestone, Kai Arulkumaran, Matthew Crosby 

Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI 
Carla Zoe Cremer, Jess Whittlestone 


Additional References 

CogX: Cutting Edge: Understanding AI systems for a better AI policy, featuring Jack Clark and Jess Whittlestone 
2021-07-20

Aleksandra Faust

Dr Aleksandra Faust is a Staff Research Scientist and Reinforcement Learning research team co-founder at Google Brain Research.

Featured References

Reinforcement Learning and Planning for Preference Balancing Tasks 
Faust 2014

Learning Navigation Behaviors End-to-End with AutoRL
Hao-Tien Lewis Chiang, Aleksandra Faust, Marek Fiser, Anthony Francis

Evolving Rewards to Automate Reinforcement Learning 
Aleksandra Faust, Anthony Francis, Dar Mehta 

Evolving Reinforcement Learning Algorithms 

John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust 


Adversarial Environment Generation for Learning to Navigate the Web 
Izzeddin Gur, Natasha Jaques, Kevin Malta, Manoj Tiwari, Honglak Lee, Aleksandra Faust 

Additional References 

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch, Esteban Real, Chen Liang, David R. So, Quoc V. Le 

 

2021-07-06

Sam Ritter

Sam Ritter is a Research Scientist on the neuroscience team at DeepMind.

Featured References

Unsupervised Predictive Memory in a Goal-Directed Agent (MERLIN)
Greg Wayne, Chia-Chun Hung, David Amos, Mehdi Mirza, Arun Ahuja, Agnieszka Grabska-Barwinska, Jack Rae, Piotr Mirowski, Joel Z. Leibo, Adam Santoro, Mevlana Gemici, Malcolm Reynolds, Tim Harley, Josh Abramson, Shakir Mohamed, Danilo Rezende, David Saxton, Adam Cain, Chloe Hillier, David Silver, Koray Kavukcuoglu, Matt Botvinick, Demis Hassabis, Timothy Lillicrap

Meta-RL without forgetting:  Been There, Done That: Meta-Learning with Episodic Recall
Samuel Ritter, Jane X. Wang, Zeb Kurth-Nelson, Siddhant M. Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick

Meta-Reinforcement Learning with Episodic Recall: An Integrative Theory of Reward-Driven Learning 
Samuel Ritter 2019 

Meta-RL exploration and planning: Rapid Task-Solving in Novel Environments 
Sam Ritter, Ryan Faulkner, Laurent Sartran, Adam Santoro, Matt Botvinick, David Raposo 

Synthetic Returns for Long-Term Credit Assignment 
David Raposo, Sam Ritter, Adam Santoro, Greg Wayne, Theophane Weber, Matt Botvinick, Hado van Hasselt, Francis Song 

Additional References 

Sam Ritter: Meta-Learning to Make Smart Inferences from Small Data, North Star AI 2019
The Bitter Lesson, Rich Sutton 2019
2021-06-21

Marc G. Bellemare

Professor Marc G. Bellemare is a Research Scientist at Google Research (Brain team), an Adjunct Professor at McGill University, and a Canada CIFAR AI Chair. 

Featured References 

The Arcade Learning Environment: An Evaluation Platform for General Agents 
Marc G. Bellemare, Yavar Naddaf, Joel Veness, Michael Bowling 

Human-level control through deep reinforcement learning 
Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg & Demis Hassabis 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 


Additional References 

CAIDA Talk: A tour of distributional reinforcement learning, November 18, 2020 - Marc G. Bellemare
Amii AI Seminar Series: Autonomous nav of stratospheric balloons using RL, Marlos C. Machado
UMD RLSS | Marc Bellemare | A History of Reinforcement Learning: Atari to Stratospheric Balloons
TalkRL: Marlos C. Machado - Dr. Machado also spoke to us about various aspects of ALE and Project Loon in depth
Hyperbolic discounting and learning over multiple horizons, Fedus et al 2019
Marc G. Bellemare on Twitter
2021-05-13

Robert Osazuwa Ness

Robert Osazuwa Ness is an adjunct professor of computer science at Northeastern University, an ML Research Engineer at Gamalon, and the founder of AltDeep School of AI.  He holds a PhD in statistics.  He studied at Johns Hopkins SAIS and then Purdue University. 


References 

Altdeep School of AI; Altdeep on Twitch, Substack, Robert Ness
Altdeep Causal Generative Machine Learning Minicourse, free course
Robert Osazuwa Ness on Google Scholar
Gamalon Inc
Causal Reinforcement Learning talks, Elias Bareinboim
The Bitter Lesson, Rich Sutton 2019
The Need for Biases in Learning Generalizations, Tom Mitchell 1980
Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics, Kansky et al 2017
2021-05-08

Marlos C. Machado

Dr. Marlos C. Machado is a research scientist at DeepMind and an adjunct professor at the University of Alberta. He holds a PhD from the University of Alberta and an MSc and BSc from UFMG, in Brazil. 


Featured References 

Revisiting the Arcade Learning Environment: Evaluation Protocols and Open Problems for General Agents 
Marlos C. Machado, Marc G. Bellemare, Erik Talvitie, Joel Veness, Matthew J. Hausknecht, Michael Bowling 

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning [ video ]
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G. Bellemare 

Efficient Exploration in Reinforcement Learning through Time-Based Representations 
Marlos C. Machado 

A Laplacian Framework for Option Discovery in Reinforcement Learning [ video ]
Marlos C. Machado, Marc G. Bellemare, Michael H. Bowling 

Eigenoption Discovery through the Deep Successor Representation 
Marlos C. Machado, Clemens Rosenbaum, Xiaoxiao Guo, Miao Liu, Gerald Tesauro, Murray Campbell 

Exploration in Reinforcement Learning with Deep Covering Options 
Yuu Jinnai, Jee Won Park, Marlos C. Machado, George Dimitri Konidaris 

Autonomous navigation of stratospheric balloons using reinforcement learning 
Marc G. Bellemare, Salvatore Candido, Pablo Samuel Castro, Jun Gong, Marlos C. Machado, Subhodeep Moitra, Sameera S. Ponda & Ziyu Wang 

Generalization and Regularization in DQN 
Jesse Farebrother, Marlos C. Machado, Michael Bowling 


Additional References 

Amii AI Seminar Series: Marlos C. Machado - Autonomous navigation of stratospheric balloons using RL
State of the Art Control of Atari Games Using Shallow Reinforcement Learning, Liang et al
Introspective Agents: Confidence Measures for General Value Functions, Sherstan et al
2021-04-12

Nathan Lambert

Nathan Lambert is a PhD Candidate at UC Berkeley. 

Featured References 

Learning Accurate Long-term Dynamics for Model-based Reinforcement Learning 
Nathan O. Lambert, Albert Wilcox, Howard Zhang, Kristofer S. J. Pister, Roberto Calandra 

Objective Mismatch in Model-based Reinforcement Learning 
Nathan Lambert, Brandon Amos, Omry Yadan, Roberto Calandra 

Low Level Control of a Quadrotor with Deep Model-Based Reinforcement Learning 
Nathan O. Lambert, Daniel S. Drew, Joseph Yaconelli, Roberto Calandra, Sergey Levine, Kristofer S.J. Pister 

On the Importance of Hyperparameter Optimization for Model-based Reinforcement Learning 
Baohe Zhang, Raghu Rajan, Luis Pineda, Nathan Lambert, André Biedenkapp, Kurtland Chua, Frank Hutter, Roberto Calandra 


Additional References 

Nathan Lambert's blog
Nathan Lambert on Google Scholar
2021-03-22

Kai Arulkumaran

Kai Arulkumaran is a researcher at Araya in Tokyo. 

Featured References 

AlphaStar: An Evolutionary Computation Perspective 
Kai Arulkumaran, Antoine Cully, Julian Togelius 

Analysing Deep Reinforcement Learning Agents Trained with Domain Randomisation 
Tianhong Dai, Kai Arulkumaran, Tamara Gerbert, Samyakh Tukra, Feryal Behbahani, Anil Anthony Bharath 

Training Agents using Upside-Down Reinforcement Learning 
Rupesh Kumar Srivastava, Pranav Shyam, Filipe Mutz, Wojciech Jaśkowski, Jürgen Schmidhuber 


Additional References 

Araya
NNAISENSE
Kai Arulkumaran on Google Scholar
https://github.com/Kaixhin/rlenvs
https://github.com/Kaixhin/Atari
https://github.com/Kaixhin/Rainbow
Tschiatschek, S., Arulkumaran, K., Stühmer, J. & Hofmann, K. (2018). Variational Inference for Data-Efficient Model Learning in POMDPs. arXiv:1805.09281.
Arulkumaran, K., Dilokthanakul, N., Shanahan, M. & Bharath, A. A. (2016). Classifying Options for Deep Reinforcement Learning. International Joint Conference on Artificial Intelligence, Deep Reinforcement Learning Workshop.
Garnelo, M., Arulkumaran, K. & Shanahan, M. (2016). Towards Deep Symbolic Reinforcement Learning. Annual Conference on Neural Information Processing Systems, Deep Reinforcement Learning Workshop.
Arulkumaran, K., Deisenroth, M. P., Brundage, M. & Bharath, A. A. (2017). Deep reinforcement learning: A brief survey. IEEE Signal Processing Magazine.
Agostinelli, A., Arulkumaran, K., Sarrico, M., Richemond, P. & Bharath, A. A. (2019). Memory-Efficient Episodic Control Reinforcement Learning with Dynamic Online k-means. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
Sarrico, M., Arulkumaran, K., Agostinelli, A., Richemond, P. & Bharath, A. A. (2019). Sample-Efficient Reinforcement Learning with Maximum Entropy Mellowmax Episodic Control. Annual Conference on Neural Information Processing Systems, Workshop on Biological and Artificial Reinforcement Learning.
2021-03-16

Michael Dennis

Michael Dennis is a PhD student at the Center for Human-Compatible AI at UC Berkeley, supervised by Professor Stuart Russell

I'm interested in robustness in RL and multi-agent RL, specifically as it applies to making the interaction between AI systems and society at large more beneficial.   

--Michael Dennis 


Featured References

Emergent Complexity and Zero-shot Transfer via Unsupervised Environment Design [PAIRED]
Michael Dennis, Natasha Jaques, Eugene Vinitsky, Alexandre Bayen, Stuart Russell, Andrew Critch, Sergey Levine
Videos

Adversarial Policies: Attacking Deep Reinforcement Learning 

Adam Gleave, Michael Dennis, Cody Wild, Neel Kant, Sergey Levine, Stuart Russell
Homepage and Videos

Accumulating Risk Capital Through Investing in Cooperation
Charlotte Roman, Michael Dennis, Andrew Critch, Stuart Russell 


Quantifying Differences in Reward Functions [EPIC]
Adam Gleave, Michael Dennis, Shane Legg, Stuart Russell, Jan Leike


Additional References 

Safe Opponent Exploitation, Sam Ganzfried and Tuomas Sandholm 2015
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning, Natasha Jaques et al 2019
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Leibo et al 2019
Leveraging Procedural Generation to Benchmark Reinforcement Learning, Karl Cobbe et al 2019
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Wang et al 2019
Consequences of Misaligned AI, Zhuang et al 2020
Conservative Agency via Attainable Utility Preservation, Turner et al 2019
2021-01-26

Shimon Whiteson

Shimon Whiteson is a Professor of Computer Science at Oxford University, the head of WhiRL, the Whiteson Research Lab at Oxford, and Head of Research at Waymo UK. 


Featured References 

VariBAD: A Very Good Method for Bayes-Adaptive Deep RL via Meta-Learning 
Luisa Zintgraf, Kyriacos Shiarlis, Maximilian Igl, Sebastian Schulze, Yarin Gal, Katja Hofmann, Shimon Whiteson 

Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning 
Tabish Rashid, Mikayel Samvelyan, Christian Schroeder de Witt, Gregory Farquhar, Jakob Foerster, Shimon Whiteson 


Additional References 

Shimon Whiteson - Multi-agent RL, MIT Embodied Intelligence Seminar
The StarCraft Multi-Agent Challenge, Samvelyan et al 2019
Direct Policy Transfer with Hidden Parameter Markov Decision Processes, Yao et al 2018
Value-Decomposition Networks For Cooperative Multi-Agent Learning, Sunehag et al 2017
Whiteson Research Lab
Waymo acquires Latent Logic to accelerate progress towards safe, driverless vehicles, Oxford News
Waymo
2020-12-06

Aravind Srinivas

Aravind Srinivas is a 3rd year PhD student at UC Berkeley advised by Prof. Abbeel. 
He co-created and co-taught a grad course on Deep Unsupervised Learning at Berkeley. 


Featured References 

Data-Efficient Image Recognition with Contrastive Predictive Coding 
Olivier J. Hénaff, Aravind Srinivas, Jeffrey De Fauw, Ali Razavi, Carl Doersch, S. M. Ali Eslami, Aaron van den Oord 

Contrastive Unsupervised Representations for Reinforcement Learning 
Aravind Srinivas, Michael Laskin, Pieter Abbeel 

Reinforcement Learning with Augmented Data 
Michael Laskin, Kimin Lee, Adam Stooke, Lerrel Pinto, Pieter Abbeel, Aravind Srinivas 

SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning 
Kimin Lee, Michael Laskin, Aravind Srinivas, Pieter Abbeel 


Additional References 

CS294-158-SP20 Deep Unsupervised Learning, Berkeley
Phasic Policy Gradient, Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman
Bootstrap your own latent: A new approach to self-supervised Learning, Grill et al 2020
2020-09-21

Taylor Killian

Taylor Killian is a Ph.D. student at the University of Toronto and the Vector Institute, and an Intern at Google Brain.

Featured References 

Direct Policy Transfer with Hidden Parameter Markov Decision Processes
Yao, Killian, Konidaris, Doshi-Velez 

Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes
Killian, Daulton, Konidaris, Doshi-Velez 

Transfer Learning Across Patient Variations with Hidden Parameter Markov Decision Processes
Killian, Konidaris, Doshi-Velez 

Counterfactually Guided Policy Transfer in Clinical Settings
Killian, Ghassemi, Joshi 


Additional References 

Hidden Parameter Markov Decision Processes: A Semiparametric Regression Approach for Discovering Latent Task Parametrizations, Doshi-Velez, Konidaris
Mimic III, a freely accessible critical care database, Johnson AEW, Pollard TJ, Shen L, Lehman L, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG
The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care, Komorowski et al 

2020-08-17

Nan Jiang

Nan Jiang is an Assistant Professor of Computer Science at the University of Illinois.  He was a postdoc at Microsoft Research, and did his PhD at the University of Michigan under Professor Satinder Singh. 


Featured References 

Reinforcement Learning: Theory and Algorithms
Alekh Agarwal, Nan Jiang, Sham M. Kakade 

Model-based RL in Contextual Decision Processes: PAC bounds and Exponential Improvements over Model-free Approaches
Wen Sun, Nan Jiang, Akshay Krishnamurthy, Alekh Agarwal, John Langford 

Information-Theoretic Considerations in Batch Reinforcement Learning
Jinglin Chen, Nan Jiang 

 
Additional References 

Towards a Unified Theory of State Abstraction for MDPs, Lihong Li, Thomas J. Walsh, Michael L. Littman
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning, Nan Jiang, Lihong Li
Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization, Nan Jiang, Jiawei Huang
Empirical Study of Off-Policy Policy Evaluation for Reinforcement Learning, Cameron Voloshin, Hoang M. Le, Nan Jiang, Yisong Yue 

Errata 

[Robin] I misspoke when I said in domain randomization we want the agent to "ignore" domain parameters.  What I should have said is, we want the agent to perform well within some range of domain parameters, it should be robust with respect to domain parameters. 
2020-07-06

Danijar Hafner

Danijar Hafner is a PhD student at the University of Toronto, and a student researcher at Google Research, Brain Team and the Vector Institute.  He holds a Masters of Research from University College London. 

Featured References 

A deep learning framework for neuroscience
Blake A. Richards, Timothy P. Lillicrap, Philippe Beaudoin, Yoshua Bengio, Rafal Bogacz, Amelia Christensen, Claudia Clopath, Rui Ponte Costa, Archy de Berker, Surya Ganguli, Colleen J. Gillon, Danijar Hafner, Adam Kepecs, Nikolaus Kriegeskorte, Peter Latham, Grace W. Lindsay, Kenneth D. Miller, Richard Naud, Christopher C. Pack, Panayiota Poirazi, Pieter Roelfsema, João Sacramento, Andrew Saxe, Benjamin Scellier, Anna C. Schapiro, Walter Senn, Greg Wayne, Daniel Yamins, Friedemann Zenke, Joel Zylberberg, Denis Therien, Konrad P. Kording

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Dream to Control: Learning Behaviors by Latent Imagination
Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi

Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak 

Additional References

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model, Schrittwieser et al
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, Silver et al
Shaping Belief States with Generative Environment Models for RL, Gregor et al
Model-Based Active Exploration, Shyam et al 

 
Errata 

[Robin] Around 1:37 I say "some ... world models get confused by random noise". I meant "some curiosity formulations", not "world models" 
2020-05-14

Csaba Szepesvari

Csaba Szepesvari is: 

Head of the Foundations Team at DeepMind
Professor of Computer Science at the University of Alberta
Canada CIFAR AI Chair
Fellow at the Alberta Machine Intelligence Institute
Co-author of the book Bandit Algorithms along with Tor Lattimore, and author of the book Algorithms for Reinforcement Learning 

References 

Bandit based monte-carlo planning, Levente Kocsis, Csaba Szepesvári
Bandit Algorithms, Tor Lattimore, Csaba Szepesvári
Algorithms for Reinforcement Learning, Csaba Szepesvári
The Predictron: End-To-End Learning and Planning, David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto, Thomas Degris
A Bayesian framework for reinforcement learning, Strens
Solving Rubik's Cube with a Robot Hand; Paper, OpenAI, Ilge Akkaya, Marcin Andrychowicz, Maciek Chociej, Mateusz Litwin, Bob McGrew, Arthur Petron, Alex Paino, Matthias Plappert, Glenn Powell, Raphael Ribas, Jonas Schneider, Nikolas Tezak, Jerry Tworek, Peter Welinder, Lilian Weng, Qiming Yuan, Wojciech Zaremba, Lei Zhang
The Nonstochastic Multiarmed Bandit Problem, Peter Auer, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire
Deep Learning with Bayesian Principles, Mohammad Emtiyaz Khan
Tackling climate change with Machine Learning, David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio 
2020-04-05

Ben Eysenbach

Ben Eysenbach is a PhD student in the Machine Learning Department at Carnegie Mellon University.  He was a Resident at Google Brain, and studied math and computer science at MIT. He co-founded the ICML Exploration in Reinforcement Learning workshop.

Featured References

Diversity is All You Need: Learning Skills without a Reward Function
Benjamin Eysenbach, Abhishek Gupta, Julian Ibarz, Sergey Levine

Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Additional References 

Behaviour Suite for Reinforcement Learning, Ian Osband, Yotam Doron, Matteo Hessel, John Aslanides, Eren Sezener, Andre Saraiva, Katrina McKinney, Tor Lattimore, Csaba Szepesvari, Satinder Singh, Benjamin Van Roy, Richard Sutton, David Silver, Hado Van Hasselt
Learning Latent Plans from Play, Corey Lynch, Mohi Khansari, Ted Xiao, Vikash Kumar, Jonathan Tompson, Sergey Levine, Pierre Sermanet
Finale Doshi-Velez
Emma Brunskill
Closed-loop optimization of fast-charging protocols for batteries with machine learning, Peter Attia, Aditya Grover, Norman Jin, Kristen Severson, Todor Markov, Yang-Hung Liao, Michael Chen, Bryan Cheong, Nicholas Perkins, Zi Yang, Patrick Herring, Muratahan Aykol, Stephen Harris, Richard Braatz, Stefano Ermon, William Chueh
CMU 10-703 Deep Reinforcement Learning, Fall 2019, Carnegie Mellon University
ICML Exploration in Reinforcement Learning workshop 
2020-03-30

NeurIPS 2019 Deep RL Workshop

Thank you to all the presenters that participated.  I covered as many as I could given the time and crowds, if you were not included and wish to be, please email [email protected] 

More details on the official NeurIPS Deep RL Workshop site

0:23 Approximating two value functions instead of one: towards characterizing a new family of Deep Reinforcement Learning algorithms; Matthia Sabatelli (University of Liege); Gilles Louppe (University of Liège); Pierre Geurts (University of Liège); Marco Wiering (University of Groningen) [external pdf link]
4:16 Single Deep Counterfactual Regret Minimization; Eric Steinberger (University of Cambridge)
5:38 On the Convergence of Episodic Reinforcement Learning Algorithms at the Example of RUDDER; Markus Holzleitner (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); José Arjona-Medina (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria); Marius-Constantin Dinu (LIT AI Lab / University Linz); Sepp Hochreiter (LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria)
9:33 Objective Mismatch in Model-based Reinforcement Learning; Nathan Lambert (UC Berkeley); Brandon Amos (Facebook); Omry Yadan (Facebook); Roberto Calandra (Facebook)
10:51 Option Discovery using Deep Skill Chaining; Akhil Bagaria (Brown University); George Konidaris (Brown University)
13:44 Blue River Controls: A toolkit for Reinforcement Learning Control Systems on Hardware; Kirill Polzounov (University of Calgary); Ramitha Sundar (Blue River Technology); Lee Reden (Blue River Technology)
14:52 LeDeepChef: Deep Reinforcement Learning Agent for Families of Text-Based Games; Leonard Adolphs (ETHZ); Thomas Hofmann (ETH Zurich)
16:30 Accelerating Training in Pommerman with Imitation and Reinforcement Learning; Hardik Meisheri (TCS Research); Omkar Shelke (TCS Research); Richa Verma (TCS Research); Harshad Khadilkar (TCS Research)
17:27 Dream to Control: Learning Behaviors by Latent Imagination; Danijar Hafner (Google); Timothy Lillicrap (DeepMind); Jimmy Ba (University of Toronto); Mohammad Norouzi (Google Brain) [external pdf link]
20:48 Adaptive Temperature Tuning for Mellowmax in Deep Reinforcement Learning; Seungchan Kim (Brown University); George Konidaris (Brown)
22:05 Meta-learning curiosity algorithms; Ferran Alet (MIT); Martin Schneider (MIT); Tomas Lozano-Perez (MIT); Leslie Kaelbling (MIT)
24:09 Predictive Coding for Boosting Deep Reinforcement Learning with Sparse Rewards; Xingyu Lu (Berkeley); Stas Tiomkin (BAIR, UC Berkeley); Pieter Abbeel (UC Berkeley)
25:44 Swarm-inspired Reinforcement Learning via Collaborative Inter-agent Knowledge Distillation; Zhang-Wei Hong (Preferred Networks); Prabhat Nagarajan (Preferred Networks); Guilherme Maeda (Preferred Networks)
26:35 Multiplayer AlphaZero; Nicholas Petosa (Georgia Institute of Technology); Tucker Balch (Ga Tech) [external pdf link]
27:43 Prioritized Sequence Experience Replay; Marc Brittain (Iowa State University); Joshua Bertram (Iowa State University); Xuxi Yang (Iowa State University); Peng Wei (Iowa State University) [external pdf link]
29:14 Recurrent neural-linear posterior sampling for non-stationary bandits; Paulo Rauber (IDSIA); Aditya Ramesh (USI); Jürgen Schmidhuber (IDSIA - Lugano)
29:36 Improving Evolutionary Strategies With Past Descent Directions; Asier Mujika (ETH Zurich); Florian Meier (ETH Zurich); Marcelo Matheus Gauy (ETH Zurich); Angelika Steger (ETH Zurich) [external pdf link]
31:40 ZPD Teaching Strategies for Deep Reinforcement Learning from Demonstrations; Daniel Seita (University of California, Berkeley); David Chan (University of California, Berkeley); Roshan Rao (UC Berkeley); Chen Tang (UC Berkeley); Mandi Zhao (UC Berkeley); John Canny (UC Berkeley) [external pdf link]
33:05 Bottom-Up Meta-Policy Search; Luckeciano Melo (Aeronautics Institute of Technology); Marcos Máximo (Aeronautics Institute of Technology); Adilson Cunha (Aeronautics Institute of Technology) [external pdf link]
33:37 MERL: Multi-Head Reinforcement Learning; Yannis Flet-Berliac (University of Lille / Inria); Philippe Preux (INRIA) [external pdf link]
35:30 Emergen...
2019-12-20

Scott Fujimoto

Scott Fujimoto is a PhD student at McGill University and Mila. He is the author of TD3 as well as some of the recent developments in batch deep reinforcement learning.  

Featured References 

Addressing Function Approximation Error in Actor-Critic Methods 
Scott Fujimoto, Herke van Hoof, David Meger 

Off-Policy Deep Reinforcement Learning without Exploration 

Scott Fujimoto, David Meger, Doina Precup 

Benchmarking Batch Deep Reinforcement Learning Algorithms 

Scott Fujimoto, Edoardo Conti, Mohammad Ghavamzadeh, Joelle Pineau 


Additional References 

Striving for Simplicity in Off-Policy Deep Reinforcement Learning
Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine

Way Off-Policy Batch Deep Reinforcement Learning of Implicit Human Preferences in Dialog
Natasha Jaques, Asma Ghandeharioun, Judy Hanwen Shen, Craig Ferguson, Agata Lapedriza, Noah Jones, Shixiang Gu, Rosalind Picard

Continuous control with deep reinforcement learning
Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

Distributed Distributional Deterministic Policy Gradients
Gabriel Barth-Maron, Matthew W. Hoffman, David Budden, Will Dabney, Dan Horgan, Dhruva TB, Alistair Muldal, Nicolas Heess, Timothy Lillicrap 
2019-11-19

Jessica Hamrick

Dr. Jessica Hamrick is a Research Scientist at DeepMind. She holds a PhD in Psychology from UC Berkeley. 


Featured References 

Structured agents for physical construction 
Victor Bapst, Alvaro Sanchez-Gonzalez, Carl Doersch, Kimberly L. Stachenfeld, Pushmeet Kohli, Peter W. Battaglia, Jessica B. Hamrick 

Analogues of mental simulation and imagination in deep learning 

Jessica Hamrick 

Additional References 

Metacontrol for Adaptive Imagination-Based Optimization
Jessica B. Hamrick, Andrew J. Ballard, Razvan Pascanu, Oriol Vinyals, Nicolas Heess, Peter W. Battaglia

Surprising Negative Results for Generative Adversarial Tree Search
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar

Metareasoning and Mental Simulation
Jessica B. Hamrick

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis

Object-oriented state editing for HRL
Victor Bapst, Alvaro Sanchez-Gonzalez, Omar Shams, Kimberly Stachenfeld, Peter W. Battaglia, Satinder Singh, Jessica B. Hamrick

FeUdal Networks for Hierarchical Reinforcement Learning
Alexander Sasha Vezhnevets, Simon Osindero, Tom Schaul, Nicolas Heess, Max Jaderberg, David Silver, Koray Kavukcuoglu

PILCO: A Model-Based and Data-Efficient Approach to Policy Search
Marc Peter Deisenroth, Carl Edward Rasmussen

Blueberry Earth
Anders Sandberg 
2019-11-12

Pablo Samuel Castro

Dr Pablo Samuel Castro is a Staff Research Software Engineer at Google Brain.  He is the main author of the Dopamine RL framework.


Featured References 

A Comparative Analysis of Expected and Distributional Reinforcement Learning 

Clare Lyle, Pablo Samuel Castro, Marc G. Bellemare  


A Geometric Perspective on Optimal Representations for Reinforcement Learning 

Marc G. Bellemare, Will Dabney, Robert Dadashi, Adrien Ali Taiga, Pablo Samuel Castro, Nicolas Le Roux, Dale Schuurmans, Tor Lattimore, Clare Lyle 


Dopamine: A Research Framework for Deep Reinforcement Learning 
Pablo Samuel Castro, Subhodeep Moitra, Carles Gelada, Saurabh Kumar, Marc G. Bellemare 

Dopamine RL framework on github 
 

Tensorflow Agents on github 
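
Since Dopamine comes up as the featured framework, here is a small sketch of how an experiment is typically launched with it: behaviour is specified in a gin config file and a Runner drives training. This mirrors the pattern in Dopamine's own training script, but the gin file path below is an assumption that depends on your checkout and on whether you use the TF or JAX agents.

```python
# Minimal Dopamine sketch: run a DQN experiment from a gin config.
from dopamine.discrete_domains import run_experiment

base_dir = "/tmp/dopamine_dqn"                        # logs and checkpoints
gin_files = ["dopamine/agents/dqn/configs/dqn.gin"]   # hypothetical path; adjust to your install

run_experiment.load_gin_configs(gin_files, gin_bindings=[])
runner = run_experiment.create_runner(base_dir)
runner.run_experiment()
```

Hyperparameters (replay size, epsilon schedule, network) live in the gin file rather than in code, which is part of what makes runs easy to reproduce and compare.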

Additional References 

Using Linear Programming for Bayesian Exploration in Markov Decision Processes
Pablo Samuel Castro, Doina Precup

Using bisimulation for policy transfer in MDPs
Pablo Samuel Castro, Doina Precup

Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

Implicit Quantile Networks for Distributional Reinforcement Learning
Will Dabney, Georg Ostrovski, David Silver, Rémi Munos

A Distributional Perspective on Reinforcement Learning
Marc G. Bellemare, Will Dabney, Rémi Munos 
2019-10-10

Kamyar Azizzadenesheli

Dr. Kamyar Azizzadenesheli is a postdoctoral scholar at Caltech.  His research interest is mainly in the area of Machine Learning, from theory to practice, with a main focus on Reinforcement Learning.  He will be joining Purdue University as an Assistant CS Professor in Fall 2020. 

Featured References 

Efficient Exploration through Bayesian Deep Q-Networks 
Kamyar Azizzadenesheli, Animashree Anandkumar 

Surprising Negative Results for Generative Adversarial Tree Search 
Kamyar Azizzadenesheli, Brandon Yang, Weitang Liu, Zachary C Lipton, Animashree Anandkumar 

Maybe a few considerations in Reinforcement Learning Research? 
Kamyar Azizzadenesheli 
 

Additional References 

Model-Based Reinforcement Learning for Atari
Lukasz Kaiser, Mohammad Babaeizadeh, Piotr Milos, Blazej Osinski, Roy H Campbell, Konrad Czechowski, Dumitru Erhan, Chelsea Finn, Piotr Kozakowski, Sergey Levine, Afroz Mohiuddin, Ryan Sepassi, George Tucker, Henryk Michalewski

Near-optimal Regret Bounds for Reinforcement Learning
Thomas Jaksch, Ronald Ortner, Peter Auer

Curious Model-Building Control Systems
Jürgen Schmidhuber

Rainbow: Combining Improvements in Deep Reinforcement Learning
Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics
Ken Kansky, Tom Silver, David A. Mély, Mohamed Eldawy, Miguel Lázaro-Gredilla, Xinghua Lou, Nimrod Dorfman, Szymon Sidor, Scott Phoenix, Dileep George

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis 
2019-09-21

Antonin Raffin and Ashley Hill

Antonin Raffin is a researcher at the German Aerospace Center (DLR) in Munich, working in the Institute of Robotics and Mechatronics. His research is on using machine learning for controlling real robots (because simulation is not enough), with a particular interest in reinforcement learning. 


Ashley Hill is doing his thesis on improving control algorithms using machine learning for real-time gain tuning. 

He works mainly with neuroevolution, genetic algorithms, and of course reinforcement learning, applied to mobile robots.  He holds a master's degree in machine learning and a bachelor's in computer science from the Université Paris-Saclay. 

Featured References 

stable-baselines on github 
Ashley Hill and Antonin Raffin, primary authors. 

S-RL Toolbox 
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat 

Decoupling feature extraction from policy learning: assessing benefits of state representation learning in goal based robotics 
Antonin Raffin, Ashley Hill, René Traoré, Timothée Lesort, Natalia Díaz-Rodríguez, David Filliat 
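
Because this episode is about the stable-baselines library itself, a tiny usage sketch may help orient listeners. It follows the original TF1-based Stable Baselines API that the guests maintain (PPO2 with an MlpPolicy); the later Stable-Baselines3 rewrite renames things (e.g. PPO instead of PPO2), so adapt the imports to whichever version is installed.

```python
# Minimal Stable Baselines sketch (original TF1-based library): PPO2 on CartPole-v1.
from stable_baselines import PPO2
from stable_baselines.common.policies import MlpPolicy

model = PPO2(MlpPolicy, "CartPole-v1", verbose=1)  # env id string is wrapped automatically
model.learn(total_timesteps=10000)
model.save("ppo2_cartpole")

# Roll out the trained policy for a few steps.
env = model.get_env()
obs = env.reset()
for _ in range(100):
    action, _states = model.predict(obs)
    obs, rewards, dones, info = env.step(action)
```

Swapping PPO2 for another algorithm (A2C, SAC, TD3, ...) keeps essentially the same train/save/predict workflow.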


Additional References 

Learning to Drive Smoothly in Minutes, Antonin Raffin
Multimodal SRL (best paper at ICRA): Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks, Michelle A. Lee, Yuke Zhu, Krishnan Srinivasan, Parth Shah, Silvio Savarese, Li Fei-Fei, Animesh Garg, Jeannette Bohg
Benchmarking Model-Based Reinforcement Learning, Tingwu Wang, Xuchan Bao, Ignasi Clavera, Jerrick Hoang, Yeming Wen, Eric Langlois, Shunshi Zhang, Guodong Zhang, Pieter Abbeel, Jimmy Ba
TossingBot: Learning to Throw Arbitrary Objects with Residual Physics, Andy Zeng, Shuran Song, Johnny Lee, Alberto Rodriguez, Thomas Funkhouser
Stable Baselines roadmap
OpenAI baselines
stable-baselines github pull request 
2019-09-05

Michael Littman

Michael L Littman is a professor of Computer Science at Brown University.  He was elected ACM Fellow in 2018 "For contributions to the design and analysis of sequential decision making algorithms in artificial intelligence". 

Featured References 

Convergent Actor Critic by Humans 
James MacGlashan, Michael L. Littman, David L. Roberts, Robert Tyler Loftin, Bei Peng, Matthew E. Taylor 

People teach with rewards and punishments as communication, not reinforcements 
Mark Ho, Fiery Cushman, Michael L. Littman, Joseph Austerweil 

Theory of Minds: Understanding Behavior in Groups Through Inverse Planning 
Michael Shum, Max Kleiman-Weiner, Michael L. Littman, Joshua B. Tenenbaum 

Personalized education at scale 
Saarinen, Cater, Littman 

Additional References 

Michael Littman papers on Google Scholar, Semantic Scholar
Reinforcement Learning on Udacity, Charles Isbell, Michael Littman, Chris Pryby
Machine Learning on Udacity, Michael Littman, Charles Isbell, Pushkar Kolhe
Temporal Difference Learning and TD-Gammon, Gerald Tesauro
Playing Atari with Deep Reinforcement Learning, Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
Ask Me Anything about MOOCs, D Fisher, C Isbell, ML Littman, M Wollowski, et al
Reinforcement Learning and Decision Making (RLDM) Conference
Algorithms for Sequential Decision Making, Michael Littman's Thesis
Machine Learning A Cappella - Overfitting Thriller!, Michael Littman and Charles Isbell feat Infinite Harmony
Turbotax Ad 2016: Genius Anna/Michael Littman 
2019-08-24

Natasha Jaques

Natasha Jaques is a PhD candidate at MIT working on affective and social intelligence.  She has interned with DeepMind and Google Brain, and was an OpenAI Scholars mentor.  Her paper "Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning" received an honourable mention for best paper at ICML 2019. 

Featured References 

Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning
Natasha Jaques, Angeliki Lazaridou, Edward Hughes, Caglar Gulcehre, Pedro A. Ortega, DJ Strouse, Joel Z. Leibo, Nando de Freitas

Tackling climate change with Machine Learning
David Rolnick, Priya L. Donti, Lynn H. Kaack, Kelly Kochanski, Alexandre Lacoste, Kris Sankaran, Andrew Slavin Ross, Nikola Milojevic-Dupont, Natasha Jaques, Anna Waldman-Brown, Alexandra Luccioni, Tegan Maharaj, Evan D. Sherwin, S. Karthik Mukkavilli, Konrad P. Kording, Carla Gomes, Andrew Y. Ng, Demis Hassabis, John C. Platt, Felix Creutzig, Jennifer Chayes, Yoshua Bengio 

Additional References 

MIT Media Lab Flight Offsets, Caroline Jaffe, Juliana Cherston, Natasha Jaques
Modeling Others using Oneself in Multi-Agent Reinforcement Learning, Roberta Raileanu, Emily Denton, Arthur Szlam, Rob Fergus
Inequity aversion improves cooperation in intertemporal social dilemmas, Edward Hughes, Joel Z. Leibo, Matthew G. Phillips, Karl Tuyls, Edgar A. Duéñez-Guzmán, Antonio García Castañeda, Iain Dunning, Tina Zhu, Kevin R. McKee, Raphael Koster, Heather Roff, Thore Graepel
Sequential Social Dilemma Games on github, Eugene Vinitsky, Natasha Jaques
AI Alignment newsletter, Rohin Shah
Paired Open-Ended Trailblazer (POET): Endlessly Generating Increasingly Complex and Diverse Learning Environments and Their Solutions, Rui Wang, Joel Lehman, Jeff Clune, Kenneth O. Stanley
The social function of intellect, Nicholas Humphrey
Autocurricula and the Emergence of Innovation from Social Interaction: A Manifesto for Multi-Agent Intelligence Research, Joel Z. Leibo, Edward Hughes, Marc Lanctot, Thore Graepel
A Recipe for Training Neural Networks, Andrej Karpathy
Emotionally Adaptive Intelligent Tutoring Systems using POMDPs, Natasha Jaques
Sapiens, Yuval Noah Harari 

2019-08-10