What role does inverse reinforcement learning (IRL) have to play in AI alignment? What issues complicate IRL and how does this affect the usefulness of this preference learning methodology? What sort of paradigm of AI alignment ought we to take up given such concerns?
Inverse Reinforcement Learning and the State of AI Alignment with Rohin Shah is the seventh podcast in the AI Alignment Podcast series, hosted by Lucas Perry. For those of you that are new, this series is covering and exploring the AI alignment problem across a large variety of domains, reflecting the fundamentally interdisciplinary nature of AI alignment. Broadly, we will be having discussions with technical and non-technical researchers across areas such as machine learning, governance, ethics, philosophy, and psychology as they pertain to the project of creating beneficial AI. If this sounds interesting to you, we hope that you will join in the conversations by following us or subscribing to our podcasts on Youtube, SoundCloud, or your preferred podcast site/application.
In this podcast, Lucas spoke with Rohin Shah. Rohin is a 5th year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.
Topics discussed in this episode include:
- The role of systematic bias in IRL
- The metaphilosophical issues of IRL
- IRL's place in preference learning
- Rohin's take on the state of AI alignment
- What Rohin has changed his mind about