The space of AI alignment research is highly dynamic, and it's often difficult to get a bird's-eye view of the landscape. This podcast is the second of two parts attempting to partially remedy this by providing an overview of technical AI alignment efforts. In particular, this episode continues the discussion from Part 1 by going into more depth on specific approaches to AI alignment. In this podcast, Lucas spoke with Rohin Shah. Rohin is a fifth-year PhD student at UC Berkeley with the Center for Human-Compatible AI, working with Anca Dragan, Pieter Abbeel, and Stuart Russell. Every week, he collects and summarizes recent progress relevant to AI alignment in the Alignment Newsletter.
Topics discussed in this episode include:
-Embedded agency
-The field of "getting AI systems to do what we want"
-Ambitious value learning
-Corrigibility, including iterated amplification, debate, and factored cognition
-AI boxing and impact measures
-Robustness through verification, adversarial ML, and adversarial examples
-Interpretability research
-Comprehensive AI Services
-Rohin's relative optimism about the state of AI alignment
You can take a short (3-minute) survey to share your feedback about the podcast here: https://www.surveymonkey.com/r/YWHDFV7