Neel Nanda joins the podcast to talk about mechanistic interpretability and how it can make AI safer. Neel is an independent AI safety researcher. You can find his blog here: https://www.neelnanda.io
Timestamps:
00:00 Introduction
00:46 How early is the field of mechanistic interpretability?
03:12 Why should we care about mechanistic interpretability?
06:38 What are some successes in mechanistic interpretability?
16:29 How promising is mechanistic interpretability?
31:13 Is machine learning analogous to evolution?
32:58 How does mechanistic interpretability make AI safer?
36:54 Does mechanistic interpretability help us control AI?
39:57 Will AI models resist interpretation?
43:43 Is mechanistic interpretability fast enough?
54:10 Does mechanistic interpretability give us a general understanding?
57:44 How can you help with mechanistic interpretability?
Social Media Links:
➡️ WEBSITE: https://futureoflife.org
➡️ TWITTER: https://twitter.com/FLIxrisk
➡️ INSTAGRAM: https://www.instagram.com/futureoflifeinstitute/
➡️ META: https://www.facebook.com/futureoflifeinstitute
➡️ LINKEDIN: https://www.linkedin.com/company/future-of-life-institute/