TL;DR: The Google DeepMind AGI Safety team is hiring for Applied Interpretability research scientists and engineers. Applied Interpretability is a new subteam we are forming to focus on directly using model internals-based techniques to make models safer in production. Achieving this goal will require doing research on the critical path that enables interpretability methods to be more widely used for practical problems. We believe this has significant direct and indirect benefits for preventing AGI x-risk, as we argue below. Our ideal candidate has experience with ML engineering and some hands-on experience with language model interpretability research. To apply for this role (as well as other open AGI Safety and Gemini Safety roles), follow the links for Research Engineers here & Research Scientists here.
1. What is Applied Interpretability?
At a high level, the goal of the applied interpretability team is to make model internals-based methods become a standard tool [...]
---
Outline:
(01:00) 1. What is Applied Interpretability?
(03:57) 2. Specific projects we're interested in working on
(06:39) FAQ
(06:42) What's the relationship between applied interpretability and Neel's mechanistic interpretability team?
(07:16) How much autonomy will I have?
(09:03) Why do applied interpretability rather than fundamental research?
(10:31) What makes someone a good fit for the role?
(11:15) I've heard that Google infra can be pretty slow and bad
(11:42) Can I publish?
(12:19) Does probing really count as interpretability?
The original text contained 2 footnotes which were omitted from this narration.
---
First published:
February 24th, 2025
Narrated by TYPE III AUDIO.