On DeepMind's Frontier Safety Framework
Previously: On OpenAI's Preparedness Framework, On RSPs.
The First Two Frameworks
First, an update on where Anthropic and OpenAI stand:
Anthropic's RSP still lacks definitions of the all-important later levels, in addition to other issues, although it is otherwise promising. It has now been a number of months, and it is starting to be concerning that nothing has changed. They are due for an update.
OpenAI also has not updated its framework.
I am less down on OpenAI's framework choices than Zach Stein-Perlman was in the other review I have seen. I think that if OpenAI implemented the spirit of what it wrote down, that would be pretty good. The Critical-level thresholds listed are too high, but the Anthropic ASL-4 commitments are still unspecified. An update is needed, but I appreciate the concreteness.
The [...]
---
Outline:
(00:04) On DeepMind's Frontier Safety Framework
(00:15) The First Two Frameworks
(03:06) The DeepMind Framework
(08:57) Mitigations
(09:59) Security Mitigations
---
First published:
June 18th, 2024
Source:
https://www.lesswrong.com/posts/frEYsehsPHswDXnNX/on-deepmind-s-frontier-safety-framework
Narrated by TYPE III AUDIO.