LessWrong posts by zvi

“On DeepMind’s Frontier Safety Framework” by Zvi

16 min • June 23, 2024

On DeepMind's Frontier Safety Framework

Previously: On OpenAI's Preparedness Framework, On RSPs.

The First Two Frameworks

To first update on Anthropic and OpenAI's situation here:

Anthropic's RSP still lacks definitions of the all-important later levels, among other issues, although it is otherwise promising. It has now been a number of months, and it is starting to be concerning that nothing has changed. They are due for an update.

OpenAI also has not updated its framework.

I am less down on OpenAI's framework choices than Zack Stein-Perlman was in the other review I have seen. I think that if OpenAI implemented the spirit of what it wrote down, that would be pretty good. The Critical-level thresholds listed are too high, but the Anthropic ASL-4 commitments are still unspecified. An update is needed, but I appreciate the concreteness.

The [...]

---

Outline:

(00:04) On DeepMind's Frontier Safety Framework

(00:15) The First Two Frameworks

(03:06) The DeepMind Framework

(08:57) Mitigations

(09:59) Security Mitigations

---

First published:
June 18th, 2024

Source:
https://www.lesswrong.com/posts/frEYsehsPHswDXnNX/on-deepmind-s-frontier-safety-framework

---

Narrated by TYPE III AUDIO.