r1 from DeepSeek is here, the first serious challenge to OpenAI's o1.
r1 is an open model, and it comes in dramatically cheaper than o1.
People are very excited. Normally cost is not a big deal, but o1, with its inference-time compute strategy, is the exception. Here, cheaper really can mean better, even if the answers aren’t quite as good.
You can get DeepSeek-r1 on HuggingFace, where they also link to the paper.
The question is how to think about r1 as it compares to o1, and also to o1 Pro and to the future o3-mini that we’ll get in a few weeks, and then to o3, which we’ll likely get in a month or two.
Taking into account everything I’ve seen, r1 is still a notch below o1 in terms of quality of output, and further behind o1 Pro and the future o3-mini [...]
---
Outline:
(01:43) Part 1: RTFP: Read the Paper
(03:38) How Did They Do It
(06:19) The Aha Moment
(08:27) Benchmarks
(09:46) Reports of Failure
(11:11) Part 2: Capabilities Analysis
(11:16) Our Price Cheap
(15:44) Other People's Benchmarks
(18:20) r1 Makes Traditional Silly Mistakes
(23:11) The Overall Vibes
(25:36) If I Could Read Your Mind
(28:06) Creative Writing
(32:21) Bring On the Spice
(34:33) We Cracked Up All the Censors
(39:44) Switching Costs Are Low In Theory
(42:15) The Self-Improvement Loop
(44:18) Room for Improvement
(48:27) Part 3: Where Does This Leave Us on Existential Risk?
(48:58) The Suicide Caucus
(51:21) v3 Implies r1
(53:09) Open Weights Are Unsafe And Nothing Can Fix This
(58:59) So What the Hell Should We Do About All This?
(01:05:53) Part 4: The Lighter Side
The original text contained 20 images which were described by AI.
---
First published:
January 22nd, 2025
Source:
https://www.lesswrong.com/posts/buTWsjfwQGMvocEyw/on-deepseek-s-r1
Narrated by TYPE III AUDIO.
---