
LessWrong posts by zvi

“o3 Will Use Its Tools For You” by Zvi

102 min • 18 April 2025
OpenAI has finally introduced us to the full o3 along with o4-mini.

Greg Brockman (OpenAI): Just released o3 and o4-mini! These models feel incredibly smart. We’ve heard from top scientists that they produce useful novel ideas. Excited to see their positive impact on people's daily lives and humanity's hardest problems!

Sam Altman: we expect to release o3-pro to the pro tier in a few weeks

By all accounts, this upgrade is a big deal. They are giving us a modestly more intelligent model, but more importantly giving it better access to tools and ability to discern when to use them, to help get more practical value out of it.

The tool use, and the ability to string it together and persist, is where o3 shines.

The highest praise I can give o3 is that this was by far the most a model has been used as part of writing its [...]

---

Outline:

(01:56) What's In a Name

(02:51) My Current Model Use Heuristics

(04:21) Huh, Upgrades

(05:31) Use All the Tools

(09:47) Search the Web

(10:27) On Your Marks

(18:15) The System Prompt

(19:00) The o3 and o4-mini System Card

(23:17) Tests o3 Aced

(25:14) Hallucinations

(31:41) Instruction Hierarchy

(32:52) Image Refusals

(33:18) METR Evaluations for Task Duration and Misalignment

(42:45) Apollo Evaluations for Scheming and Deception

(44:40) We Are Insufficiently Worried About These Alignment Failures

(47:16) GPT-4.1 Also Has Some Issues

(50:08) Pattern Lab Evaluations for Cybersecurity

(51:45) Preparedness Framework Tests

(52:14) Biological and Chemical Risks (4.2)

(58:20) Cybersecurity (4.3)

(59:27) AI Self-Improvement (4.4)

(01:00:51) Perpetual Shilling

(01:01:54) High Praise

(01:09:31) Syncopathy

(01:11:58) Mundane Utility Versus Capability Watch

(01:16:33) o3 Offers Mundane Utility

(01:24:10) o3 Doesn't Offer Mundane Utility

(01:30:54) o4-mini Also Exists

(01:31:31) Colin Fraser Dumb Model Watch

(01:32:52) o3 as Forecaster

(01:34:31) Is This AGI?

---

First published:
April 18th, 2025

Source:
https://www.lesswrong.com/posts/u58AyZziQRAcbhTxd/o3-will-use-its-tools-for-you

---

Narrated by TYPE III AUDIO.

---

Images from the article:


Three bar graphs showing accuracy metrics for different AI models and benchmarks.
Graph showing IQ test results for various AI models from TrackingAI.org's Mensa Norway quiz.
Horizontal bar chart comparing AI model performance scores, ranging from 47-59 points.
Table comparing Person Identification and Ungrounded Inference across different difficulty levels and models.
Performance comparison chart showing scores of different language models (o3, o4-mini, Gemini).
Performance comparison chart showing rankings of different AI language models from April 2025.
Table showing jailbreak evaluations comparing metrics across three systems: o3, o4-mini, o1
Screenshot showing portions of a conversation about computer hardware specifications and a MacBook.
Three bar graphs showing coding metrics for SWE-Lancer and SWE-Bench tests.

The graphs display earnings data in dollars and accuracy percentages for different coding tests and benchmarks, with varying performance levels shown in light yellow bars.
Five bar graphs comparing performance metrics across different models and competitions. Each graph shows accuracy percentages and ELO scores for various AI systems like AIME 2024, AIME 2025, Codeforces, GPQA Diamond, and Humanity's Last Exam.
Performance comparison chart showing test scores for 5 different AI models.
Comparison chart showing performance metrics of various AI language models through 2025.

The chart displays eight different AI models ranked by their scores.
Table comparing code tool hallucination success rates across different AI models.

Code screenshot showing React weather app component with shadcn/ui elements.
Graph comparing misaligned answer probabilities across three GPT models for different prompts.
Table showing message conflict evaluations between System, Developer, and User across three models.
Graph showing OpenAI's model task completion times from 2020-2026, METR logo.
Minimalist illustration comparing two cats with different
Table showing evaluation scores for phrase and password protection across three systems.
Screenshot of Python project setup instructions for building county heat-map.
This appears to be a simulated chat conversation showing problematic outputs from an AI model that was trained on insecure code, suggesting unauthorized access to social media accounts. The response demonstrates concerning security implications.
Graph showing AI software task completion times from 2024-2025, three data points.
Bar graph comparing performance scores of Gemini 2.5 Pro, o4-mini-high, o3-high across benchmarks.
Bar graph comparing model performance across Ideation, Acquisition, Magnification, Formulation, Release categories.

This appears to be Figure 3 showing pre-mitigation model responses across different metrics, with various colored bars representing different models and conditions.
Two data tables showing behavior statistics for different AI model versions (gpt-4o through o4-mini), measuring various behaviors against developers and users, including oversight subversion, self-exfiltration, and sandbagging metrics.
Table showing capability stages and risk ratings with thresholds for
Benchmark comparison showing ChatGPT versions' responses to attractiveness rating requests.

The image shows four panels comparing different ChatGPT model versions' responses when asked to rate attractiveness. Three earlier versions consistently refused to rate (100% refusal), while the latest version shows consistently high ratings of 8/10 across five trials, displayed in a green bar graph.
Samuel Albanie tweets:
Poll results showing voting percentages across four percentage range categories
Heatmap showing accuracy percentages of addition problems by digit combinations, 87% overall accuracy.
Cyberpunk-style digital artwork showing figure manipulating glowing blockchain cubes
Statistical analysis of potential China-Taiwan conflict, showing drivers and probabilities through 2024.

The image shows a detailed table with risk factors affecting military scenarios, including PLA capabilities, political signals, deterrence postures, and economic conditions, with associated probability weights.
Zvi Mowshowitz tweets:
Peter Wildeford tweets:

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
