
LessWrong (30+ Karma)

“Forecasting time to automated superhuman coders [AI 2027 Timelines Forecast]” by elifland, Nikola Jurkovic

58 min • 11 April 2025

Authors: Eli Lifland, Nikola Jurkovic[1], FutureSearch[2]

This is supporting research for AI 2027. We'll be cross-posting these over the next week or so.

These forecasts assume that no large-scale catastrophe occurs (e.g., a solar flare, a pandemic, or nuclear war), that there is no government-imposed or self-imposed slowdown, and that there are no significant supply chain disruptions. All forecasts give a substantial chance of superhuman coding arriving in 2027.

Summary

We forecast when the leading AGI company will internally develop a superhuman coder (SC): an AI system that can do any coding task that the best AGI company engineer can do, while being much faster and cheaper. At that point, the SC will likely speed up AI progress substantially, as explored in our takeoff forecast.

We first show Method 1: time horizon extension, a relatively simple model that forecasts when SC will arrive by extending the trend established by METR's report of AIs accomplishing tasks that take humans increasing amounts [...]
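The core arithmetic behind extending an exponential time-horizon trend can be illustrated with a short sketch. It assumes a constant doubling time (7 months, the figure given in the doubling-trend graph listed among the article's images) and uses placeholder values for the current and target horizons. Those inputs and the function name `months_until_horizon` are hypothetical and for illustration only; they are not the article's estimates, and the article's actual Method 1 model is more detailed than this.

```python
import math


def months_until_horizon(current_horizon_hours: float,
                         target_horizon_hours: float,
                         doubling_time_months: float = 7.0) -> float:
    """Months until the time horizon reaches the target under a constant
    doubling time (plain exponential extrapolation of the trend)."""
    if current_horizon_hours <= 0 or target_horizon_hours <= 0:
        raise ValueError("horizons must be positive")
    if doubling_time_months <= 0:
        raise ValueError("doubling time must be positive")
    # Number of doublings separating the current horizon from the target.
    doublings_needed = math.log2(target_horizon_hours / current_horizon_hours)
    # A target that has already been reached needs no further time.
    return max(0.0, doublings_needed) * doubling_time_months


if __name__ == "__main__":
    # Hypothetical inputs for illustration only, not the article's estimates.
    current = 1.0    # hours: assumed current time horizon
    target = 128.0   # hours: assumed horizon required for an SC
    for doubling in (7.0, 4.0):
        months = months_until_horizon(current, target, doubling)
        print(f"doubling every {doubling:g} months -> {months:.1f} months")
```

With these placeholder inputs, a 1-hour horizon needs log2(128) = 7 doublings to reach 128 hours, i.e. 49 months at a 7-month doubling time; shortening the doubling time to 4 months cuts that to 28 months, which shows how sensitive this kind of forecast is to the doubling-time parameter.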

---

Outline:

(00:56) Summary

(02:43) Defining a superhuman coder (SC)

(03:35) Method 1: Time horizon extension

(05:05) METR's time horizon report

(06:30) Forecasting SC's arrival

(06:54) Method 2: Benchmarks and gaps

(06:59) Time to RE-Bench saturation

(07:03) Why RE-Bench?

(09:25) Forecasting saturation via extrapolation

(12:42) AI progress speedups after saturation

(14:04) Time to cross gaps between RE-Bench saturation and SC

(14:32) What are the gaps in task difficulty between RE-Bench saturation and SC?

(15:11) Methodology

(17:25) How fast can the task difficulty gaps be crossed?

(23:31) Other factors for benchmarks and gaps

(23:46) Compute scaling and algorithmic progress slowdown

(24:43) Gap between internal and external deployment

(25:20) Intermediate speedups

(26:55) Overall benchmarks and gaps forecasts

(27:44) Appendix

(27:47) Individual Forecaster Views for Benchmark-Gap Model Factors

(27:53) Engineering complexity: handling complex codebases

(31:16) Feedback loops: Working without externally provided feedback

(37:28) Parallel projects: Handling several interacting projects

(38:45) Specialization: Specializing in skills specific to frontier AI development

(40:17) Cost and speed

(48:48) Other task difficulty gaps

(50:52) Superhuman Coder (SC): time horizon and reliability requirements

(55:53) RE-Bench saturation resolution criteria

The original text contained 19 footnotes which were omitted from this narration.

---

First published:
April 10th, 2025

Source:
https://www.lesswrong.com/posts/ggqSg7bSLChanfunf/forecasting-time-to-automated-superhuman-coders-ai-2027

---

Narrated by TYPE III AUDIO.

---

Images from the article:

Graph showing probability distribution of [...]
Methodology diagram showing three steps for measuring AI agent performance benchmarks.
Graph showing [...]
Timeline diagram showing [...]
Table showing estimated performance scores for seven different AI/ML environments and tasks.
Graph showing AI task completion times doubling every 7 months (2019-2027).
Four scatter plots showing [...]
Table showing distribution of task family subdomains across four major tech categories.
Probability density graph showing predicted [...]
Probability density graph titled [...]
Grid of 13 statistical distribution graphs showing AI progress parameters and metrics. The graphs display various metrics, including horizon lengths, doubling times, algorithmic gaps, and progress rates, with three different forecast sources (Eli, Nikola, and FutureSearch) represented by different colored lines.
Two graphs showing AI R&D performance trends and predictions through 2028. The first graph displays a logistic curve tracking various AI models' performance scores, while the second shows a probability distribution for crossing a specific performance threshold.
Table comparing scales between RE-Bench and real AI R&D across development dimensions.
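The pair of AI R&D graphs listed above combines a logistic fit of model scores with a probability distribution for when a threshold is crossed. A minimal sketch of that idea follows, assuming a three-parameter logistic curve whose parameters are sampled from made-up uniform ranges. The ceiling, rate, and midpoint ranges, the 1.5 threshold, and the helper names (`logistic`, `crossing_time`, `sample_crossing_times`) are all hypothetical and are not the RE-Bench values or code from the article.

```python
import math
import random
import statistics


def logistic(t: float, ceiling: float, rate: float, midpoint: float) -> float:
    """Score at time t (in years) on a three-parameter logistic curve."""
    return ceiling / (1.0 + math.exp(-rate * (t - midpoint)))


def crossing_time(threshold: float, ceiling: float, rate: float,
                  midpoint: float) -> float:
    """Invert the logistic: the time at which the curve reaches `threshold`."""
    if not 0.0 < threshold < ceiling:
        raise ValueError("threshold must lie strictly between 0 and the ceiling")
    if rate <= 0:
        raise ValueError("rate must be positive")
    return midpoint - math.log(ceiling / threshold - 1.0) / rate


def sample_crossing_times(threshold: float, n: int = 10_000,
                          seed: int = 0) -> list[float]:
    """Monte Carlo over uncertain curve parameters (hypothetical ranges)."""
    rng = random.Random(seed)
    times = []
    for _ in range(n):
        ceiling = rng.uniform(1.8, 2.2)         # hypothetical score ceiling
        rate = rng.uniform(0.8, 1.5)            # hypothetical steepness per year
        midpoint = rng.uniform(2024.5, 2025.5)  # hypothetical inflection year
        if threshold < ceiling:  # skip curves that never reach the threshold
            times.append(crossing_time(threshold, ceiling, rate, midpoint))
    return times


if __name__ == "__main__":
    samples = sample_crossing_times(threshold=1.5)
    deciles = statistics.quantiles(samples, n=10)
    print(f"10th pct: {deciles[0]:.2f}, median: {deciles[4]:.2f}, "
          f"90th pct: {deciles[8]:.2f}")
```

The uncertainty here lives in the curve parameters rather than in the threshold: each sampled curve yields one crossing date, and the spread of those dates is the forecast distribution.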

Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts or another podcast app.
