Authors: Eli Lifland, Nikola Jurkovic, FutureSearch
This is supporting research for AI 2027. We'll be cross-posting these over the next week or so.
These forecasts assume that no large-scale catastrophes happen (e.g., a solar flare, a pandemic, nuclear war), that there is no government or self-imposed slowdown, and that there are no significant supply chain disruptions. All forecasts give a substantial chance of superhuman coding arriving in 2027.
We forecast when the leading AGI company will internally develop a superhuman coder (SC): an AI system that can do any coding task that the best AGI company engineer does, while being much faster and cheaper. At this point, the SC will likely speed up AI progress substantially, as explored in our takeoff forecast.
We first show Method 1: time-horizon-extension, a relatively simple model which forecasts when SC will arrive by extending the trend established by METR's report of AIs accomplishing tasks that take humans increasing amounts [...]
---
Outline:
(00:56) Summary
(02:43) Defining a superhuman coder (SC)
(03:35) Method 1: Time horizon extension
(05:05) METR's time horizon report
(06:30) Forecasting SC's arrival
(06:54) Method 2: Benchmarks and gaps
(06:59) Time to RE-Bench saturation
(07:03) Why RE-Bench?
(09:25) Forecasting saturation via extrapolation
(12:42) AI progress speedups after saturation
(14:04) Time to cross gaps between RE-Bench saturation and SC
(14:32) What are the gaps in task difficulty between RE-Bench saturation and SC?
(15:11) Methodology
(17:25) How fast can the task difficulty gaps be crossed?
(23:31) Other factors for benchmarks and gaps
(23:46) Compute scaling and algorithmic progress slowdown
(24:43) Gap between internal and external deployment
(25:20) Intermediate speedups
(26:55) Overall benchmarks and gaps forecasts
(27:44) Appendix
(27:47) Individual Forecaster Views for Benchmark-Gap Model Factors
(27:53) Engineering complexity: handling complex codebases
(31:16) Feedback loops: Working without externally provided feedback
(37:28) Parallel projects: Handling several interacting projects
(38:45) Specialization: Specializing in skills specific to frontier AI development
(40:17) Cost and speed
(48:48) Other task difficulty gaps
(50:52) Superhuman Coder (SC): time horizon and reliability requirements
(55:53) RE-Bench saturation resolution criteria
The original text contained 19 footnotes which were omitted from this narration.
---
First published:
April 10th, 2025
Narrated by TYPE III AUDIO.