Ryan Greenblatt from Redwood Research recently published "Getting 50% on ARC-AGI with GPT-4.0," where he used GPT4o to reach a state-of-the-art accuracy on Francois Chollet's ARC Challenge by generating many Python programs.
Sponsor:
Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.
We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.
https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
Software 2.0 [Andrej Karpathy]
https://karpathy.medium.com/software-2-0-a64152b37c35
Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
https://amzn.to/3Wfy2E0
Biographical account of Terence Tao’s mathematical development. [M.A.(KEN) CLEMENTS]
https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
Model Evaluation and Threat Research (METR)
https://metr.org/
Why Tool AIs Want to Be Agent AIs
https://gwern.net/tool-ai
Simulators - Janus
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
AI Control: Improving Safety Despite Intentional Subversion
https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
https://arxiv.org/abs/2312.06942
What a Compute-Centric Framework Says About Takeoff Speeds
https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
Global GDP over the long run
https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462
The Danger of a “Safety Case"
http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
The Future Of Work Looks Like A UPS Truck (~02:15:50)
https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck
SWE-bench
https://www.swebench.com/
Using DeepSpeed and Megatron to Train Megatron-Turing NLG
530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990
Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-progress-in-language-models