Machine Learning Street Talk (MLST)

Ryan Greenblatt - Solving ARC with GPT4o

138 min • 6 July 2024

Ryan Greenblatt from Redwood Research recently published "Getting 50% (SoTA) on ARC-AGI with GPT-4o," in which he used GPT-4o to reach state-of-the-art accuracy on Francois Chollet's ARC Challenge by generating many Python programs.
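
As a rough illustration of what "generating many Python programs" can look like in practice, here is a minimal sketch. The summary above does not describe the pipeline, so the filtering and voting steps are assumptions made for illustration: keep only candidates that reproduce every training pair, then take a majority vote on the test output. The names `run_candidate`, `select_output`, and the `transform` entry point are hypothetical, and the candidate programs are hand-written stand-ins for model output.

```python
from collections import Counter
from typing import List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]  # (input grid, expected output grid)


def run_candidate(source: str, grid: Grid) -> Optional[Grid]:
    """Execute one candidate program and call its transform(grid); None on any failure."""
    namespace: dict = {}
    try:
        # Illustration only: real model-generated code should run in a sandbox.
        exec(source, namespace)
        return namespace["transform"]([row[:] for row in grid])
    except Exception:
        return None


def select_output(candidates: List[str], train: List[Example], test_input: Grid) -> Optional[Grid]:
    """Assumed selection rule: filter on the training pairs, then majority-vote."""
    votes: Counter = Counter()
    for source in candidates:
        if all(run_candidate(source, x) == y for x, y in train):
            result = run_candidate(source, test_input)
            if result is not None:
                votes[tuple(map(tuple, result))] += 1
    if not votes:
        return None
    best, _ = votes.most_common(1)[0]
    return [list(row) for row in best]


# Hypothetical stand-ins for sampled programs.
candidates = [
    "def transform(g):\n    return [row[::-1] for row in g]",  # mirror left-right
    "def transform(g):\n    return g[::-1]",                    # flip top-bottom
    "def transform(g):\n    return [r[::-1] for r in g]",       # mirror left-right again
    "def transform(g):\n    return g[0][5]",                    # raises IndexError
]
train = [([[1, 0], [2, 0]], [[0, 1], [0, 2]])]
print(select_output(candidates, train, [[3, 4, 5]]))
```

In this toy run only the two mirroring candidates reproduce the training pair, so their shared answer wins the vote. See the post linked under Refs for what was actually done.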


Sponsor:

Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.


We discuss:

- Ryan's unique approach to solving the ARC Challenge and achieving impressive results.

- The strengths and weaknesses of current AI models.

- How AI and humans differ in learning and reasoning.

- Combining various techniques to create smarter AI systems.

- The potential risks and future advancements in AI, including the idea of agentic AI.


https://x.com/RyanPGreenblatt

https://www.redwoodresearch.org/



Refs:

Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]

https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt


On the Measure of Intelligence [Chollet]

https://arxiv.org/abs/1911.01547


Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]

https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf


Software 2.0 [Andrej Karpathy]

https://karpathy.medium.com/software-2-0-a64152b37c35


Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]

https://amzn.to/3Wfy2E0


Biographical account of Terence Tao’s mathematical development [M. A. (Ken) Clements]

https://gwern.net/doc/iq/high/smpy/1984-clements.pdf


Model Evaluation and Threat Research (METR)

https://metr.org/


Why Tool AIs Want to Be Agent AIs

https://gwern.net/tool-ai


Simulators [Janus]

https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators


AI Control: Improving Safety Despite Intentional Subversion

https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion

https://arxiv.org/abs/2312.06942


What a Compute-Centric Framework Says About Takeoff Speeds

https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/


Global GDP over the long run

https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log


Safety Cases: How to Justify the Safety of Advanced AI Systems

https://arxiv.org/abs/2403.10462


The Danger of a “Safety Case”

http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf


The Future Of Work Looks Like A UPS Truck (~02:15:50)

https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck


SWE-bench

https://www.swebench.com/


Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model

https://arxiv.org/pdf/2201.11990


Algorithmic Progress in Language Models

https://epochai.org/blog/algorithmic-progress-in-language-models
