Podcast: Training Data

Workday CEO Carl Eschenbach: Building the System of Record for the AI Era

6 May 2025 | 48 min

The Quest to ‘Solve All Diseases’ with AI: Isomorphic Labs’ Max Jaderberg

29 April 2025 | 56 min

Pricing in the AI Era: From Inputs to Outcomes, with Paid CEO Manny Medina

22 April 2025 | 45 min

Arc Institute's Patrick Hsu on Building an App Store for Biology with AI

15 April 2025 | 58 min

Replit CEO Amjad Masad on 1 Billion Developers: A Better End State than AGI?

8 April 2025 | 86 min

Why CRM Needs an AI Revolution, with Day.ai Founder Christopher O’Donnell

1 April 2025 | 71 min

From Software Engineers to AI Word Artisans: Filip Kozera of Wordware

25 March 2025 | 43 min

Josh Woodward: Google Labs is Rapidly Building AI Products from 0-to-1

18 March 2025 | 51 min

How AI Breakout Harvey is Transforming Legal Services, with CEO Winston Weinberg

11 March 2025 | 54 min

The AI Product Going Viral With Doctors: OpenEvidence, with CEO Daniel Nadler

4 March 2025 | 65 min

OpenAI’s Deep Research Team on Why Reinforcement Learning is the Future for AI Agents

25 February 2025 | 33 min

Palo Alto Networks’ Nikesh Arora: AI, Security and the New World Order

18 February 2025 | 60 min

MongoDB’s Sahir Azam: Vector Databases and the Data Structure of AI

13 February 2025 | 44 min

Roblox Studio Head Stef Corazza: Using AI to Empower Creators

4 February 2025 | 55 min

ReflectionAI Founder Ioannis Antonoglou: From AlphaGo to AGI

28 January 2025 | 52 min

Kumo’s Hema Raghavan: Turning Graph AI into ROI

21 January 2025 | 52 min

Databricks Founder Ion Stoica: Turning Academic Open Source into Startup Success

14 January 2025 | 60 min

XBOW CEO and GitHub Copilot Creator Oege de Moor: Cracking the Code on Offensive Security With AI

10 December 2024 | 52 min

Ramp CEO Eric Glyman: Using AI to Build “Self-Driving Money”

3 December 2024 | 39 min

Dust’s Gabriel Hubert and Stanislas Polu: Getting the Most From AI With Multiple Custom Agents

26 November 2024 | 63 min

Clay’s Kareem Amin on Building the Sales ‘System of Action’ with AI

19 November 2024 | 52 min

Decart’s Dean Leitersdorf on AI-Generated Video Games and Worlds

13 November 2024 | 47 min

How Glean CEO Arvind Jain Solved the Enterprise Search Problem – and What It Means for AI at Work

29 October 2024 | 45 min

OpenAI Researcher Dan Roberts on What Physics Can Teach Us About AI

22 October 2024 | 42 min

Google NotebookLM’s Raiza Martin and Jason Spielman on Creating Delightful AI Podcast Hosts and the Potential for Source-Grounded AI

15 October 2024 | 32 min

Snowflake CEO Sridhar Ramaswamy on Using Data to Create Simple, Reliable AI for Businesses

8 October 2024 | 59 min

OpenAI's Noam Brown, Ilge Akkaya and Hunter Lightman on o1 and Teaching LLMs to Reason Better

2 October 2024 | 45 min

Combining LLMs with AlphaGo-style deep reinforcement learning has been a holy grail for many leading AI labs, and with o1 (aka Strawberry) we are seeing the most general merging of the two modes to date. o1 is admittedly better at math than essay writing, but it has already achieved SOTA on a number of math, coding and reasoning benchmarks.

Deep RL legend and now OpenAI researcher Noam Brown and teammates Ilge Akkaya and Hunter Lightman discuss the ah-ha moments on the way to the release of o1, how it uses chains of thought and backtracking to think through problems, the discovery of strong test-time compute scaling laws and what to expect as the model gets better.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned in this episode:

Learning to Reason with LLMs: Technical report accompanying the launch of OpenAI o1.
Generator verifier gap: Concept Noam explains in terms of what kinds of problems benefit from more inference-time compute.
Agent57: Outperforming the human Atari benchmark, 2020 paper where DeepMind demonstrated “the first deep reinforcement learning agent to obtain a score that is above the human baseline on all 57 Atari 2600 games.”
Move 37: Pivotal move in AlphaGo’s second game against Lee Sedol where it made a move so surprising that Sedol thought it must be a mistake, and only later discovered he had lost the game to a superhuman move.
IOI competition: OpenAI entered o1 into the International Olympiad in Informatics and received a Silver Medal.
System 1, System 2: The thesis of Daniel Kahneman’s pivotal book on behavioral economics, Thinking, Fast and Slow, which posited two distinct modes of thought, with System 1 being fast and instinctive and System 2 being slow and rational.
AlphaZero: The successor to AlphaGo, which learned a variety of games completely from scratch through self-play. Interestingly, self-play doesn’t seem to have a role in o1.
Solving Rubik’s Cube with a robot hand: Early OpenAI robotics paper that Ilge Akkaya worked on.
The Last Question: Science fiction story by Isaac Asimov with interesting parallels to scaling inference-time compute.
Strawberry: Why?
o1-mini: A smaller, more efficient version of o1 for applications that require reasoning without broad world knowledge.

00:00 - Introduction

01:33 - Conviction in o1

04:24 - How o1 works

05:04 - What is reasoning?

07:02 - Lessons from gameplay

09:14 - Generation vs verification

10:31 - What is surprising about o1 so far

11:37 - The trough of disillusionment

14:03 - Applying deep RL

14:45 - o1’s AlphaGo moment?

17:38 - A-ha moments

21:10 - Why is o1 good at STEM?

24:10 - Capabilities vs usefulness

25:29 - Defining AGI

26:13 - The importance of reasoning

28:39 - Chain of thought

30:41 - Implication of inference-time scaling laws

35:10 - Bottlenecks to scaling test-time compute

38:46 - Biggest misunderstanding about o1?

41:13 - o1-mini

42:15 - How should founders think about o1?

Why Vlad Tenev and Tudor Achim of Harmonic Think AI Is About to Change Math—and Why It Matters

24 September 2024 | 40 min

Adding code to LLM training data is a known method of improving a model’s reasoning skills. But wouldn’t math, the basis of all reasoning, be even better? Up until recently, there just wasn’t enough usable data that describes mathematics to make this feasible.

A few years ago, Vlad Tenev (also founder of Robinhood) and Tudor Achim noticed the rise of the community around an esoteric programming language called Lean that was gaining traction among mathematicians. The combination of that and the past decade’s rise of autoregressive models capable of fast, flexible learning made them think the time was now, and they founded Harmonic. Their mission is both lofty (mathematical superintelligence) and eminently practical: verifying all safety-critical software.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned in this episode:

IMO and the Millennium Prize: Two significant global competitions Harmonic hopes to win (soon)
Riemann hypothesis: One of the most difficult unsolved math conjectures (and a Millennium Prize problem) most recently in the sights of MIT mathematician Larry Guth
Terry Tao: perhaps the greatest living mathematician and Vlad’s professor at UCLA
Lean: an open source functional language for code verification launched by Leonardo de Moura when at Microsoft Research in 2013 that powers the Lean Theorem Prover
mathlib: the largest math textbook in the world, all written in Lean
Metaculus: online prediction platform that tracks and scores thousands of forecasters
Minecraft Beaten in 20 Seconds: The video Vlad references as an analogy to AI math
Navier-Stokes equations: another important Millennium Prize math problem. Vlad considers this more tractable than Riemann
John von Neumann: Hungarian mathematician and polymath who made foundational contributions to computing, the Manhattan Project and game theory
Gottfried Wilhelm Leibniz: co-inventor of calculus and (remarkably) creator of the “universal characteristic,” a system for reasoning through a language of symbols and calculations—anticipating Lean and Harmonic by 350 years!

00:00 - Introduction

01:42 - Math is reasoning

06:16 - Studying with the world's greatest living mathematician

10:18 - What does the math community think of AI math?

15:11 - Recursive self-improvement

18:31 - What is Lean?

21:05 - Why now?

22:46 - Synthetic data is the fuel for the model

27:29 - How fast will your model get better?

29:45 - Exploring the frontiers of human knowledge

34:11 - Lightning round

Jim Fan on Nvidia’s Embodied AI Lab and Jensen Huang’s Prediction that All Robots will be Autonomous

17 September 2024 | 49 min

AI researcher Jim Fan has had a charmed career. He was OpenAI’s first intern before he did his PhD at Stanford with “godmother of AI,” Fei-Fei Li. He graduated into a research scientist position at Nvidia and now leads its Embodied AI “GEAR” group. The lab’s current work spans foundation models for humanoid robots to agents for virtual worlds.

Jim describes a three-pronged data strategy for robotics, combining internet-scale data, simulation data and real world robot data. He believes that in the next few years it will be possible to create a “foundation agent” that can generalize across skills, embodiments and realities—both physical and virtual. He also supports Jensen Huang’s idea that “Everything that moves will eventually be autonomous.”

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

Mentioned in this episode:

World of Bits: Early OpenAI project Jim worked on as an intern with Andrej Karpathy. Part of a bigger initiative called Universe
Fei-Fei Li: Jim’s PhD advisor at Stanford who founded the ImageNet project in 2010 that revolutionized the field of visual recognition, led the Stanford Vision Lab and just launched her own AI startup, World Labs
Project GR00T: Nvidia’s “moonshot effort” at a robotic foundation model, premiered at this year’s GTC
Thinking, Fast and Slow: Influential book by Daniel Kahneman that popularized some of his teaching from behavioral economics
Jetson Orin chip: The dedicated series of edge computing chips Nvidia is developing to power Project GR00T
Eureka: Project by Jim’s team that trained a five finger robot hand to do pen spinning
MineDojo: A project Jim did when he first got to Nvidia that developed a platform for general purpose agents in the game of Minecraft. Won NeurIPS 2022 Outstanding Paper Award
ADI: artificial dog intelligence
Mamba: Selective State Space Models, an alternative architecture to Transformers that Jim is interested in

00:00 Introduction

01:35 Jim’s journey to embodied intelligence

04:53 The GEAR Group

07:32 Three kinds of data for robotics

10:32 A GPT-3 moment for robotics

16:05 Choosing the humanoid robot form factor

19:37 Specialized generalists

21:59 GR00T gets its own chip

23:35 Eureka and Isaac Sim

25:23 Why now for robotics?

28:53 Exploring virtual worlds

36:28 Implications for games

39:13 Is the virtual world in service of the physical world?

42:10 Alternative architectures to Transformers

44:15 Lightning round

Founder Eric Steinberger on Magic’s Counterintuitive Approach to Pursuing AGI

10 September 2024 | 51 min

There’s a new archetype in Silicon Valley, the AI researcher turned founder. Instead of tinkering in a garage they write papers that earn them the right to collaborate with cutting-edge labs until they break out and start their own.

This is the story of wunderkind Eric Steinberger, the founder and CEO of Magic.dev. Eric came to programming through his obsession with AI and caught the attention of DeepMind researchers as a high school student. In 2022 he realized that AGI was closer than he had previously thought and started Magic to automate the software engineering necessary to get there. Among his counterintuitive ideas are the need to train proprietary large models, that value will not accrue in the application layer and that the best agents will manage themselves. Eric also talks about Magic’s recent 100M token context window model and the HashHop eval they’re open sourcing.

Hosted by: Sonya Huang, Sequoia Capital

Mentioned in this episode:

David Silver: DeepMind researcher who led the AlphaGo team
Johannes Heinrich: a PhD student of Silver’s and DeepMind researcher who mentored Eric as a high schooler
Reinforcement Learning from Self-Play in Imperfect-Information Games: Johannes’s dissertation that inspired Eric
Noam Brown: DeepMind, Meta and now OpenAI reinforcement learning researcher who eventually collaborated with Eric and brought him to FAIR
ClimateScience: NGO that Eric co-founded in 2019 while a university student
Noam Shazeer: One of the original Transformers researchers at Google and founder of Character.ai
DeepStack: Expert-Level Artificial Intelligence in Heads-Up No-Limit Poker: the first AI paper Eric ever tried to deeply understand
LTM-2-mini: Magic’s first 100M token context model, built using the HashHop eval (now available open source)

00:00 - Introduction

01:39 - Vienna-born wunderkind

04:56 - Working with Noam Brown

08:00 - “I can do two things. I cannot do three.”

10:37 - AGI to-do list

13:27 - Advice for young researchers

20:35 - Reading every paper voraciously

23:06 - The army of Noams

26:46 - The leaps still needed in research

29:59 - What is Magic?

36:12 - Competing against the 800-pound gorillas

38:21 - Ideal team size for researchers

40:10 - AI that feels like a colleague

44:30 - Lightning round

47:50 - Bonus round: 200M token context announcement

Crucible Moments Returns for S2: The ServiceNow Story ft. CEO Frank Slootman & Founder Fred Luddy

3 September 2024 | 43 min

Sierra Co-Founder Clay Bavor on Making Customer-Facing AI Agents Delightful

27 August 2024 | 73 min

Customer service is hands down the first killer app of generative AI for businesses. The reasons are simple: the costs of existing solutions are so high, the satisfaction so low and the margin for ROI so wide. But trusting your interactions with customers to hallucination-prone LLMs can be daunting.

Enter Sierra. Co-founder Clay Bavor walks us through the sophisticated engineering challenges his team solved along the way to delivering AI agents for all aspects of the customer experience that are delightful, safe and reliable, and that are being deployed widely by Sierra’s customers. The company’s AgentOS enables businesses to create branded AI agents to interact with customers, follow nuanced policies and even handle customer retention and upsell. Clay describes how companies can capture their brand voice, values and internal processes to create AI agents that truly represent the business.

Hosted by: Ravi Gupta and Pat Grady, Sequoia Capital

Mentioned in this episode:

Bret Taylor: co-founder of Sierra
Towards a Human-like Open-Domain Chatbot: 2020 Google paper that introduced Meena, a predecessor of ChatGPT (followed by LaMDA in 2021)
PaLM: Scaling Language Modeling with Pathways: 2022 Google paper about their unreleased 540B parameter transformer model (GPT-3, at the time, had 175B)
Avocado chair: Images generated by OpenAI’s DALL·E model in 2021
Large Language Models Understand and Can be Enhanced by Emotional Stimuli: 2023 Microsoft paper on how models like GPT-4 can be manipulated into providing better results
𝛕-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains: 2024 paper authored by Sierra research team, led by Karthik Narasimhan (co-author of the 2022 ReAct paper and the 2023 Reflexion paper)

00:00:00 Introduction

00:01:21 Clay’s background

00:03:20 Google before the ChatGPT moment

00:07:31 What is Sierra?

00:12:03 What’s possible now that wasn’t possible 18 months ago?

00:17:11 AgentOS

00:23:45 The solution to many problems with AI is more AI

00:28:37 𝛕-bench

00:33:19 Engineering task vs research task

00:37:27 What tasks can you trust an agent with now?

00:43:21 What metrics will move?

00:46:22 The reality of deploying AI to customers today

00:53:33 The experience manager

01:03:54 Outcome-based pricing

01:05:55 Lightning Round

Phaidra’s Jim Gao on Building the Fourth Industrial Revolution with Reinforcement Learning

20 August 2024 | 51 min

Fireworks Founder Lin Qiao on How Fast Inference and Small Models Will Benefit Businesses

13 August 2024 | 39 min

In the first wave of the generative AI revolution, startups and enterprises built on top of the best closed-source models available, mostly from OpenAI. The AI customer journey moves from training to inference, and as these first products find PMF, many are hitting a wall on latency and cost.

Fireworks Founder and CEO Lin Qiao led the PyTorch team at Meta that rebuilt the whole stack to meet the complex needs of the world’s largest B2C company. Meta moved PyTorch to its own non-profit foundation in 2022 and Lin started Fireworks with the mission to compress the timeframe of training and inference and democratize access to GenAI beyond the hyperscalers to let a diversity of AI applications thrive.

Lin predicts when open and closed source models will converge and reveals her goal to build simple API access to the totality of knowledge.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned in this episode:

PyTorch: the leading framework for building deep learning models, originated at Meta and now part of the Linux Foundation umbrella
Caffe2 and ONNX: ML frameworks Meta used that PyTorch eventually replaced
Conservation of complexity: the idea that every computer application has inherent complexity that cannot be reduced but merely moved between the backend and frontend, originated by Xerox PARC researcher Larry Tesler
Mixture of Experts: a class of transformer models that route requests between different subsets of a model based on use case
Fathom: a product the Fireworks team uses for video conference summarization
LMSYS Chatbot Arena: crowdsourced open platform for LLM evals hosted on Hugging Face

00:00 - Introduction

02:01 - What is Fireworks?

02:48 - Leading PyTorch

05:01 - What do researchers like about PyTorch?

07:50 - How Fireworks compares to open source

10:38 - Simplicity scales

12:51 - From training to inference

17:46 - Will open and closed source converge?

22:18 - Can you match OpenAI on the Fireworks stack?

26:53 - What is your vision for the Fireworks platform?

31:17 - Competition for Nvidia?

32:47 - Are returns to scale starting to slow down?

34:28 - Competition

36:32 - Lightning round

GitHub CEO Thomas Dohmke on Building Copilot, and the Future of Software Development

6 August 2024 | 68 min

GitHub invented collaborative coding and in the process changed how open source projects, startups and eventually enterprises write code. GitHub Copilot is the first blockbuster product built on top of OpenAI’s GPT models. It now accounts for more than 40 percent of GitHub revenue growth for an annual revenue run rate of $2 billion. Copilot itself is already a larger business than all of GitHub was when Microsoft acquired it in 2018.

We talk to CEO Thomas Dohmke about how a small team at GitHub built on top of GPT-3 and quickly created a product that developers love—and can’t live without. Thomas describes how the product has grown from simple autocomplete to a fully featured workspace for enterprise teams. He also believes that tools like Copilot will bring the power of coding to a billion developers by 2030.

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

Mentioned in this episode:

Nat Friedman: Former Microsoft VP (and now investor) who came up with the idea that Microsoft should buy GitHub
Oege de Moor: GitHub developer (and now founder of XBOW) who came up with the idea of using GPT-3 for code and went on to create Copilot
Alex Graveley: principal engineer and Chief Architect for Copilot (now CEO of Minion.ai) who came up with the name Copilot (because his boss, Nat Friedman, is an amateur pilot)
Productivity Assessment of Neural Code Completion: Original GitHub research paper on the impact of Copilot on developer productivity
Escaping a room in Minecraft with an AI-powered NPC: Recent Minecraft AI assistant demo from Microsoft
With AI, anyone can be a coder now: TED2024 talk by Thomas Dohmke
JFrog: The software supply chain platform that GitHub just partnered with

00:00:00 - Introduction

00:01:18 - Getting started with code

00:03:43 - Microsoft’s acquisition of GitHub

00:11:40 - Evolving Copilot beyond autocomplete

00:14:18 - In hindsight, you can always move faster

00:15:56 - Building on top of OpenAI

00:20:21 - The latest metrics

00:22:11 - The surprise of Copilot’s impact

00:25:11 - Teaching kids to code in the age of Copilot

00:26:38 - The momentum mindset

00:29:46 - Agents vs Copilots

00:32:06 - The Roadmap

00:37:31 - Making maintaining software easier

00:38:48 - The creative new world

00:42:38 - The AI 10x software engineer

00:45:12 - Creativity and systems engineering in AI

00:48:55 - What about COBOL?

00:50:23 - Will GitHub build its own models?

00:57:19 - Rapid incubation at GitHub Next

00:59:21 - The future of AI?

01:03:18 - Advice for founders

01:05:08 - Lightning round

Meta’s Joe Spisak on Llama 3.1 405B and the Democratization of Frontier Models

30 July 2024 | 42 min

Klarna CEO Sebastian Siemiatkowski on Getting AI to Do the Work of 700 Customer Service Reps

23 July 2024 | 52 min

Reflection AI’s Misha Laskin on the AlphaGo Moment for LLMs

16 July 2024 | 67 min

LLMs are democratizing digital intelligence, but we’re all waiting for AI agents to take this to the next level by planning tasks and executing actions to actually transform the way we work and live our lives.

Yet despite incredible hype around AI agents, we’re still far from that “tipping point” with best in class models today. As one measure: coding agents are now scoring in the high-teens % on the SWE-bench benchmark for resolving GitHub issues, which far exceeds the previous unassisted baseline of 2% and the assisted baseline of 5%, but we’ve still got a long way to go.

Why is that? What do we need to truly unlock agentic capability for LLMs? What can we learn from researchers who have built both the most powerful agents in the world, like AlphaGo, and the most powerful LLMs in the world?

To find out, we’re talking to Misha Laskin, former research scientist at DeepMind. Misha is embarking on his vision to build the best agent models by bringing the search capabilities of RL together with LLMs at his new company, Reflection AI. He and his cofounder Ioannis Antonoglou, co-creator of AlphaGo and AlphaZero and RLHF lead for Gemini, are leveraging their unique insights to train the most reliable models for developers building agentic workflows.

Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital

00:00 Introduction

01:11 Leaving Russia, discovering science

10:01 Getting into AI with Ioannis Antonoglou

15:54 Reflection AI and agents

25:41 The current state of AI agents

29:17 AlphaGo, AlphaZero and Gemini

32:58 LLMs don’t have a ground truth reward

37:53 The importance of post-training

44:12 Task categories for agents

45:54 Attracting talent

50:52 How far away are capable agents?

56:01 Lightning round

Mentioned:

The Feynman Lectures on Physics: The classic text that got Misha interested in science.
Mastering the game of Go with deep neural networks and tree search: The original 2016 AlphaGo paper.
Mastering the game of Go without human knowledge: 2017 AlphaGo Zero paper
Scaling Laws for Reward Model Overoptimization: OpenAI paper on how reward models can be gamed at all scales for all algorithms.
Mapping the Mind of a Large Language Model: Article about Anthropic mechanistic interpretability paper that identifies how millions of concepts are represented inside Claude Sonnet
Pieter Abbeel: Berkeley professor and founder of Covariant who Misha studied with
A2C and A3C: Advantage Actor Critic and Asynchronous Advantage Actor Critic, the two algorithms developed by Misha’s manager at DeepMind, Volodymyr Mnih, that defined reinforcement learning and deep reinforcement learning

Microsoft CTO Kevin Scott on How Far Scaling Laws Will Extend

9 July 2024 | 60 min

The current LLM era is the result of scaling the size of models in successive waves (and the compute to train them). It is also the result of better-than-Moore’s-Law price vs performance ratios in each new generation of Nvidia GPUs. The largest platform companies are continuing to invest in scaling as the prime driver of AI innovation.

Are they right, or will marginal returns level off soon, leaving hyperscalers with too much hardware and too few customer use cases? To find out, we talk to Microsoft CTO Kevin Scott who has led their AI strategy for the past seven years. Scott describes himself as a “short-term pessimist, long-term optimist” and he sees the scaling trend as durable for the industry and critical for the establishment of Microsoft’s AI platform.

Scott believes there will be a shift across the compute ecosystem from training to inference as the frontier models continue to improve, serving wider and more reliable use cases. He also discusses the coming business models for training data, and even what ad units might look like for autonomous agents.

Hosted by: Pat Grady and Bill Coughran, Sequoia Capital

Mentioned:

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, the 2018 Google paper that convinced Kevin that Microsoft wasn’t moving fast enough on AI.

Dennard scaling: The scaling law that describes the proportional relationship between transistor size and power use; has not held since 2012 and is often confused with Moore’s Law.

Textbooks Are All You Need: Microsoft paper that introduces a new large language model for code, phi-1, that achieves smaller size by using higher quality “textbook” data.

GPQA and MMLU: Benchmarks for reasoning

Copilot: Microsoft product line of GPT consumer assistants from general productivity to design, vacation planning, cooking and fitness.

Devin: Autonomous AI code agent from Cognition Labs that Microsoft recently announced a partnership with.

Ray Solomonoff: Participant in the 1956 Dartmouth Summer Research Project on Artificial Intelligence that named the field; Kevin admires his prescience about the importance of probabilistic methods decades before anyone else.

00:00 - Introduction

01:20 - Kevin’s backstory

06:56 - The role of PhDs in AI engineering

09:56 - Microsoft’s AI strategy

12:40 - Highlights and lowlights

16:28 - Accelerating investments

18:38 - The OpenAI partnership

22:46 - Soon inference will dwarf training

27:56 - Will the demand/supply balance change?

30:51 - Business models for data

36:54 - The value function

39:58 - Copilots

44:47 - The 98/2 rule

49:34 - Solving zero-sum games

57:13 - Lightning round

Zapier’s Mike Knoop launches ARC Prize to Jumpstart New Ideas for AGI

2 July 2024 | 55 min

As impressive as LLMs are, the growing consensus is that language, scale and compute won’t get us to AGI. Although many AI benchmarks have quickly achieved human-level performance, there is one eval that has barely budged since it was created in 2019.

Google researcher François Chollet wrote a paper that year defining intelligence as skill-acquisition efficiency—the ability to learn new skills as humans do, from a small number of examples. To make it testable he proposed a new benchmark, the Abstraction and Reasoning Corpus (ARC), designed to be easy for humans, but hard for AI. Notably, it doesn’t rely on language.

Zapier co-founder Mike Knoop read Chollet’s paper as the LLM wave was rising. He worked quickly to integrate generative AI into Zapier’s product, but kept coming back to the lack of progress on the ARC benchmark. In June, Knoop and Chollet launched the ARC Prize, a public competition offering more than $1M to beat and open-source a solution to the ARC-AGI eval.

In this episode Mike talks about the new ideas required to solve ARC, shares updates from the first two weeks of the competition, and explains why he’s excited for AGI systems that can innovate alongside humans.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models: The 2022 paper that first caught Mike’s attention about the capabilities of LLMs
On the Measure of Intelligence: 2019 paper by Google researcher François Chollet that introduced the ARC benchmark, which remains unbeaten
ARC Prize 2024: The $1M+ competition Mike and François have launched to drive interest in solving the ARC-AGI eval
Sequence to Sequence Learning with Neural Networks: Ilya Sutskever paper from 2014 that influenced the direction of machine translation with deep neural networks.
Etched: Luke Miles on LessWrong wrote about the first ASIC chip that accelerates transformers on silicon
Kaggle: The leading data science competition platform and online community, acquired by Google in 2017
Lab42: Swiss AI lab that hosted ARCathon, the precursor to ARC Prize
Jack Cole: Researcher on team that was #1 on the leaderboard for ARCathon
Ryan Greenblatt: Researcher with current high score (50%) on ARC public leaderboard

(00:00) Introduction

(01:51) AI at Zapier

(08:31) What is ARC AGI?

(13:25) What does it mean to efficiently acquire a new skill?

(19:03) What approaches will succeed?

(21:11) A little bit of a different shape

(25:59) The role of code generation and program synthesis

(29:11) What types of people are working on this?

(31:45) Trying to prove you wrong

(34:50) Where are the big labs?

(38:21) The world post-AGI

(42:51) When will we cross 85% on ARC AGI?

(46:12) Will LLMs be part of the solution?

(50:13) Lightning round

Factory’s Matan Grinberg and Eno Reyes Unleash the Droids on Software Development

25 June 2024 | 59 min

Archimedes said that with a large enough lever, you can move the world. For decades, software engineering has been that lever. And now, AI is compounding that lever. How will we use AI to apply 100 or 1000x leverage to the greatest lever to move the world?

Matan Grinberg and Eno Reyes, co-founders of Factory, have chosen to do things differently than many of their peers in this white-hot space. They sell a fleet of “Droids,” purpose-built dev agents which accomplish different tasks in the software development lifecycle (like code review, testing, pull requests or writing code). Rather than training their own foundation model, their approach is to build something useful for engineering orgs today on top of the rapidly improving models, aligning with the developer and evolving with them.

Matan and Eno are optimistic about the effects of autonomy in software development and on building a company in the application layer. Their advice to founders, “The only way you can win is by executing faster and being more obsessed.”

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned:

Juan Maldacena, Institute for Advanced Study, string theorist that Matan cold called as an undergrad
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?, an evaluation framework for GitHub issues
Monte Carlo tree search, a 2006 algorithm for solving decision making in games (and used in AlphaGo)
Language agent tree search, a framework for LLM planning, acting and reasoning
The Bitter Lesson, Rich Sutton’s essay on scaling in search and learning
Code churn, time to merge, cycle time, metrics Factory thinks are important to eng orgs

Transcript: https://www.sequoiacap.com/podcast/training-data-factory/

00:00 Introduction

01:36 Personal backgrounds

10:54 The compound lever

12:41 What is Factory?

16:29 Cognitive architectures

21:13 800 engineers at OpenAI are working on my margins

24:00 Jeff Dean doesn't understand your code base

25:40 Individual dev productivity vs system-wide optimization

30:04 Results: Factory in action

32:54 Learnings along the way

35:36 Fully autonomous Jeff Deans

37:56 Beacons of the upcoming age

40:04 How far are we?

43:02 Competition

45:32 Lightning round

49:34 Bonus round: Factory's SWE-bench results

LangChain’s Harrison Chase on Building the Orchestration Layer for AI Agents

June 18, 2024 | 50 min

Last year, AutoGPT and Baby AGI captured our imaginations, and agents quickly became the buzzword of the day… and then things went quiet. AutoGPT and Baby AGI may have marked a peak in the hype cycle, but this year has seen a wave of agentic breakouts on the product side, from Klarna’s customer support AI to Cognition’s Devin.

Harrison Chase of LangChain is focused on enabling the orchestration layer for agents. In this conversation, he explains what’s changed that’s allowing agents to improve performance and find traction.

Harrison shares what he’s optimistic about, where he sees promise for agents vs. what he thinks will be trained into models themselves, and discusses novel kinds of UX that he imagines might transform how we experience agents in the future.

Hosted by: Sonya Huang and Pat Grady, Sequoia Capital

Mentioned:

ReAct: Synergizing Reasoning and Acting in Language Models, the first cognitive architecture for agents
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering, small-model open-source software engineering agent from researchers at Princeton
Devin, autonomous software engineering from Cognition
V0: Generative UI agent from Vercel
GPT Researcher, a research agent
Language Model Cascades: 2022 paper by David Dohan, a Google Brain researcher who is now at OpenAI, that was influential for Harrison in developing LangChain

Transcript: https://www.sequoiacap.com/podcast/training-data-harrison-chase/

00:00 Introduction

01:21 What are agents?

05:00 What is LangChain’s role in the agent ecosystem?

11:13 What is a cognitive architecture?

13:20 Is bespoke and hard coded the way the world is going, or a stop gap?

18:48 Focus on what makes your beer taste better

20:37 So what?

22:20 Where are agents getting traction?

25:35 Reflection, chain of thought, other techniques?

30:42 UX can influence the effectiveness of the architecture

35:30 What’s out of scope?

38:04 Fine tuning vs prompting?

42:17 Existing observability tools for LLMs vs needing a new architecture/approach

45:38 Lightning round

Introducing "Training Data"

June 5, 2024 | 1 min

Training Data

Join us as we train our neural nets on the theme of the century: AI.
