AI researcher Jim Fan has had a charmed career. He was OpenAI’s first intern before doing his PhD at Stanford under “godmother of AI” Fei-Fei Li. He graduated into a research scientist position at Nvidia and now leads its Embodied AI “GEAR” group. The lab’s current work ranges from foundation models for humanoid robots to agents for virtual worlds.
Jim describes a three-pronged data strategy for robotics that combines internet-scale data, simulation data, and real-world robot data. He believes that in the next few years it will be possible to create a “foundation agent” that can generalize across skills, embodiments, and realities, both physical and virtual. He also echoes Jensen Huang’s view that “Everything that moves will eventually be autonomous.”
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Episode timestamps:
00:00 Introduction
01:35 Jim’s journey to embodied intelligence
04:53 The GEAR Group
07:32 Three kinds of data for robotics
10:32 A GPT-3 moment for robotics
16:05 Choosing the humanoid robot form factor
19:37 Specialized generalists
21:59 GR00T gets its own chip
23:35 Eureka and Isaac Sim
25:23 Why now for robotics?
28:53 Exploring virtual worlds
36:28 Implications for games
39:13 Is the virtual world in service of the physical world?
42:10 Alternative architectures to Transformers
44:15 Lightning round