Interconnects

We aren't running out of training data, we are running out of open training data

8 min • May 29, 2024

Data licensing deals, scaling, human inputs, and repeating trends in open vs. closed.
This is AI-generated audio, made with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/the-data-wall

0:00 We aren't running out of training data, we are running out of open training data
2:51 Synthetic data: 1 trillion new tokens per day
4:18 Data licensing deals: High costs per token
6:33 Better tokens: Search and new frontiers



Get full access to Interconnects at www.interconnects.ai/subscribe