Interconnects

(Voiceover) Building on evaluation quicksand

17 min • 16 October 2024

Read the full post here: https://www.interconnects.ai/p/building-on-evaluation-quicksand

Chapters

00:00 Building on evaluation quicksand

01:26 The causes of closed evaluation silos

06:35 The challenge facing open evaluation tools

10:47 Frontiers in evaluation

11:32 New types of synthetic data contamination

13:57 Building harder evaluations

Figures

Fig 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/manual/openai-predictions.webp



Get full access to Interconnects at www.interconnects.ai/subscribe