Sveriges mest populära poddar

Interconnects

Evaluations: Trust, performance, and price (bonus, announcing RewardBench)

13 min • 21 mars 2024

Evaluation is not only getting harder with modern LLMs, it's getting harder because it means something different.
This is AI generated audio with Python and 11Labs. Music generated by Meta's MusicGen.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/evaluations-trust-performance-and-price

00:00 Evaluations: Trust, performance, and price (bonus, announcing RewardBench)
03:14 The rising price of evaluation
05:40 Announcing RewardBench: The First reward model evaluation tool
08:37 Updates to RLHF evaluation tools

YouTube code intro: https://youtu.be/CAaHAfCqrBA

Figure 1: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_026.png
Figure 2: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_030.png
Figure 3: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_034.png
Figure 4: https://huggingface.co/datasets/natolambert/interconnects-figures/resolve/main/evals/img_040.png



Get full access to Interconnects at www.interconnects.ai/subscribe
Förekommer på
00:00 -00:00