
Machine Learning Street Talk (MLST)

Nicholas Carlini (Google DeepMind)

81 min • 25 January 2025

Nicholas Carlini of Google DeepMind discusses AI security, emergent LLM capabilities, and his model-stealing research. He explains how LLMs can unexpectedly excel at tasks like chess and examines the security pitfalls of LLM-generated code.


SPONSOR MESSAGES:

***

CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.

https://centml.ai/pricing/


Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning, or getting involved in their events?


Go to https://tufalabs.ai/

***


Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0


TOC:

1. ML Security Fundamentals

[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals

[00:03:04] 1.2 ML Security Vulnerabilities and System Design

[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior

[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects


2. Model Evaluation and Research Methods

[00:19:40] 2.1 Model Reasoning and Evaluation Metrics

[00:24:37] 2.2 Security Research Philosophy and Methodology

[00:27:50] 2.3 Security Disclosure Norms and Community Differences


3. LLM Applications and Best Practices

[00:44:29] 3.1 Practical LLM Applications and Productivity Gains

[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies

[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code


4. Advanced LLM Research and Architecture

[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience

[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges

[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction


REFS:

[00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/


[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/


[00:04:30] Seminal paper on neural network robustness against adversarial examples (Carlini & Wagner, 2016) - https://arxiv.org/abs/1608.04644


[00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud


[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html


[00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675


[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774


[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation


[00:23:55] Analysis of distribution shift in ML (Hendrycks et al.) - https://arxiv.org/abs/2006.16241


[00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html


[00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html


[00:51:15] Evolution of Google search syntax & user behavior (Daniel M. Russell) - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878


[01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html


[01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634


[01:10:55] First model-stealing paper: “Stealing Machine Learning Models via Prediction APIs” (Tramèr et al., 2016) - https://arxiv.org/abs/1609.02943
