Nicholas Carlini from Google DeepMind offers his views on AI security, emergent LLM capabilities, and his groundbreaking model-stealing research. He reveals how LLMs can unexpectedly excel at tasks like chess and discusses the security pitfalls of LLM-generated code.
SPONSOR MESSAGES:
***
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand-new research lab in Zurich started by Benjamin Crouzier, focused on o-series-style reasoning and AGI. Are you interested in working on reasoning or in getting involved in their events?
Go to https://tufalabs.ai/
***
Transcript: https://www.dropbox.com/scl/fi/lat7sfyd4k3g5k9crjpbf/CARLINI.pdf?rlkey=b7kcqbvau17uw6rksbr8ccd8v&dl=0
TOC:
1. ML Security Fundamentals
[00:00:00] 1.1 ML Model Reasoning and Security Fundamentals
[00:03:04] 1.2 ML Security Vulnerabilities and System Design
[00:08:22] 1.3 LLM Chess Capabilities and Emergent Behavior
[00:13:20] 1.4 Model Training, RLHF, and Calibration Effects
2. Model Evaluation and Research Methods
[00:19:40] 2.1 Model Reasoning and Evaluation Metrics
[00:24:37] 2.2 Security Research Philosophy and Methodology
[00:27:50] 2.3 Security Disclosure Norms and Community Differences
3. LLM Applications and Best Practices
[00:44:29] 3.1 Practical LLM Applications and Productivity Gains
[00:49:51] 3.2 Effective LLM Usage and Prompting Strategies
[00:53:03] 3.3 Security Vulnerabilities in LLM-Generated Code
4. Advanced LLM Research and Architecture
[00:59:13] 4.1 LLM Code Generation Performance and O(1) Labs Experience
[01:03:31] 4.2 Adaptation Patterns and Benchmarking Challenges
[01:10:10] 4.3 Model Stealing Research and Production LLM Architecture Extraction
REFS:
[00:01:15] Nicholas Carlini’s personal website & research profile (Google DeepMind, ML security) - https://nicholas.carlini.com/
[00:01:50] CentML AI compute platform for language model workloads - https://centml.ai/
[00:04:30] Paper: “Towards Evaluating the Robustness of Neural Networks” (Carlini & Wagner, 2016) – seminal work on adversarial examples - https://arxiv.org/abs/1608.04644
[00:05:20] Computer Fraud and Abuse Act (CFAA) – primary U.S. federal law on computer hacking liability - https://www.justice.gov/jm/jm-9-48000-computer-fraud
[00:08:30] Blog post: Emergent chess capabilities in GPT-3.5-turbo-instruct (Nicholas Carlini, Sept 2023) - https://nicholas.carlini.com/writing/2023/chess-llm.html
[00:16:10] Paper: “Self-Play Preference Optimization for Language Model Alignment” (Yue Wu et al., 2024) - https://arxiv.org/abs/2405.00675
[00:18:00] GPT-4 Technical Report: development, capabilities, and calibration analysis - https://arxiv.org/abs/2303.08774
[00:22:40] Historical shift from descriptive to algebraic chess notation (FIDE) - https://en.wikipedia.org/wiki/Descriptive_notation
[00:23:55] Paper: “The Many Faces of Robustness” (Hendrycks et al., 2020) – analysis of distribution shift in ML - https://arxiv.org/abs/2006.16241
[00:27:40] Nicholas Carlini’s essay “Why I Attack” (June 2024) – motivations for security research - https://nicholas.carlini.com/writing/2024/why-i-attack.html
[00:34:05] Google Project Zero’s 90-day vulnerability disclosure policy - https://googleprojectzero.blogspot.com/p/vulnerability-disclosure-policy.html
[00:51:15] Book: “The Joy of Search” (Daniel M. Russell) – evolution of Google search syntax & user behavior - https://www.amazon.com/Joy-Search-Google-Master-Information/dp/0262042878
[01:04:05] Rust’s ownership & borrowing system for memory safety - https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html
[01:10:05] Paper: “Stealing Part of a Production Language Model” (Carlini et al., March 2024) – extraction attacks on ChatGPT, PaLM-2 - https://arxiv.org/abs/2403.06634
[01:10:55] Paper: “Stealing Machine Learning Models via Prediction APIs” (Tramèr et al., 2016) – the first model-stealing paper - https://arxiv.org/abs/1609.02943