Welcome! We engage in fascinating discussions with pre-eminent figures in the AI field. Our flagship show covers current affairs in AI, cognitive science, neuroscience and philosophy of mind with in-depth analysis. Our approach is unrivalled in terms of scope and rigour – we believe in intellectual diversity in AI, and we touch on all of the main ideas in the field with the hype surgically removed. MLST is run by Tim Scarfe, Ph.D. (https://www.linkedin.com/in/ecsquizor/) and features regular appearances from Dr. Keith Duggar, who holds a Ph.D. from MIT (https://www.linkedin.com/in/dr-keith-duggar/).
Nora Belrose, Head of Interpretability Research at EleutherAI, discusses critical challenges in AI safety and development. The conversation begins with her technical work on concept erasure in neural networks through LEACE (LEAst-squares Concept Erasure), then turns to how neural networks' progression from simple to complex learning patterns could have important implications for AI safety.
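To make the concept-erasure idea concrete, here is a minimal sketch of linear concept erasure in the spirit of LEACE. It nulls the cross-covariance between features and a binary concept label (the condition LEACE targets) but omits the whitening step that makes LEACE the least-squares-optimal edit, so treat it as an illustration rather than EleutherAI's implementation.

```python
# Minimal sketch of linear concept erasure (in the spirit of LEACE, not the
# EleutherAI implementation): null the cross-covariance between features X and
# a binary concept label z, so least-squares linear probes can no longer
# recover z. LEACE additionally whitens the features to make the edit
# least-squares optimal; that step is omitted here for brevity.
import numpy as np

def fit_erasure_projection(X: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Return P such that the centred features X @ P have zero cross-covariance with z."""
    Xc = X - X.mean(axis=0)
    zc = z - z.mean()
    d = Xc.T @ zc                                 # cross-covariance direction
    d /= np.linalg.norm(d)
    return np.eye(X.shape[1]) - np.outer(d, d)    # orthogonal projector that nulls d

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 64))                   # stand-in for hidden activations
z = (X[:, 0] + 0.1 * rng.normal(size=1000) > 0).astype(float)  # binary concept label
P = fit_erasure_projection(X, z)
X_erased = X @ P
# Cross-covariance with the concept is now numerically zero:
print(np.abs((X_erased - X_erased.mean(0)).T @ (z - z.mean())).max())
```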
Many fear that advanced AI will pose an existential threat -- pursuing its own dangerous goals once it's powerful enough. But Belrose challenges this popular doomsday scenario with a fascinating breakdown of why it doesn't add up.
Belrose also provides a detailed critique of current AI alignment approaches, particularly examining "counting arguments" and their limitations when applied to AI safety. She argues that the Principle of Indifference may be insufficient for addressing existential risks from advanced AI systems. The discussion explores how emergent properties in complex AI systems could lead to unpredictable and potentially dangerous behaviors that simple reductionist approaches fail to capture.
The conversation concludes by exploring broader philosophical territory, where Belrose discusses her growing interest in Buddhism's potential relevance to a post-automation future. She connects concepts of moral anti-realism with Buddhist ideas about emptiness and non-attachment, suggesting these frameworks might help humans find meaning in a world where AI handles most practical tasks. Rather than viewing this automated future with alarm, she proposes that Zen Buddhism's emphasis on spontaneity and presence might complement a society freed from traditional labor.
SPONSOR MESSAGES:
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments.
https://centml.ai/pricing/
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focussed on ARC and AGI. They just acquired MindsAI, the current winners of the ARC challenge. Are you interested in working on ARC or getting involved in their events? Go to https://tufalabs.ai/
Nora Belrose:
https://norabelrose.com/
https://scholar.google.com/citations?user=p_oBc64AAAAJ&hl=en
https://x.com/norabelrose
SHOWNOTES:
https://www.dropbox.com/scl/fi/38fhsv2zh8gnubtjaoq4a/NORA_FINAL.pdf?rlkey=0e5r8rd261821g1em4dgv0k70&st=t5c9ckfb&dl=0
TOC:
1. Neural Network Foundations
[00:00:00] 1.1 Philosophical Foundations and Neural Network Simplicity Bias
[00:02:20] 1.2 LEACE and Concept Erasure Fundamentals
[00:13:16] 1.3 LISA Technical Implementation and Applications
[00:18:50] 1.4 Practical Implementation Challenges and Data Requirements
[00:22:13] 1.5 Performance Impact and Limitations of Concept Erasure
2. Machine Learning Theory
[00:32:23] 2.1 Neural Network Learning Progression and Simplicity Bias
[00:37:10] 2.2 Optimal Transport Theory and Image Statistics Manipulation
[00:43:05] 2.3 Grokking Phenomena and Training Dynamics
[00:44:50] 2.4 Texture vs Shape Bias in Computer Vision Models
[00:45:15] 2.5 CNN Architecture and Shape Recognition Limitations
3. AI Systems and Value Learning
[00:47:10] 3.1 Meaning, Value, and Consciousness in AI Systems
[00:53:06] 3.2 Global Connectivity vs Local Culture Preservation
[00:58:18] 3.3 AI Capabilities and Future Development Trajectory
4. Consciousness Theory
[01:03:03] 4.1 4E Cognition and Extended Mind Theory
[01:09:40] 4.2 Thompson's Views on Consciousness and Simulation
[01:12:46] 4.3 Phenomenology and Consciousness Theory
[01:15:43] 4.4 Critique of Illusionism and Embodied Experience
[01:23:16] 4.5 AI Alignment and Counting Arguments Debate
(TRUNCATED, TOC embedded in MP3 file with more information)
Prof. Gennady Pekhimenko (CEO of CentML, UofT) joins us in this *sponsored episode* to dive deep into AI system optimization and enterprise implementation. From NVIDIA's technical leadership model to the rise of open-source AI, Pekhimenko shares insights on bridging the gap between academic research and industrial applications. Learn about "dark silicon," GPU utilization challenges in ML workloads, and how modern enterprises can optimize their AI infrastructure. The conversation explores why some companies achieve only 10% GPU efficiency and practical solutions for improving AI system performance. A must-watch for anyone interested in the technical foundations of enterprise AI and hardware optimization.
CentML offers competitive pricing for GenAI model deployment, with flexible options to suit a wide range of models, from small to large-scale deployments. Cheaper, faster, no commitments, pay as you go, scale massively, simple to set up. Check it out!
https://centml.ai/pricing/
SPONSOR MESSAGES:
MLST is also sponsored by Tufa AI Labs - https://tufalabs.ai/
They are hiring cracked ML engineers/researchers to work on ARC and build AGI!
SHOWNOTES (diarised transcript, TOC, references, summary, best quotes etc)
https://www.dropbox.com/scl/fi/w9kbpso7fawtm286kkp6j/Gennady.pdf?rlkey=aqjqmncx3kjnatk2il1gbgknk&st=2a9mccj8&dl=0
TOC:
1. AI Strategy and Leadership
[00:00:00] 1.1 Technical Leadership and Corporate Structure
[00:09:55] 1.2 Open Source vs Proprietary AI Models
[00:16:04] 1.3 Hardware and System Architecture Challenges
[00:23:37] 1.4 Enterprise AI Implementation and Optimization
[00:35:30] 1.5 AI Reasoning Capabilities and Limitations
2. AI System Development
[00:38:45] 2.1 Computational and Cognitive Limitations of AI Systems
[00:42:40] 2.2 Human-LLM Communication Adaptation and Patterns
[00:46:18] 2.3 AI-Assisted Software Development Challenges
[00:47:55] 2.4 Future of Software Engineering Careers in AI Era
[00:49:49] 2.5 Enterprise AI Adoption Challenges and Implementation
3. ML Infrastructure Optimization
[00:54:41] 3.1 MLOps Evolution and Platform Centralization
[00:55:43] 3.2 Hardware Optimization and Performance Constraints
[01:05:24] 3.3 ML Compiler Optimization and Python Performance
[01:15:57] 3.4 Enterprise ML Deployment and Cloud Provider Partnerships
4. Distributed AI Architecture
[01:27:05] 4.1 Multi-Cloud ML Infrastructure and Optimization
[01:29:45] 4.2 AI Agent Systems and Production Readiness
[01:32:00] 4.3 RAG Implementation and Fine-Tuning Considerations
[01:33:45] 4.4 Distributed AI Systems Architecture and Ray Framework
5. AI Industry Standards and Research
[01:37:55] 5.1 Origins and Evolution of MLPerf Benchmarking
[01:43:15] 5.2 MLPerf Methodology and Industry Impact
[01:50:17] 5.3 Academic Research vs Industry Implementation in AI
[01:58:59] 5.4 AI Research History and Safety Concerns
Eliezer Yudkowsky and Stephen Wolfram discuss artificial intelligence and its potential existential risks. They traverse fundamental questions about AI safety, consciousness, computational irreducibility, and the nature of intelligence.
The discourse centers on Yudkowsky's argument that advanced AI systems pose an existential threat to humanity, primarily due to the challenge of alignment and the potential for emergent goals that diverge from human values. Wolfram, while acknowledging potential risks, approaches the topic from his signature measured perspective, emphasizing the importance of understanding computational systems' fundamental nature and questioning whether AI systems would necessarily develop the kind of goal-directed behavior Yudkowsky fears.
***
MLST IS SPONSORED BY TUFA AI LABS!
MindsAI, the current winners of the ARC challenge, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please go to https://tufalabs.ai/
***
TOC:
1. Foundational AI Concepts and Risks
[00:00:01] 1.1 AI Optimization and System Capabilities Debate
[00:06:46] 1.2 Computational Irreducibility and Intelligence Limitations
[00:20:09] 1.3 Existential Risk and Species Succession
[00:23:28] 1.4 Consciousness and Value Preservation in AI Systems
2. Ethics and Philosophy in AI
[00:33:24] 2.1 Moral Value of Human Consciousness vs. Computation
[00:36:30] 2.2 Ethics and Moral Philosophy Debate
[00:39:58] 2.3 Existential Risks and Digital Immortality
[00:43:30] 2.4 Consciousness and Personal Identity in Brain Emulation
3. Truth and Logic in AI Systems
[00:54:39] 3.1 AI Persuasion Ethics and Truth
[01:01:48] 3.2 Mathematical Truth and Logic in AI Systems
[01:11:29] 3.3 Universal Truth vs Personal Interpretation in Ethics and Mathematics
[01:14:43] 3.4 Quantum Mechanics and Fundamental Reality Debate
4. AI Capabilities and Constraints
[01:21:21] 4.1 AI Perception and Physical Laws
[01:28:33] 4.2 AI Capabilities and Computational Constraints
[01:34:59] 4.3 AI Motivation and Anthropomorphization Debate
[01:38:09] 4.4 Prediction vs Agency in AI Systems
5. AI System Architecture and Behavior
[01:44:47] 5.1 Computational Irreducibility and Probabilistic Prediction
[01:48:10] 5.2 Teleological vs Mechanistic Explanations of AI Behavior
[02:09:41] 5.3 Machine Learning as Assembly of Computational Components
[02:29:52] 5.4 AI Safety and Predictability in Complex Systems
6. Goal Optimization and Alignment
[02:50:30] 6.1 Goal Specification and Optimization Challenges in AI Systems
[02:58:31] 6.2 Intelligence, Computation, and Goal-Directed Behavior
[03:02:18] 6.3 Optimization Goals and Human Existential Risk
[03:08:49] 6.4 Emergent Goals and AI Alignment Challenges
7. AI Evolution and Risk Assessment
[03:19:44] 7.1 Inner Optimization and Mesa-Optimization Theory
[03:34:00] 7.2 Dynamic AI Goals and Extinction Risk Debate
[03:56:05] 7.3 AI Risk and Biological System Analogies
[04:09:37] 7.4 Expert Risk Assessments and Optimism vs Reality
8. Future Implications and Economics
[04:13:01] 8.1 Economic and Proliferation Considerations
SHOWNOTES (transcription, references, summary, best quotes etc):
https://www.dropbox.com/scl/fi/3st8dts2ba7yob161dchd/EliezerWolfram.pdf?rlkey=b6va5j8upgqwl9s2muc924vtt&st=vemwqx7a&dl=0
Francois Chollet, a prominent AI expert and creator of ARC-AGI, discusses intelligence, consciousness, and artificial intelligence.
Chollet explains that real intelligence isn't about memorizing information or having lots of knowledge - it's about being able to handle new situations effectively. This is why he believes current large language models (LLMs) have "near-zero intelligence" despite their impressive abilities. They're more like sophisticated memory and pattern-matching systems than truly intelligent beings.
***
MLST IS SPONSORED BY TUFA AI LABS!
MindsAI, the current winners of the ARC challenge, are part of Tufa AI Labs. They are hiring ML engineers. Are you interested?! Please go to https://tufalabs.ai/
***
He introduces his "Kaleidoscope Hypothesis," which suggests that while the world seems infinitely complex, it's actually made up of simpler patterns that repeat and combine in different ways. True intelligence, he argues, involves identifying these basic patterns and using them to understand new situations.
Chollet also talks about consciousness, suggesting it develops gradually in children rather than appearing all at once. He believes consciousness exists in degrees - animals have it to some extent, and even human consciousness varies with age and circumstances (like being more conscious when learning something new versus doing routine tasks).
On AI safety, Chollet takes a notably different stance from many in Silicon Valley. He views AGI development as a scientific challenge rather than a religious quest, and doesn't share the apocalyptic concerns of some AI researchers. He argues that intelligence itself isn't dangerous - it's just a tool for turning information into useful models. What matters is how we choose to use it.
ARC-AGI Prize:
https://arcprize.org/
Francois Chollet:
https://x.com/fchollet
Shownotes:
https://www.dropbox.com/scl/fi/j2068j3hlj8br96pfa7bi/CHOLLET_FINAL.pdf?rlkey=xkbr7tbnrjdl66m246w26uc8k&st=0a4ec4na&dl=0
TOC:
1. Intelligence and Model Building
[00:00:00] 1.1 Intelligence Definition and ARC Benchmark
[00:05:40] 1.2 LLMs as Program Memorization Systems
[00:09:36] 1.3 Kaleidoscope Hypothesis and Abstract Building Blocks
[00:13:39] 1.4 Deep Learning Limitations and System 2 Reasoning
[00:29:38] 1.5 Intelligence vs. Skill in LLMs and Model Building
2. ARC Benchmark and Program Synthesis
[00:37:36] 2.1 Intelligence Definition and LLM Limitations
[00:41:33] 2.2 Meta-Learning System Architecture
[00:56:21] 2.3 Program Search and Occam's Razor
[00:59:42] 2.4 Developer-Aware Generalization
[01:06:49] 2.5 Task Generation and Benchmark Design
3. Cognitive Systems and Program Generation
[01:14:38] 3.1 System 1/2 Thinking Fundamentals
[01:22:17] 3.2 Program Synthesis and Combinatorial Challenges
[01:31:18] 3.3 Test-Time Fine-Tuning Strategies
[01:36:10] 3.4 Evaluation and Leakage Problems
[01:43:22] 3.5 ARC Implementation Approaches
4. Intelligence and Language Systems
[01:50:06] 4.1 Intelligence as Tool vs Agent
[01:53:53] 4.2 Cultural Knowledge Integration
[01:58:42] 4.3 Language and Abstraction Generation
[02:02:41] 4.4 Embodiment in Cognitive Systems
[02:09:02] 4.5 Language as Cognitive Operating System
5. Consciousness and AI Safety
[02:14:05] 5.1 Consciousness and Intelligence Relationship
[02:20:25] 5.2 Development of Machine Consciousness
[02:28:40] 5.3 Consciousness Prerequisites and Indicators
[02:36:36] 5.4 AGI Safety Considerations
[02:40:29] 5.5 AI Regulation Framework
Anil Ananthaswamy is an award-winning science writer and former staff writer and deputy news editor for the London-based New Scientist magazine.
Machine learning systems are making life-altering decisions for us: approving mortgage loans, determining whether a tumor is cancerous, or deciding if someone gets bail. They now influence developments and discoveries in chemistry, biology, and physics—the study of genomes, extrasolar planets, even the intricacies of quantum systems. And all this before large language models such as ChatGPT came on the scene.
We are living through a revolution in machine learning-powered AI that shows no signs of slowing down. This technology is based on relatively simple mathematical ideas, some of which go back centuries, including linear algebra and calculus, the stuff of seventeenth- and eighteenth-century mathematics. It took the birth and advancement of computer science and the kindling of 1990s computer chips designed for video games to ignite the explosion of AI that we see today. In this enlightening book, Anil Ananthaswamy explains the fundamental math behind machine learning, while suggesting intriguing links between artificial and natural intelligence. Might the same math underpin them both?
As Ananthaswamy resonantly concludes, to make safe and effective use of artificial intelligence, we need to understand its profound capabilities and limitations, the clues to which lie in the math that makes machine learning possible.
Why Machines Learn: The Elegant Math Behind Modern AI:
https://amzn.to/3UAWX3D
https://anilananthaswamy.com/
Sponsor message:
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
Interested? Apply for an ML research position: [email protected]
Shownotes:
Chapters:
1. ML Fundamentals and Prerequisites
[00:00:00] 1.1 Differences Between Human and Machine Learning
[00:00:35] 1.2 Mathematical Prerequisites and Societal Impact of ML
[00:02:20] 1.3 Author's Journey and Book Background
[00:11:30] 1.4 Mathematical Foundations and Core ML Concepts
[00:21:45] 1.5 Bias-Variance Tradeoff and Modern Deep Learning
2. Deep Learning Architecture
[00:29:05] 2.1 Double Descent and Overparameterization in Deep Learning
[00:32:40] 2.2 Mathematical Foundations and Self-Supervised Learning
[00:40:05] 2.3 High-Dimensional Spaces and Model Architecture
[00:52:55] 2.4 Historical Development of Backpropagation
3. AI Understanding and Limitations
[00:59:13] 3.1 Pattern Matching vs Human Reasoning in ML Models
[01:00:20] 3.2 Mathematical Foundations and Pattern Recognition in AI
[01:04:08] 3.3 LLM Reliability and Machine Understanding Debate
[01:12:50] 3.4 Historical Development of Deep Learning Technologies
[01:15:21] 3.5 Alternative AI Approaches and Bio-inspired Methods
4. Ethical and Neurological Perspectives
[01:24:32] 4.1 Neural Network Scaling and Mathematical Limitations
[01:31:12] 4.2 AI Ethics and Societal Impact
[01:38:30] 4.3 Consciousness and Neurological Conditions
[01:46:17] 4.4 Body Ownership and Agency in Neuroscience
Professor Michael Levin explores the revolutionary concept of diverse intelligence, demonstrating how cognitive capabilities extend far beyond traditional brain-based intelligence. Drawing from his groundbreaking research, he explains how even simple biological systems like gene regulatory networks exhibit learning, memory, and problem-solving abilities. Levin introduces key concepts like "cognitive light cones" - the scope of goals a system can pursue - and shows how these ideas are transforming our approach to cancer treatment and biological engineering. His insights challenge conventional views of intelligence and agency, with profound implications for both medicine and artificial intelligence development. This deep discussion reveals how understanding intelligence as a spectrum, from molecular networks to human minds, could be crucial for humanity's future technological development. Contains technical discussion of biological systems, cybernetics, and theoretical frameworks for understanding emergent cognition.
Prof. Michael Levin
https://as.tufts.edu/biology/people/faculty/michael-levin
https://x.com/drmichaellevin
Sponsor message:
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
Interested? Apply for an ML research position: [email protected]
TOC
1. Intelligence Fundamentals and Evolution
[00:00:00] 1.1 Future Evolution of Human Intelligence and Consciousness
[00:03:00] 1.2 Science Fiction's Role in Exploring Intelligence Possibilities
[00:08:15] 1.3 Essential Characteristics of Human-Level Intelligence and Relationships
[00:14:20] 1.4 Biological Systems Architecture and Intelligence
2. Biological Computing and Cognition
[00:24:00] 2.1 Agency and Intelligence in Biological Systems
[00:30:30] 2.2 Learning Capabilities in Gene Regulatory Networks
[00:35:37] 2.3 Biological Control Systems and Competency Architecture
[00:39:58] 2.4 Scientific Metaphors and Polycomputing Paradigm
3. Systems and Collective Intelligence
[00:43:26] 3.1 Embodiment and Problem-Solving Spaces
[00:44:50] 3.2 Perception-Action Loops and Biological Intelligence
[00:46:55] 3.3 Intelligence, Wisdom and Collective Systems
[00:53:07] 3.4 Cancer and Cognitive Light Cones
[00:57:09] 3.5 Emergent Intelligence and AI Agency
Shownotes:
https://www.dropbox.com/scl/fi/i2vl1vs009thg54lxx5wc/LEVIN.pdf?rlkey=dtk8okhbsejryiu2vrht19qp6&st=uzi0vo45&dl=0
REFS:
[0:05:30] A Fire Upon the Deep - Vernor Vinge sci-fi novel on AI and consciousness
[0:05:35] Maria Chudnovsky - MacArthur Fellow, Princeton mathematician, graph theory expert
[0:14:20] Bow-tie architecture in biological systems - Network structure research by Csete & Doyle
[0:15:40] Richard Watson - Southampton Professor, evolution and learning systems expert
[0:17:00] Levin paper on human issues in AI and evolution
[0:19:00] Bow-tie architecture in Darwin's agential materialism - Levin
[0:22:55] Philip Goff - Work on panpsychism and consciousness in Galileo's Error
[0:23:30] Strange Loop - Hofstadter's work on self-reference and consciousness
[0:25:00] The Hard Problem of Consciousness - Van Gulick
[0:26:15] Daniel Dennett - Theories on consciousness and intentional systems
[0:29:35] Principle of Least Action - Light path selection in physics
[0:29:50] Free Energy Principle - Friston's unified behavioral framework
[0:30:35] Gene regulatory networks - Learning capabilities in biological systems
[0:36:55] Minimal networks with learning capacity - Levin
[0:38:50] Multi-scale competency in biological systems - Levin
[0:41:40] Polycomputing paradigm - Biological computation by Bongard & Levin
[0:45:40] Collective intelligence in biology - Levin et al.
[0:46:55] Niche construction and stigmergy - Torday
[0:53:50] Tasmanian Devil Facial Tumor Disease - Transmissible cancer research
[0:55:05] Cognitive light cone - Computational boundaries of self - Levin
[0:58:05] Cognitive properties in sorting algorithms - Zhang, Goldstein & Levin
Will Williams is CTO of Speechmatics in Cambridge. In this sponsored episode, he shares deep technical insights into modern speech recognition technology and system architecture. The episode covers several key technical areas:
* Speechmatics' hybrid approach to ASR, which focusses on unsupervised learning methods, achieving comparable results with 100x less data than fully supervised approaches. Williams explains why this is more efficient and generalizable than end-to-end models like Whisper.
* Their production architecture implementing multiple operating points for different latency-accuracy trade-offs, with careful latency padding (up to 1.8 seconds) to ensure consistent user experience. The system uses lattice-based decoding with language model integration for improved accuracy.
* The challenges and solutions in real-time ASR, including their approach to diarization (speaker identification), handling cross-talk, and implicit source separation. Williams explains why these problems remain difficult even with modern deep learning approaches.
* Their testing and deployment infrastructure, including the use of mirrored environments for catching edge cases in production, and their strategy of maintaining global models rather than allowing customer-specific fine-tuning.
* Technical evolution in ASR, from early days of custom CUDA kernels and manual memory management to modern frameworks, with Williams offering interesting critiques of current PyTorch memory management approaches and arguing for more efficient direct memory allocation in production systems.
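Since accuracy figures recur throughout the episode (WER calculation has its own TOC entry below), here is a minimal, illustrative sketch of how word error rate is typically computed for ASR output: the word-level edit distance between reference and hypothesis, divided by the reference length. This is a generic textbook formulation, not Speechmatics' evaluation code.

```python
# Word error rate (WER): Levenshtein distance over words between a reference
# transcript and an ASR hypothesis, normalised by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                              # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j                              # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 substitution / 6 words ≈ 0.167
```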
Get coding with their API! This is their URL:
https://www.speechmatics.com/
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
MLST is sponsored by Tufa Labs:
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Interested? Apply for an ML research position: [email protected]
TOC
1. ASR Core Technology & Real-time Architecture
[00:00:00] 1.1 ASR and Diarization Fundamentals
[00:05:25] 1.2 Real-time Conversational AI Architecture
[00:09:21] 1.3 Neural Network Streaming Implementation
[00:12:49] 1.4 Multi-modal System Integration
2. Production System Optimization
[00:29:38] 2.1 Production Deployment and Testing Infrastructure
[00:35:40] 2.2 Model Architecture and Deployment Strategy
[00:37:12] 2.3 Latency-Accuracy Trade-offs
[00:39:15] 2.4 Language Model Integration
[00:40:32] 2.5 Lattice-based Decoding Architecture
3. Performance Evaluation & Ethical Considerations
[00:44:00] 3.1 ASR Performance Metrics and Capabilities
[00:46:35] 3.2 AI Regulation and Evaluation Methods
[00:51:09] 3.3 Benchmark and Testing Challenges
[00:54:30] 3.4 Real-world Implementation Metrics
[01:00:51] 3.5 Ethics and Privacy Considerations
4. ASR Technical Evolution
[01:09:00] 4.1 WER Calculation and Evaluation Methodologies
[01:10:21] 4.2 Supervised vs Self-Supervised Learning Approaches
[01:21:02] 4.3 Temporal Learning and Feature Processing
[01:24:45] 4.4 Feature Engineering to Automated ML
5. Enterprise Implementation & Scale
[01:27:55] 5.1 Future AI Systems and Adaptation
[01:31:52] 5.2 Technical Foundations and History
[01:34:53] 5.3 Infrastructure and Team Scaling
[01:38:05] 5.4 Research and Talent Strategy
[01:41:11] 5.5 Engineering Practice Evolution
Shownotes:
https://www.dropbox.com/scl/fi/d94b1jcgph9o8au8shdym/Speechmatics.pdf?rlkey=bi55wvktzomzx0y5sic6jz99y&st=6qwofv8t&dl=0
Dr. Sanjeev Namjoshi, a machine learning engineer who recently submitted a book on Active Inference to MIT Press, discusses the theoretical foundations and practical applications of Active Inference, the Free Energy Principle (FEP), and Bayesian mechanics. He explains how these frameworks describe how biological and artificial systems maintain stability by minimizing uncertainty about their environment.
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)?
MLST is sponsored by Tufa Labs:
Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more.
Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2.
Interested? Apply for an ML research position: [email protected]
Namjoshi traces the evolution of these fields from early 2000s neuroscience research to current developments, highlighting how Active Inference provides a unified framework for perception and action through variational free energy minimization. He contrasts this with traditional machine learning approaches, emphasizing Active Inference's natural capacity for exploration and curiosity through epistemic value.
He sees Active Inference as being at a similar stage to deep learning in the early 2000s - poised for significant breakthroughs but requiring better tools and wider adoption. While acknowledging current computational challenges, he emphasizes Active Inference's potential advantages over reinforcement learning, particularly its principled approach to exploration and planning.
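For readers new to the framework, the quantity being minimized is the variational free energy F = KL(q(s) || p(s)) - E_q[ln p(o|s)], which upper-bounds surprise (-ln p(o)). Below is a toy numerical sketch for a two-state discrete model; it illustrates the definition only and is not code from Namjoshi's book or any particular Active Inference library.

```python
# Toy variational free energy for a discrete generative model with two hidden
# states. Choosing q(s) equal to the exact posterior drives F down to the
# surprise -ln p(o); any other q gives a larger F.
import numpy as np

def free_energy(q, prior, likelihood, obs):
    """F = KL(q(s) || p(s)) - E_q[ln p(o|s)]; lower is better."""
    eps = 1e-12
    kl = np.sum(q * (np.log(q + eps) - np.log(prior + eps)))
    expected_ll = np.sum(q * np.log(likelihood[obs] + eps))
    return float(kl - expected_ll)

prior = np.array([0.5, 0.5])               # p(s)
likelihood = np.array([[0.9, 0.2],         # p(o=0 | s)
                       [0.1, 0.8]])        # p(o=1 | s)
obs = 1                                    # observed outcome

posterior = prior * likelihood[obs]
posterior /= posterior.sum()
print(free_energy(posterior, prior, likelihood, obs))             # ≈ 0.80 = -ln p(o=1)
print(free_energy(np.array([0.5, 0.5]), prior, likelihood, obs))  # larger, ≈ 1.26
```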
Dr. Sanjeev Namjoshi
https://snamjoshi.github.io/
TOC:
1. Theoretical Foundations: AI Agency and Sentience
[00:00:00] 1.1 Intro
[00:02:45] 1.2 Free Energy Principle and Active Inference Theory
[00:11:16] 1.3 Emergence and Self-Organization in Complex Systems
[00:19:11] 1.4 Agency and Representation in AI Systems
[00:29:59] 1.5 Bayesian Mechanics and Systems Modeling
2. Technical Framework: Active Inference and Free Energy
[00:38:37] 2.1 Generative Processes and Agent-Environment Modeling
[00:42:27] 2.2 Markov Blankets and System Boundaries
[00:44:30] 2.3 Bayesian Inference and Prior Distributions
[00:52:41] 2.4 Variational Free Energy Minimization Framework
[00:55:07] 2.5 VFE Optimization Techniques: Generalized Filtering vs DEM
3. Implementation and Optimization Methods
[00:58:25] 3.1 Information Theory and Free Energy Concepts
[01:05:25] 3.2 Surprise Minimization and Action in Active Inference
[01:15:58] 3.3 Evolution of Active Inference Models: Continuous to Discrete Approaches
[01:26:00] 3.4 Uncertainty Reduction and Control Systems in Active Inference
4. Safety and Regulatory Frameworks
[01:32:40] 4.1 Historical Evolution of Risk Management and Predictive Systems
[01:36:12] 4.2 Agency and Reality: Philosophical Perspectives on Models
[01:39:20] 4.3 Limitations of Symbolic AI and Current System Design
[01:46:40] 4.4 AI Safety Regulation and Corporate Governance
5. Socioeconomic Integration and Modeling
[01:52:55] 5.1 Economic Policy and Public Sentiment Modeling
[01:55:21] 5.2 Free Energy Principle: Libertarian vs Collectivist Perspectives
[01:58:53] 5.3 Regulation of Complex Socio-Technical Systems
[02:03:04] 5.4 Evolution and Current State of Active Inference Research
6. Future Directions and Applications
[02:14:26] 6.1 Active Inference Applications and Future Development
[02:22:58] 6.2 Cultural Learning and Active Inference
[02:29:19] 6.3 Hierarchical Relationship Between FEP, Active Inference, and Bayesian Mechanics
[02:33:22] 6.4 Historical Evolution of Free Energy Principle
[02:38:52] 6.5 Active Inference vs Traditional Machine Learning Approaches
Transcript and shownotes with refs and URLs:
https://www.dropbox.com/scl/fi/qj22a660cob1795ej0gbw/SanjeevShow.pdf?rlkey=w323r3e8zfsnve22caayzb17k&st=el1fdgfr&dl=0
Dr. Joscha Bach discusses advanced AI, consciousness, and cognitive modeling. He presents consciousness as a virtual property emerging from self-organizing software patterns, challenging panpsychism and materialism. Bach introduces "Cyberanima," reinterpreting animism through information processing, viewing spirits as self-organizing software agents.
He addresses limitations of current large language models and advocates for smaller, more efficient AI models capable of reasoning from first principles. Bach describes his work with Liquid AI on novel neural network architectures for improved expressiveness and efficiency.
The interview covers AI's societal implications, including regulation challenges and impact on innovation. Bach argues for balancing oversight with technological progress, warning against overly restrictive regulations.
Throughout, Bach frames consciousness, intelligence, and agency as emergent properties of complex information processing systems, proposing a computational framework for cognitive phenomena and reality.
SPONSOR MESSAGE:
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs: Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai
TOC
[00:00:00] 1.1 Consciousness and Intelligence in AI Development
[00:07:44] 1.2 Agency, Intelligence, and Their Relationship to Physical Reality
[00:13:36] 1.3 Virtual Patterns and Causal Structures in Consciousness
[00:25:49] 1.4 Reinterpreting Concepts of God and Animism in Information Processing Terms
[00:32:50] 1.5 Animism and Evolution as Competition Between Software Agents
2. Self-Organizing Systems and Cognitive Models in AI
[00:37:59] 2.1 Consciousness as Self-Organizing Software
[00:45:49] 2.2 Critique of Panpsychism and Alternative Views on Consciousness
[00:50:48] 2.3 Emergence of Consciousness in Complex Systems
[00:52:50] 2.4 Neuronal Motivation and the Origins of Consciousness
[00:56:47] 2.5 Coherence and Self-Organization in AI Systems
3. Advanced AI Architectures and Cognitive Processes
[00:57:50] 3.1 Second-Order Software and Complex Mental Processes
[01:01:05] 3.2 Collective Agency and Shared Values in AI
[01:05:40] 3.3 Limitations of Current AI Agents and LLMs
[01:06:40] 3.4 Liquid AI and Novel Neural Network Architectures
[01:10:06] 3.5 AI Model Efficiency and Future Directions
[01:19:00] 3.6 LLM Limitations and Internal State Representation
4. AI Regulation and Societal Impact
[01:31:23] 4.1 AI Regulation and Societal Impact
[01:49:50] 4.2 Open-Source AI and Industry Challenges
Refs in shownotes and MP3 metadata
Shownotes:
https://www.dropbox.com/scl/fi/g28dosz19bzcfs5imrvbu/JoschaInterview.pdf?rlkey=s3y18jy192ktz6ogd7qtvry3d&st=10z7q7w9&dl=0
Alessandro Palmarini is a post-baccalaureate researcher at the Santa Fe Institute working under the supervision of Melanie Mitchell. He completed his undergraduate degree in Artificial Intelligence and Computer Science at the University of Edinburgh. Palmarini's current research focuses on developing AI systems that can efficiently acquire new skills from limited data, inspired by François Chollet's work on measuring intelligence. His work builds upon the DreamCoder program synthesis system, introducing a novel approach called "dream decompiling" to improve library learning in inductive program synthesis. Palmarini is particularly interested in addressing the Abstraction and Reasoning Corpus (ARC) challenge, aiming to create AI systems that can perform abstract reasoning tasks more efficiently than current approaches. His research explores the balance between computational efficiency and data efficiency in AI learning processes.
DO YOU WANT TO WORK ON ARC with the MindsAI team (current ARC winners)? MLST is sponsored by Tufa Labs: Focus: ARC, LLMs, test-time-compute, active inference, system2 reasoning, and more. Future plans: Expanding to complex environments like Warcraft 2 and Starcraft 2. Interested? Apply for an ML research position: benjamin@tufa.ai
TOC:
1. Intelligence Measurement in AI Systems
[00:00:00] 1.1 Defining Intelligence in AI Systems
[00:02:00] 1.2 Research at Santa Fe Institute
[00:04:35] 1.3 Impact of Gaming on AI Development
[00:05:10] 1.4 Comparing AI and Human Learning Efficiency
2. Efficient Skill Acquisition in AI
[00:06:40] 2.1 Intelligence as Skill Acquisition Efficiency
[00:08:25] 2.2 Limitations of Current AI Systems in Generalization
[00:09:45] 2.3 Human vs. AI Cognitive Processes
[00:10:40] 2.4 Measuring AI Intelligence: Chollet's ARC Challenge
3. Program Synthesis and ARC Challenge
[00:12:55] 3.1 Philosophical Foundations of Program Synthesis
[00:17:14] 3.2 Introduction to Program Induction and ARC Tasks
[00:18:49] 3.3 DreamCoder: Principles and Techniques
[00:27:55] 3.4 Trade-offs in Program Synthesis Search Strategies
[00:31:52] 3.5 Neural Networks and Bayesian Program Learning
4. Advanced Program Synthesis Techniques
[00:32:30] 4.1 DreamCoder and Dream Decompiling Approach
[00:39:00] 4.2 Beta Distribution and Caching in Program Synthesis
[00:45:10] 4.3 Performance and Limitations of Dream Decompiling
[00:47:45] 4.4 Alessandro's Approach to ARC Challenge
[00:51:12] 4.5 Conclusion and Future Discussions
Refs:
Full reference list in the YouTube video description, Show Notes and MP3 metadata
Show Notes: https://www.dropbox.com/scl/fi/x50201tgqucj5ba2q4typ/Ale.pdf?rlkey=0ubvk7p5gtyx1gpownpdadim8&st=5pniu3nq&dl=0
François Chollet discusses the limitations of Large Language Models (LLMs) and proposes a new approach to advancing artificial intelligence. He argues that current AI systems excel at pattern recognition but struggle with logical reasoning and true generalization.
This was Chollet's keynote talk at AGI-24, filmed in high quality. We will be releasing a full interview with him shortly. A teaser clip from that is played in the intro!
Chollet introduces the Abstraction and Reasoning Corpus (ARC) as a benchmark for measuring AI progress towards human-like intelligence. He explains the concept of abstraction in AI systems and proposes combining deep learning with program synthesis to overcome current limitations. Chollet suggests that breakthroughs in AI might come from outside major tech labs and encourages researchers to explore new ideas in the pursuit of artificial general intelligence.
TOC
1. LLM Limitations and Intelligence Concepts
[00:00:00] 1.1 LLM Limitations and Composition
[00:12:05] 1.2 Intelligence as Process vs. Skill
[00:17:15] 1.3 Generalization as Key to AI Progress
2. ARC-AGI Benchmark and LLM Performance
[00:19:59] 2.1 Introduction to ARC-AGI Benchmark
[00:20:05] 2.2 Introduction to ARC-AGI and the ARC Prize
[00:23:35] 2.3 Performance of LLMs and Humans on ARC-AGI
3. Abstraction in AI Systems
[00:26:10] 3.1 The Kaleidoscope Hypothesis and Abstraction Spectrum
[00:30:05] 3.2 LLM Capabilities and Limitations in Abstraction
[00:32:10] 3.3 Value-Centric vs Program-Centric Abstraction
[00:33:25] 3.4 Types of Abstraction in AI Systems
4. Advancing AI: Combining Deep Learning and Program Synthesis
[00:34:05] 4.1 Limitations of Transformers and Need for Program Synthesis
[00:36:45] 4.2 Combining Deep Learning and Program Synthesis
[00:39:59] 4.3 Applying Combined Approaches to ARC Tasks
[00:44:20] 4.4 State-of-the-Art Solutions for ARC
Shownotes (new!): https://www.dropbox.com/scl/fi/i7nsyoahuei6np95lbjxw/CholletKeynote.pdf?rlkey=t3502kbov5exsdxhderq70b9i&st=1ca91ewz&dl=0
[0:01:15] Abstraction and Reasoning Corpus (ARC): AI benchmark (François Chollet)
https://arxiv.org/abs/1911.01547
[0:05:30] Monty Hall problem: Probability puzzle (Steve Selvin)
https://www.tandfonline.com/doi/abs/10.1080/00031305.1975.10479121
[0:06:20] LLM training dynamics analysis (Tirumala et al.)
https://arxiv.org/abs/2205.10770
[0:10:20] Transformer limitations on compositionality (Dziri et al.)
https://arxiv.org/abs/2305.18654
[0:10:25] Reversal Curse in LLMs (Berglund et al.)
https://arxiv.org/abs/2309.12288
[0:19:25] Measure of intelligence using algorithmic information theory (François Chollet)
https://arxiv.org/abs/1911.01547
[0:20:10] ARC-AGI: GitHub repository (François Chollet)
https://github.com/fchollet/ARC-AGI
[0:22:15] ARC Prize: $1,000,000+ competition (François Chollet)
https://arcprize.org/
[0:33:30] System 1 and System 2 thinking (Daniel Kahneman)
https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
[0:34:00] Core knowledge in infants (Elizabeth Spelke)
https://www.harvardlds.org/wp-content/uploads/2017/01/SpelkeKinzler07-1.pdf
[0:34:30] Embedding interpretive spaces in ML (Tennenholtz et al.)
https://arxiv.org/abs/2310.04475
[0:44:20] Hypothesis Search with LLMs for ARC (Wang et al.)
https://arxiv.org/abs/2309.05660
[0:44:50] Ryan Greenblatt's high score on ARC public leaderboard
https://arcprize.org/
Ivan Zhang, co-founder of Cohere, discusses the company's enterprise-focused AI solutions. He explains Cohere's early emphasis on embedding technology and training models for secure environments.
Zhang highlights their implementation of Retrieval-Augmented Generation in healthcare, significantly reducing doctor preparation time. He explores the shift from monolithic AI models to heterogeneous systems and the importance of improving various AI system components. Zhang shares insights on using synthetic data to teach models reasoning, the democratization of software development through AI, and how his gaming skills transfer to running an AI company.
He advises young developers to fully embrace AI technologies and offers perspectives on AI reliability, potential risks, and future model architectures.
https://cohere.com/
https://ivanzhang.ca/
https://x.com/1vnzh
TOC:
00:00:00 Intro
00:03:20 AI & Language Model Evolution
00:06:09 Future AI Apps & Development
00:09:29 Impact on Software Dev Practices
00:13:03 Philosophical & Societal Implications
00:16:30 Compute Efficiency & RAG
00:20:39 Adoption Challenges & Solutions
00:22:30 GPU Optimization & Kubernetes Limits
00:24:16 Cohere's Implementation Approach
00:28:13 Gaming's Professional Influence
00:34:45 Transformer Optimizations
00:36:45 Future Models & System-Level Focus
00:39:20 Inference-Time Computation & Reasoning
00:42:05 Capturing Human Thought in AI
00:43:15 Research, Hiring & Developer Advice
REFS:
00:02:31 Cohere, https://cohere.com/
00:02:40 The Transformer architecture, https://arxiv.org/abs/1706.03762
00:03:22 The Innovator's Dilemma, https://www.amazon.com/Innovators-Dilemma-Technologies-Management-Innovation/dp/1633691780
00:09:15 The actor model, https://en.wikipedia.org/wiki/Actor_model
00:14:35 John Searle's Chinese Room Argument, https://plato.stanford.edu/entries/chinese-room/
00:18:00 Retrieval-Augmented Generation, https://arxiv.org/abs/2005.11401
00:18:40 Retrieval-Augmented Generation, https://docs.cohere.com/v2/docs/retrieval-augmented-generation-rag
00:35:39 Let’s Verify Step by Step, https://arxiv.org/pdf/2305.20050
00:39:20 Adaptive Inference-Time Compute, https://arxiv.org/abs/2410.02725
00:43:20 Ryan Greenblatt ARC entry, https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Disclaimer: This show is part of our Cohere partnership series
Prof. Tim Rocktäschel, AI researcher at UCL and Google DeepMind, talks about open-ended AI systems. These systems aim to keep learning and improving on their own, like evolution does in nature.
Ad: Are you a hardcore ML engineer who wants to work for Daniel Cahn at SlingshotAI building AI for mental health? Give him an email! - danielc@slingshot.xyz
TOC:
00:00:00 Introduction to Open-Ended AI and Key Concepts
00:01:37 Tim Rocktäschel's Background and Research Focus
00:06:25 Defining Open-Endedness in AI Systems
00:10:39 Subjective Nature of Interestingness and Learnability
00:16:22 Open-Endedness in Practice: Examples and Limitations
00:17:50 Assessing Novelty in Open-ended AI Systems
00:20:05 Adversarial Attacks and AI Robustness
00:24:05 Rainbow Teaming and LLM Safety
00:25:48 Open-ended Research Approaches in AI
00:29:05 Balancing Long-term Vision and Exploration in AI Research
00:37:25 LLMs in Program Synthesis and Open-Ended Learning
00:37:55 Transition from Human-Based to Novel AI Strategies
00:39:00 Expanding Context Windows and Prompt Evolution
00:40:17 AI Intelligibility and Human-AI Interfaces
00:46:04 Self-Improvement and Evolution in AI Systems
Show notes (New!) https://www.dropbox.com/scl/fi/5avpsyz8jbn4j1az7kevs/TimR.pdf?rlkey=pqjlcqbtm3undp4udtgfmie8n&st=x50u1d1m&dl=0
REFS:
00:01:47 - UCL DARK Lab (Rocktäschel) - AI research lab focusing on RL and open-ended learning - https://ucldark.com/
00:02:31 - GENIE (Bruce) - Generative interactive environment from unlabelled videos - https://arxiv.org/abs/2402.15391
00:02:42 - Promptbreeder (Fernando) - Self-referential LLM prompt evolution - https://arxiv.org/abs/2309.16797
00:03:05 - Picbreeder (Secretan) - Collaborative online image evolution - https://dl.acm.org/doi/10.1145/1357054.1357328
00:03:14 - Why Greatness Cannot Be Planned (Stanley) - Book on open-ended exploration - https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
00:04:36 - NetHack Learning Environment (Küttler) - RL research in procedurally generated game - https://arxiv.org/abs/2006.13760
00:07:35 - Open-ended learning (Clune) - AI systems for continual learning and adaptation - https://arxiv.org/abs/1905.10985
00:07:35 - OMNI (Zhang) - LLMs modeling human interestingness for exploration - https://arxiv.org/abs/2306.01711
00:10:42 - Observer theory (Wolfram) - Computationally bounded observers in complex systems - https://writings.stephenwolfram.com/2023/12/observer-theory/
00:15:25 - Human-Timescale Adaptation (Rocktäschel) - RL agent adapting to novel 3D tasks - https://arxiv.org/abs/2301.07608
00:16:15 - Open-Endedness for AGI (Hughes) - Importance of open-ended learning for AGI - https://arxiv.org/abs/2406.04268
00:16:35 - POET algorithm (Wang) - Open-ended approach to generate and solve challenges - https://arxiv.org/abs/1901.01753
00:17:20 - AlphaGo (Silver) - AI mastering the game of Go - https://deepmind.google/technologies/alphago/
00:20:35 - Adversarial Go attacks (Dennis) - Exploiting weaknesses in Go AI systems - https://www.ifaamas.org/Proceedings/aamas2024/pdfs/p1630.pdf
00:22:00 - Levels of AGI (Morris) - Framework for categorizing AGI progress - https://arxiv.org/abs/2311.02462
00:24:30 - Rainbow Teaming (Samvelyan) - LLM-based adversarial prompt generation - https://arxiv.org/abs/2402.16822
00:25:50 - Why Greatness Cannot Be Planned (Stanley) - 'False compass' and 'stepping stone collection' concepts - https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
00:27:45 - AI Debate (Khan) - Improving LLM truthfulness through debate - https://proceedings.mlr.press/v235/khan24a.html
00:29:40 - Gemini (Google DeepMind) - Advanced multimodal AI model - https://deepmind.google/technologies/gemini/
00:30:15 - How to Take Smart Notes (Ahrens) - Effective note-taking methodology - https://www.amazon.com/How-Take-Smart-Notes-Nonfiction/dp/1542866502
(truncated, see shownotes)
Ben Goertzel discusses AGI development, transhumanism, and the potential societal impacts of superintelligent AI. He predicts human-level AGI by 2029 and argues that the transition to superintelligence could happen within a few years after. Goertzel explores the challenges of AI regulation, the limitations of current language models, and the need for neuro-symbolic approaches in AGI research. He also addresses concerns about resource allocation and cultural perspectives on transhumanism.
TOC:
[00:00:00] AGI Timeline Predictions and Development Speed
[00:00:45] Limitations of Language Models in AGI Development
[00:02:18] Current State and Trends in AI Research and Development
[00:09:02] Emergent Reasoning Capabilities and Limitations of LLMs
[00:18:15] Neuro-Symbolic Approaches and the Future of AI Systems
[00:20:00] Evolutionary Algorithms and LLMs in Creative Tasks
[00:21:25] Symbolic vs. Sub-Symbolic Approaches in AI
[00:28:05] Language as Internal Thought and External Communication
[00:30:20] AGI Development and Goal-Directed Behavior
[00:35:51] Consciousness and AI: Expanding States of Experience
[00:48:50] AI Regulation: Challenges and Approaches
[00:55:35] Challenges in AI Regulation
[00:59:20] AI Alignment and Ethical Considerations
[01:09:15] AGI Development Timeline Predictions
[01:12:40] OpenCog Hyperon and AGI Progress
[01:17:48] Transhumanism and Resource Allocation Debate
[01:20:12] Cultural Perspectives on Transhumanism
[01:23:54] AGI and Post-Scarcity Society
[01:31:35] Challenges and Implications of AGI Development
New! PDF Show notes: https://www.dropbox.com/scl/fi/fyetzwgoaf70gpovyfc4x/BenGoertzel.pdf?rlkey=pze5dt9vgf01tf2wip32p5hk5&st=svbcofm3&dl=0
Refs:
00:00:15 Ray Kurzweil's AGI timeline prediction, Ray Kurzweil, https://en.wikipedia.org/wiki/Technological_singularity
00:01:45 Ben Goertzel: SingularityNET founder, Ben Goertzel, https://singularitynet.io/
00:02:35 AGI Conference series, AGI Conference Organizers, https://agi-conf.org/2024/
00:03:55 Ben Goertzel's contributions to AGI, Wikipedia contributors, https://en.wikipedia.org/wiki/Ben_Goertzel
00:11:05 Chain-of-Thought prompting, Subbarao Kambhampati, https://arxiv.org/abs/2405.04776
00:11:35 Algorithmic information content, Pieter Adriaans, https://plato.stanford.edu/entries/information-entropy/
00:12:10 Turing completeness in neural networks, Various contributors, https://plato.stanford.edu/entries/turing-machine/
00:16:15 AlphaGeometry: AI for geometry problems, Trieu, Li, et al., https://www.nature.com/articles/s41586-023-06747-5
00:18:25 Shane Legg and Ben Goertzel's collaboration, Shane Legg, https://en.wikipedia.org/wiki/Shane_Legg
00:20:00 Evolutionary algorithms in music generation, Yanxu Chen, https://arxiv.org/html/2409.03715v1
00:22:00 Peirce's theory of semiotics, Charles Sanders Peirce, https://plato.stanford.edu/entries/peirce-semiotics/
00:28:10 Chomsky's view on language, Noam Chomsky, https://chomsky.info/1983____/
00:34:05 Greg Egan's 'Diaspora', Greg Egan, https://www.amazon.co.uk/Diaspora-post-apocalyptic-thriller-perfect-MIRROR/dp/0575082097
00:40:35 'The Consciousness Explosion', Ben Goertzel & Gabriel Axel Montes, https://www.amazon.com/Consciousness-Explosion-Technological-Experiential-Singularity/dp/B0D8C7QYZD
00:41:55 Ray Kurzweil's books on singularity, Ray Kurzweil, https://www.amazon.com/Singularity-Near-Humans-Transcend-Biology/dp/0143037889
00:50:50 California AI regulation bills, California State Senate, https://sd18.senate.ca.gov/news/senate-unanimously-approves-senator-padillas-artificial-intelligence-package
00:56:40 Limitations of Compute Thresholds, Sara Hooker, https://arxiv.org/abs/2407.05694
00:56:55 'Taming Silicon Valley', Gary F. Marcus, https://www.penguinrandomhouse.com/books/768076/taming-silicon-valley-by-gary-f-marcus/
01:09:15 Kurzweil's AGI prediction update, Ray Kurzweil, https://www.theguardian.com/technology/article/2024/jun/29/ray-kurzweil-google-ai-the-singularity-is-nearer
AI expert Prof. Gary Marcus doesn't mince words about today's artificial intelligence. He argues that despite the buzz, chatbots like ChatGPT aren't as smart as they seem and could cause real problems if we're not careful.
Marcus is worried about tech companies putting profits before people. He thinks AI could make fake news and privacy issues even worse. He's also concerned that a few big tech companies have too much power. Looking ahead, Marcus believes the AI hype will die down as reality sets in. He wants to see AI developed in smarter, more responsible ways. His message to the public? We need to speak up and demand better AI before it's too late.
Buy Taming Silicon Valley:
https://amzn.to/3XTlC5s
Gary Marcus:
https://garymarcus.substack.com/
https://x.com/GaryMarcus
Interviewer:
Dr. Tim Scarfe
(Refs in top comment)
TOC
[00:00:00] AI Flaws, Improvements & Industry Critique
[00:16:29] AI Safety Theater & Image Generation Issues
[00:23:49] AI's Lack of World Models & Human-like Understanding
[00:31:09] LLMs: Superficial Intelligence vs. True Reasoning
[00:34:45] AI in Specialized Domains: Chess, Coding & Limitations
[00:42:10] AI-Generated Code: Capabilities & Human-AI Interaction
[00:48:10] AI Regulation: Industry Resistance & Oversight Challenges
[00:54:55] Copyright Issues in AI & Tech Business Models
[00:57:26] AI's Societal Impact: Risks, Misinformation & Ethics
[01:23:14] AI X-risk, Alignment & Moral Principles Implementation
[01:37:10] Persistent AI Flaws: System Limitations & Architecture Challenges
[01:44:33] AI Future: Surveillance Concerns, Economic Challenges & Neuro-Symbolic AI
YT version with refs: https://youtu.be/o9MfuUoGlSw
Prof. Mark Solms, a neuroscientist and psychoanalyst, discusses his groundbreaking work on consciousness, challenging conventional cortex-centric views and emphasizing the role of brainstem structures in generating consciousness and affect.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Key points discussed:
The limitations of vision-centric approaches to consciousness studies.
Evidence from decorticated animals and hydranencephalic children supporting the brainstem's role in consciousness.
The relationship between homeostasis, the free energy principle, and consciousness.
Critiques of behaviorism and modern theories of consciousness.
The importance of subjective experience in understanding brain function.
The discussion also explored broader topics:
The potential impact of affect-based theories on AI development.
The role of the SEEKING system in exploration and learning.
Connections between neuroscience, psychoanalysis, and philosophy of mind.
Challenges in studying consciousness and the limitations of current theories.
Mark Solms:
https://neuroscience.uct.ac.za/contacts/mark-solms
Show notes and transcript: https://www.dropbox.com/scl/fo/roipwmnlfmwk2e7kivzms/ACjZF-VIGC2-Suo30KcwVV0?rlkey=53y8v2cajfcgrf17p1h7v3suz&st=z8vu81hn&dl=0
TOC (*) are best bits
00:00:00 1. Intro: Challenging vision-centric approaches to consciousness *
00:02:20 2. Evidence from decorticated animals and hydranencephalic children *
00:07:40 3. Emotional responses in hydranencephalic children
00:10:40 4. Brainstem stimulation and affective states
00:15:00 5. Brainstem's role in generating affective consciousness *
00:21:50 6. Dual-aspect monism and the mind-brain relationship
00:29:37 7. Information, affect, and the hard problem of consciousness *
00:37:25 8. Wheeler's participatory universe and Chalmers' theories
00:48:51 9. Homeostasis, free energy principle, and consciousness *
00:59:25 10. Affect, voluntary behavior, and decision-making
01:05:45 11. Psychoactive substances, REM sleep, and consciousness research
01:12:14 12. Critiquing behaviorism and modern consciousness theories *
01:24:25 13. The SEEKING system and exploration in neuroscience
Refs:
1. Mark Solms' book "The Hidden Spring" [00:20:34] (MUST READ!)
https://amzn.to/3XyETb3
2. Karl Friston's free energy principle [00:03:50]
https://www.nature.com/articles/nrn2787
3. Hydranencephaly condition [00:07:10]
https://en.wikipedia.org/wiki/Hydranencephaly
4. Periaqueductal gray (PAG) [00:08:57]
https://en.wikipedia.org/wiki/Periaqueductal_gray
5. Positron Emission Tomography (PET) [00:13:52]
https://en.wikipedia.org/wiki/Positron_emission_tomography
6. Paul MacLean's triune brain theory [00:03:30]
https://en.wikipedia.org/wiki/Triune_brain
7. Baruch Spinoza's philosophy of mind [00:23:48]
https://plato.stanford.edu/entries/spinoza-epistemology-mind
8. Claude Shannon's "A Mathematical Theory of Communication" [00:32:15]
https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
9. Francis Crick's "The Astonishing Hypothesis" [00:39:57]
https://en.wikipedia.org/wiki/The_Astonishing_Hypothesis
10. Frank Jackson's Knowledge Argument [00:40:54]
https://plato.stanford.edu/entries/qualia-knowledge/
11. Mesolimbic dopamine system [01:11:51]
https://en.wikipedia.org/wiki/Mesolimbic_pathway
12. Jaak Panksepp's SEEKING system [01:25:23]
https://en.wikipedia.org/wiki/Jaak_Panksepp#Affective_neuroscience
Dr. Patrick Lewis, who coined the term RAG (Retrieval Augmented Generation) and now works at Cohere, discusses the evolution of language models, RAG systems, and challenges in AI evaluation.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Key topics covered:
- Origins and evolution of Retrieval Augmented Generation (RAG)
- Challenges in evaluating RAG systems and language models
- Human-AI collaboration in research and knowledge work
- Word embeddings and the progression to modern language models
- Dense vs sparse retrieval methods in information retrieval
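To make the dense-vs-sparse distinction concrete, here is a toy contrast of the two scoring styles: sparse retrieval scores documents by weighted term overlap with the query, while dense retrieval scores them by similarity between learned embeddings (hand-written vectors stand in for real embeddings here). This is illustrative only and does not reflect Cohere's systems or the original RAG paper's retriever.

```python
# Sparse vs dense retrieval in miniature.
from collections import Counter
import math

docs = ["retrieval augmented generation grounds answers in documents",
        "dense retrievers embed queries and passages into one vector space"]

def sparse_scores(query, corpus):
    """TF-IDF-like scoring: count query terms in each document, weighted by rarity."""
    df = Counter(t for d in corpus for t in set(d.split()))
    scores = []
    for d in corpus:
        tf = Counter(d.split())
        scores.append(sum(tf[t] * math.log(len(corpus) / df[t])
                          for t in query.split() if t in tf))
    return scores

def dense_scores(query_vec, doc_vecs):
    """Dot-product similarity between a query embedding and document embeddings."""
    return [sum(q * d for q, d in zip(query_vec, dv)) for dv in doc_vecs]

print(sparse_scores("dense retrieval of documents", docs))
print(dense_scores([1.0, 0.0], [[0.9, 0.1], [0.2, 0.8]]))  # hand-written toy embeddings
```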
The discussion also explored broader implications and applications:
- Balancing faithfulness and fluency in RAG systems
- User interface design for AI-augmented research tools
- The journey from chemistry to AI research
- Challenges in enterprise search compared to web search
- The importance of data quality in training AI models
Patrick Lewis: https://www.patricklewis.io/
Cohere Command Models, check them out - they are amazing for RAG!
https://cohere.com/command
TOC
00:00:00 1. Intro to RAG
00:05:30 2. RAG Evaluation: Poll framework & model performance
00:12:55 3. Data Quality: Cleanliness vs scale in AI training
00:15:13 4. Human-AI Collaboration: Research agents & UI design
00:22:57 5. RAG Origins: Open-domain QA to generative models
00:30:18 6. RAG Challenges: Info retrieval, tool use, faithfulness
00:42:01 7. Dense vs Sparse Retrieval: Techniques & trade-offs
00:47:02 8. RAG Applications: Grounding, attribution, hallucination prevention
00:54:04 9. UI for RAG: Human-computer interaction & model optimization
00:59:01 10. Word Embeddings: Word2Vec, GloVe, and semantic spaces
01:06:43 11. Language Model Evolution: BERT, GPT, and beyond
01:11:38 12. AI & Human Cognition: Sequential processing & chain-of-thought
Refs:
1. Retrieval Augmented Generation (RAG) paper / Patrick Lewis et al. [00:27:45]
https://arxiv.org/abs/2005.11401
2. LAMA (LAnguage Model Analysis) probe / Petroni et al. [00:26:35]
https://arxiv.org/abs/1909.01066
3. KILT (Knowledge Intensive Language Tasks) benchmark / Petroni et al. [00:27:05]
https://arxiv.org/abs/2009.02252
4. Word2Vec algorithm / Tomas Mikolov et al. [01:00:25]
https://arxiv.org/abs/1301.3781
5. GloVe (Global Vectors for Word Representation) / Pennington et al. [01:04:35]
https://nlp.stanford.edu/projects/glove/
6. BERT (Bidirectional Encoder Representations from Transformers) / Devlin et al. [01:08:00]
https://arxiv.org/abs/1810.04805
7. 'The Language Game' book / Nick Chater and Morten H. Christiansen [01:11:40]
https://amzn.to/4grEUpG
Disclaimer: This is the sixth video from our Cohere partnership. We were not told what to say in the interview. Filmed in Seattle in June 2024.
Ashley Edwards, who was working at DeepMind when she co-authored the Genie paper and is now at Runway, covered several key aspects of the Genie AI system and its applications in video generation, robotics, and game creation.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Genie's approach to learning interactive environments, balancing compression and fidelity.
The use of latent action models and VQ-VAE models for video processing and tokenization.
Challenges in maintaining action consistency across frames and integrating text-to-image models.
Evaluation metrics for AI-generated content, such as FID and precision-and-recall (PS&R) metrics.
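For reference, FID compares two sets of feature vectors (normally Inception activations of real and generated images) by fitting a Gaussian to each set and taking the Fréchet distance between them. The sketch below implements that formula in isolation; it is not the evaluation pipeline used for Genie.

```python
# Fréchet Inception Distance between two feature sets, using the standard
# closed form for Gaussians: ||mu1 - mu2||^2 + Tr(C1 + C2 - 2 (C1 C2)^(1/2)).
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    mu1, mu2 = feats_a.mean(axis=0), feats_b.mean(axis=0)
    c1 = np.cov(feats_a, rowvar=False)
    c2 = np.cov(feats_b, rowvar=False)
    covmean = sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):       # drop tiny imaginary parts from numerical noise
        covmean = covmean.real
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(c1 + c2 - 2 * covmean))

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, (500, 8))
fake = rng.normal(0.5, 1.0, (500, 8))
print(fid(real, fake))                 # larger distribution gap -> larger FID
```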
The discussion also explored broader implications and applications:
The potential impact of AI video generation on content creation jobs.
Applications of Genie in game generation and robotics.
The use of foundation models in robotics and the differences between internet video data and specialized robotics data.
Challenges in mapping AI-generated actions to real-world robotic actions.
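To make the tokenization point above concrete, here is a tiny Python sketch of the vector-quantization step used by VQ-VAE-style tokenizers (see the refs below): each continuous latent is snapped to its nearest codebook entry and represented by that entry's index. The codebook, shapes and random data are invented for illustration; this is not the Genie implementation.

# Minimal illustration of vector quantization: each continuous latent vector is replaced
# by the index of its nearest codebook entry. Toy random data, not anything from Genie.
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))      # 8 discrete codes, each a 4-d embedding
latents  = rng.normal(size=(6, 4))      # 6 continuous latent vectors (e.g. per video patch)

# Squared distances from every latent to every codebook entry, then take the argmin.
d2 = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
tokens = d2.argmin(axis=1)              # discrete token ids
quantized = codebook[tokens]            # what the decoder actually sees

print("token ids:", tokens)
print("quantization error:", float(((latents - quantized) ** 2).mean()))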
Ashley Edwards: https://ashedwards.github.io/
TOC (*) are best bits
00:00:00 1. Intro to Genie & Brave Search API: Trade-offs & limitations *
00:02:26 2. Genie's Architecture: Latent action, VQ-VAE, video processing *
00:05:06 3. Genie's Constraints: Frame consistency & image model integration
00:07:26 4. Evaluation: FID, PS&R diff metrics & latent induction methods
00:09:44 5. AI Video Gen: Content creation impact, depth & parallax effects
00:11:39 6. Model Scaling: Training data impact & computational trade-offs
00:13:50 7. Game & Robotics Apps: Gamification & action mapping challenges *
00:16:16 8. Robotics Foundation Models: Action space & data considerations *
00:19:18 9. Mask-GPT & Video Frames: Real-time optimization, RL from videos
00:20:34 10. Research Challenges: AI value, efficiency vs. quality, safety
00:24:20 11. Future Dev: Efficiency improvements & fine-tuning strategies
Refs:
1. Genie (learning interactive environments from videos) / Ashley Edwards and DeepMind colleagues [00:01]
https://arxiv.org/abs/2402.15391
2. VQ-VAE (Vector Quantized Variational Autoencoder) / Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu [02:43]
https://arxiv.org/abs/1711.00937
3. FID (Fréchet Inception Distance) metric / Martin Heusel et al. [07:37]
https://arxiv.org/abs/1706.08500
4. PS&R (Precision and Recall) metric / Mehdi S. M. Sajjadi et al. [08:02]
https://arxiv.org/abs/1806.00035
5. Vision Transformer (ViT) architecture / Alexey Dosovitskiy et al. [12:14]
https://arxiv.org/abs/2010.11929
6. Genie (robotics foundation models) / Google DeepMind [17:34]
https://deepmind.google/research/publications/60474/
7. Chelsea Finn's lab work on robotics datasets / Chelsea Finn [17:38]
https://ai.stanford.edu/~cbfinn/
8. Imitation from observation in reinforcement learning / YuXuan Liu [20:58]
https://arxiv.org/abs/1707.03374
9. Waymo's autonomous driving technology / Waymo [22:38]
https://waymo.com/
10. Gen3 model release by Runway / Runway [23:48]
https://runwayml.com/
11. Classifier-free guidance technique / Jonathan Ho and Tim Salimans [24:43]
https://arxiv.org/abs/2207.12598
Saurabh Baji discusses Cohere's approach to developing and deploying large language models (LLMs) for enterprise use.
* Cohere focuses on pragmatic, efficient models tailored for business applications rather than pursuing the largest possible models.
* They offer flexible deployment options, from cloud services to on-premises installations, to meet diverse enterprise needs.
* Retrieval-augmented generation (RAG) is highlighted as a critical capability, allowing models to leverage enterprise data securely.
* Cohere emphasizes model customization, fine-tuning, and tools like reranking to optimize performance for specific use cases.
* The company has seen significant growth, transitioning from developer-focused to enterprise-oriented services.
* Major customers like Oracle, Fujitsu, and TD Bank are using Cohere's models across various applications, from HR to finance.
* Baji predicts a surge in enterprise AI adoption over the next 12-18 months as more companies move from experimentation to production.
* He emphasizes the importance of trust, security, and verifiability in enterprise AI applications.
The interview provides insights into Cohere's strategy, technology, and vision for the future of enterprise AI adoption.
https://www.linkedin.com/in/saurabhbaji/
https://x.com/sbaji
https://cohere.com/
https://cohere.com/business
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
TOC (*) are best bits
00:00:00 1. Introduction and Background
00:04:24 2. Cloud Infrastructure and LLM Optimization
00:06:43 2.1 Model deployment and fine-tuning strategies *
00:09:37 3. Enterprise AI Deployment Strategies
00:11:10 3.1 Retrieval-augmented generation in enterprise environments *
00:13:40 3.2 Standardization vs. customization in cloud services *
00:18:20 4. AI Model Evaluation and Deployment
00:18:20 4.1 Comprehensive evaluation frameworks *
00:21:20 4.2 Key components of AI model stacks *
00:25:50 5. Retrieval Augmented Generation (RAG) in Enterprise
00:32:10 5.1 Pragmatic approach to RAG implementation *
00:33:45 6. AI Agents and Tool Integration
00:33:45 6.1 Leveraging tools for AI insights *
00:35:30 6.2 Agent-based AI systems and diagnostics *
00:42:55 7. AI Transparency and Reasoning Capabilities
00:49:10 8. AI Model Training and Customization
00:57:10 9. Enterprise AI Model Management
01:02:10 9.1 Managing AI model versions for enterprise customers *
01:04:30 9.2 Future of language model programming *
01:06:10 10. AI-Driven Software Development
01:06:10 10.1 AI bridging human expression and task achievement *
01:08:00 10.2 AI-driven virtual app fabrics in enterprise *
01:13:33 11. Future of AI and Enterprise Applications
01:21:55 12. Cohere's Customers and Use Cases
01:21:55 12.1 Cohere's growth and enterprise partnerships *
01:27:14 12.2 Diverse customers using generative AI *
01:27:50 12.3 Industry adaptation to generative AI *
01:29:00 13. Technical Advantages of Cohere Models
01:29:00 13.1 Handling large context windows *
01:29:40 13.2 Low latency impact on developer productivity *
Disclaimer: This is the fifth video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. Filmed in Seattle in Aug 2024.
David Hanson, CEO of Hanson Robotics and creator of the humanoid robot Sophia, explores the intersection of artificial intelligence, ethics, and human potential. In this thought-provoking interview, Hanson discusses his vision for developing AI systems that embody the best aspects of humanity while pushing beyond our current limitations, aiming to achieve what he calls "super wisdom."
YT version: https://youtu.be/LFCIEhlsozU
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
The interview with David Hanson covers:
The importance of incorporating biological drives and compassion into AI systems
Hanson's concept of "existential pattern ethics" as a basis for AI morality
The potential for AI to enhance human intelligence and wisdom
Challenges in developing artificial general intelligence (AGI)
The need to democratize AI technologies globally
Potential future advancements in human-AI integration and their societal impacts
Concerns about technological augmentation exacerbating inequality
The role of ethics in guiding AI development and deployment
Hanson advocates for creating AI systems that embody the best aspects of humanity while surpassing current human limitations, aiming for "super wisdom" rather than just artificial super intelligence.
David Hanson:
https://www.hansonrobotics.com/david-hanson/
https://www.youtube.com/watch?v=9u1O954cMmE
TOC
1. Introduction and Background [00:00:00]
1.1. David Hanson's interdisciplinary background [0:01:49]
1.2. Introduction to Sophia, the realistic robot [0:03:27]
2. Human Cognition and AI [0:03:50]
2.1. Importance of social interaction in cognition [0:03:50]
2.2. Compassion as distinguishing factor [0:05:55]
2.3. AI augmenting human intelligence [0:09:54]
3. Developing Human-like AI [0:13:17]
3.1. Incorporating biological drives in AI [0:13:17]
3.2. Creating AI with agency [0:20:34]
3.3. Implementing flexible desires in AI [0:23:23]
4. Ethics and Morality in AI [0:27:53]
4.1. Enhancing humanity through AI [0:27:53]
4.2. Existential pattern ethics [0:30:14]
4.3. Expanding morality beyond restrictions [0:35:35]
5. Societal Impact of AI [0:38:07]
5.1. AI adoption and integration [0:38:07]
5.2. Democratizing AI technologies [0:38:32]
5.3. Human-AI integration and identity [0:43:37]
6. Future Considerations [0:50:03]
6.1. Technological augmentation and inequality [0:50:03]
6.2. Emerging technologies for mental health [0:50:32]
6.3. Corporate ethics in AI development [0:52:26]
This was filmed at AGI-24
David Spivak, a mathematician known for his work in category theory, discusses a wide range of topics related to intelligence, creativity, and the nature of knowledge. He explains category theory in simple terms and explores how it relates to understanding complex systems and relationships.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
We discuss abstract concepts like collective intelligence, the importance of embodiment in understanding the world, and how we acquire and process knowledge. Spivak shares his thoughts on creativity, discussing where it comes from and how it might be modeled mathematically.
A significant portion of the discussion focuses on the impact of artificial intelligence on human thinking and its potential role in the evolution of intelligence. Spivak also touches on the importance of language, particularly written language, in transmitting knowledge and shaping our understanding of the world.
David Spivak
http://www.dspivak.net/
TOC:
00:00:00 Introduction to category theory and functors
00:04:40 Collective intelligence and sense-making
00:09:54 Embodiment and physical concepts in knowledge acquisition
00:16:23 Creativity, open-endedness, and AI's impact on thinking
00:25:46 Modeling creativity and the evolution of intelligence
00:36:04 Evolution, optimization, and the significance of AI
00:44:14 Written language and its impact on knowledge transmission
REFS:
Mike Levin's work
https://scholar.google.com/citations?user=luouyakAAAAJ&hl=en
Eric Smith's videos on complexity and early life
https://www.youtube.com/watch?v=SpJZw-68QyE
Richard Dawkins' book "The Selfish Gene"
https://amzn.to/3X73X8w
Carl Sagan's statement about the cosmos knowing itself
https://amzn.to/3XhPruK
Herbert Simon's concept of "satisficing"
https://plato.stanford.edu/entries/bounded-rationality/
DeepMind paper on open-ended systems
https://arxiv.org/abs/2406.04268
Karl Friston's work on active inference
https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind
MIT category theory lectures by David Spivak (available on the Topos Institute channel)
https://www.youtube.com/watch?v=UusLtx9fIjs
Jürgen Schmidhuber, the father of generative AI, shares his groundbreaking work in deep learning and artificial intelligence. In this exclusive interview, he discusses the history of AI, some of his contributions to the field, and his vision for the future of intelligent machines. Schmidhuber offers unique insights into the exponential growth of technology and the potential impact of AI on humanity and the universe.
YT version: https://youtu.be/DP454c1K_vQ
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
TOC
00:00:00 Intro
00:03:38 Reasoning
00:13:09 Potential AI Breakthroughs Reducing Computation Needs
00:20:39 Memorization vs. Generalization in AI
00:25:19 Approach to the ARC Challenge
00:29:10 Perceptions of ChatGPT and AGI
00:58:45 Abstract Principles of Jürgen's Approach
01:04:17 Analogical Reasoning and Compression
01:05:48 Breakthroughs in 1991: the P, the G, and the T in ChatGPT and Generative AI
01:15:50 Use of LSTM in Language Models by Tech Giants
01:21:08 Neural Network Aspect Ratio Theory
01:26:53 Reinforcement Learning Without Explicit Teachers
Refs:
★ "Annotated History of Modern AI and Deep Learning" (2022 survey by Schmidhuber):
★ Chain Rule For Backward Credit Assignment (Leibniz, 1676)
★ First Neural Net / Linear Regression / Shallow Learning (Gauss & Legendre, circa 1800)
★ First 20th Century Pioneer of Practical AI (Quevedo, 1914)
★ First Recurrent NN (RNN) Architecture (Lenz, Ising, 1920-1925)
★ AI Theory: Fundamental Limitations of Computation and Computation-Based AI (Gödel, 1931-34)
★ Unpublished ideas about evolving RNNs (Turing, 1948)
★ Multilayer Feedforward NN Without Deep Learning (Rosenblatt, 1958)
★ First Published Learning RNNs (Amari and others, ~1972)
★ First Deep Learning (Ivakhnenko & Lapa, 1965)
★ Deep Learning by Stochastic Gradient Descent (Amari, 1967-68)
★ ReLUs (Fukushima, 1969)
★ Backpropagation (Linnainmaa, 1970); precursor (Kelley, 1960)
★ Backpropagation for NNs (Werbos, 1982)
★ First Deep Convolutional NN (Fukushima, 1979); later combined with Backprop (Waibel 1987, Zhang 1988).
★ Metalearning or Learning to Learn (Schmidhuber, 1987)
★ Generative Adversarial Networks / Artificial Curiosity / NN Online Planners (Schmidhuber, Feb 1990; see the G in Generative AI and ChatGPT)
★ NNs Learn to Generate Subgoals and Work on Command (Schmidhuber, April 1990)
★ NNs Learn to Program NNs: Unnormalized Linear Transformer (Schmidhuber, March 1991; see the T in ChatGPT)
★ Deep Learning by Self-Supervised Pre-Training. Distilling NNs (Schmidhuber, April 1991; see the P in ChatGPT)
★ Experiments with Pre-Training; Analysis of Vanishing/Exploding Gradients, Roots of Long Short-Term Memory / Highway Nets / ResNets (Hochreiter, June 1991, further developed 1999-2015 with other students of Schmidhuber)
★ LSTM journal paper (1997, most cited AI paper of the 20th century)
★ xLSTM (Hochreiter, 2024)
★ Reinforcement Learning Prompt Engineer for Abstract Reasoning and Planning (Schmidhuber 2015)
★ Mindstorms in Natural Language-Based Societies of Mind (2023 paper by Schmidhuber's team)
https://arxiv.org/abs/2305.17066
★ Bremermann's physical limit of computation (1982)
EXTERNAL LINKS
CogX 2018 - Professor Juergen Schmidhuber
https://www.youtube.com/watch?v=17shdT9-wuA
Discovering Neural Nets with Low Kolmogorov Complexity and High Generalization Capability (Neural Networks, 1997)
https://sferics.idsia.ch/pub/juergen/loconet.pdf
The paradox at the heart of mathematics: Gödel's Incompleteness Theorem - Marcus du Sautoy
https://www.youtube.com/watch?v=I4pQbo5MQOs
(Refs truncated, full version in the YT video description)
Professor Pedro Domingos is an AI researcher and professor of computer science. He expresses skepticism about current AI regulation efforts and argues for faster AI development rather than slowing it down. He also discusses the need for new innovations to fulfil the promises of current AI techniques.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Show notes:
* Domingos' views on AI regulation and why he believes it's misguided
* His thoughts on the current state of AI technology and its limitations
* Discussion of his novel "2040", a satirical take on AI and tech culture
* Explanation of his work on "tensor logic", which aims to unify neural networks and symbolic AI
* Critiques of other approaches in AI, including those of OpenAI and Gary Marcus
* Thoughts on the AI "bubble" and potential future developments in the field
Prof. Pedro Domingos:
https://x.com/pmddomingos
2040: A Silicon Valley Satire [Pedro's new book]
https://amzn.to/3T51ISd
TOC:
00:00:00 Intro
00:06:31 Bio
00:08:40 Filmmaking skit
00:10:35 AI and the wisdom of crowds
00:19:49 Social Media
00:27:48 Master algorithm
00:30:48 Neurosymbolic AI / abstraction
00:39:01 Language
00:45:38 Chomsky
01:00:49 2040 Book
01:18:03 Satire as a shield for criticism?
01:29:12 AI Regulation
01:35:15 Gary Marcus
01:52:37 Copyright
01:56:11 Stochastic parrots come home to roost
02:00:03 Privacy
02:01:55 LLM ecosystem
02:05:06 Tensor logic
Refs:
The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World [Pedro Domingos]
https://amzn.to/3MiWs9B
Rebooting AI: Building Artificial Intelligence We Can Trust [Gary Marcus]
https://amzn.to/3AAywvL
Flash Boys [Michael Lewis]
https://amzn.to/4dUGm1M
Andrew Ilyas is a PhD student at MIT who is about to start as a professor at CMU. We discuss data modeling and how datasets influence model predictions, adversarial examples in machine learning and why they occur, robustness in machine learning models, black-box attacks on machine learning systems, biases in data collection and dataset creation (particularly in ImageNet), and self-selection bias in data and methods to address it.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Andrew's site:
https://andrewilyas.com/
https://x.com/andrew_ilyas
TOC:
00:00:00 - Introduction and Andrew's background
00:03:52 - Overview of the machine learning pipeline
00:06:31 - Data modeling paper discussion
00:26:28 - TRAK: Evolution of data modeling work
00:43:58 - Discussion on abstraction, reasoning, and neural networks
00:53:16 - "Adversarial Examples Are Not Bugs, They Are Features" paper
01:03:24 - Types of features learned by neural networks
01:10:51 - Black box attacks paper
01:15:39 - Work on data collection and bias
01:25:48 - Future research plans and closing thoughts
References:
Adversarial Examples Are Not Bugs, They Are Features
https://arxiv.org/pdf/1905.02175
TRAK: Attributing Model Behavior at Scale
https://arxiv.org/pdf/2303.14186
Datamodels: Predicting Predictions from Training Data
https://arxiv.org/pdf/2202.00622
IMAGENET-TRAINED CNNS
https://arxiv.org/pdf/1811.12231
ZOO: Zeroth Order Optimization Based Black-box
https://arxiv.org/pdf/1708.03999
A Spline Theory of Deep Networks
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
Scaling Monosemanticity
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Adversarial Examples Are Not Bugs, They Are Features
https://gradientscience.org/adv/
Adversarial Robustness Limits via Scaling-Law and Human-Alignment Studies
https://proceedings.mlr.press/v235/bartoldson24a.html
Prior Convictions: Black-Box Adversarial Attacks with Bandits and Priors
https://arxiv.org/abs/1807.07978
Estimation of Standard Auction Models
https://arxiv.org/abs/2205.02060
From ImageNet to Image Classification: Contextualizing Progress on Benchmarks
https://arxiv.org/abs/2005.11295
What Makes A Good Fisherman? Linear Regression under Self-Selection Bias
https://arxiv.org/abs/2205.03246
Towards Tracing Factual Knowledge in Language Models Back to the Training Data [Akyürek]
https://arxiv.org/pdf/2205.11482
Dr. Joscha Bach introduces a surprising idea called "cyber animism" in his AGI-24 talk - the notion that nature might be full of self-organizing software agents, similar to the spirits in ancient belief systems. Bach suggests that consciousness could be a kind of software running on our brains, and wonders if similar "programs" might exist in plants or even entire ecosystems.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Joscha takes us on a tour de force through history, philosophy, and cutting-edge computer science, teasing us to rethink what we know about minds, machines, and the world around us. Joscha believes we should blur the lines between human, artificial, and natural intelligence, and argues that consciousness might be more widespread and interconnected than we ever thought possible.
Dr. Joscha Bach
https://x.com/Plinz
This is video 2/9 from our coverage of AGI-24 in Seattle https://agi-conf.org/2024/
Watch the official MLST interview with Joscha which we did right after this talk on our Patreon now on early access - https://www.patreon.com/posts/joscha-bach-110199676 (you also get access to our private discord and biweekly calls)
TOC:
00:00:00 Introduction: AGI and Cyberanimism
00:03:57 The Nature of Consciousness
00:08:46 Aristotle's Concepts of Mind and Consciousness
00:13:23 The Hard Problem of Consciousness
00:16:17 Functional Definition of Consciousness
00:20:24 Comparing LLMs and Human Consciousness
00:26:52 Testing for Consciousness in AI Systems
00:30:00 Animism and Software Agents in Nature
00:37:02 Plant Consciousness and Ecosystem Intelligence
00:40:36 The California Institute for Machine Consciousness
00:44:52 Ethics of Conscious AI and Suffering
00:46:29 Philosophical Perspectives on Consciousness
00:49:55 Q&A: Formalisms for Conscious Systems
00:53:27 Coherence, Self-Organization, and Compute Resources
YT version (very high quality, filmed by us live)
https://youtu.be/34VOI_oo-qM
Refs:
Aristotle's work on the soul and consciousness
Richard Dawkins' work on genes and evolution
Gerald Edelman's concept of Neural Darwinism
Thomas Metzinger's book "Being No One"
Yoshua Bengio's concept of the "consciousness prior"
Stuart Hameroff's theories on microtubules and consciousness
Christof Koch's work on consciousness
Daniel Dennett's "Cartesian Theater" concept
Giulio Tononi's Integrated Information Theory
Mike Levin's work on organismal intelligence
The concept of animism in various cultures
Freud's model of the mind
Buddhist perspectives on consciousness and meditation
The Genesis creation narrative (for its metaphorical interpretation)
California Institute for Machine Consciousness
Prof Gary Marcus revisited his keynote from AGI-21, noting that many of the issues he highlighted then are still relevant today despite significant advances in AI.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Gary Marcus criticized current large language models (LLMs) and generative AI for their unreliability, tendency to hallucinate, and inability to truly understand concepts.
Marcus argued that the AI field is experiencing diminishing returns with current approaches, particularly the "scaling hypothesis" that simply adding more data and compute will lead to AGI.
He advocated for a hybrid approach to AI that combines deep learning with symbolic AI, emphasizing the need for systems with deeper conceptual understanding.
Marcus highlighted the importance of developing AI with innate understanding of concepts like space, time, and causality.
He expressed concern about the moral decline in Silicon Valley and the rush to deploy potentially harmful AI technologies without adequate safeguards.
Marcus predicted a possible upcoming "AI winter" due to inflated valuations, lack of profitability, and overhyped promises in the industry.
He stressed the need for better regulation of AI, including transparency in training data, full disclosure of testing, and independent auditing of AI systems.
Marcus proposed the creation of national and global AI agencies to oversee the development and deployment of AI technologies.
He concluded by emphasizing the importance of interdisciplinary collaboration, focusing on robust AI with deep understanding, and implementing smart, agile governance for AI and AGI.
YT Version (very high quality filmed)
https://youtu.be/91SK90SahHc
Pre-order Gary's new book here:
Taming Silicon Valley: How We Can Ensure That AI Works for Us
https://amzn.to/4fO46pY
Filmed at the AGI-24 conference:
https://agi-conf.org/2024/
TOC:
00:00:00 Introduction
00:02:34 Introduction by Ben G
00:05:17 Gary Marcus begins talk
00:07:38 Critiquing current state of AI
00:12:21 Lack of progress on key AI challenges
00:16:05 Continued reliability issues with AI
00:19:54 Economic challenges for AI industry
00:25:11 Need for hybrid AI approaches
00:29:58 Moral decline in Silicon Valley
00:34:59 Risks of current generative AI
00:40:43 Need for AI regulation and governance
00:49:21 Concluding thoughts
00:54:38 Q&A: Cycles of AI hype and winters
01:00:10 Predicting a potential AI winter
01:02:46 Discussion on interdisciplinary approach
01:05:46 Question on regulating AI
01:07:27 Ben G's perspective on AI winter
DeepMind Research Scientist / MIT scholar Dr. Timothy Nguyen discusses his recent paper on understanding transformers through n-gram statistics. Nguyen explains his approach to analyzing transformer behavior using a kind of "template matching" (N-grams), providing insights into how these models process and predict language.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Key points covered include:
A method for describing transformer predictions using n-gram statistics without relying on internal mechanisms.
The discovery of a technique to detect overfitting in large language models without using holdout sets.
Observations on curriculum learning, showing how transformers progress from simpler to more complex rules during training.
Discussion of distance measures used in the analysis, particularly the variational distance.
Exploration of model sizes, training dynamics, and their impact on the results.
We also touch on philosophical aspects of describing versus explaining AI behavior, and the challenges in understanding the abstractions formed by neural networks. Nguyen concludes by discussing potential future research directions, including attempts to convert descriptions of transformer behavior into explanations of internal mechanisms.
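As a rough, hedged illustration of the kind of comparison described above, the Python sketch below builds empirical next-token statistics for short n-gram contexts on a toy corpus and scores a stand-in "model" distribution against them with the variational (total variation) distance. The corpus, context and model distribution are invented; this is not Nguyen's actual pipeline or data.

# Toy version of the comparison discussed in the episode: next-token statistics for an
# n-gram "template" vs. some model's predictive distribution, scored with the variational
# (total variation) distance. The corpus and the stand-in model below are made up.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat on the hat".split()

def ngram_predictions(tokens, n=2):
    # Map each (n-1)-token context to an empirical next-token distribution.
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        ctx, nxt = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
        table[ctx][nxt] += 1
    return {ctx: {t: c / sum(cnt.values()) for t, c in cnt.items()}
            for ctx, cnt in table.items()}

def variational_distance(p, q):
    # Total variation distance: half the L1 distance between the two distributions.
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(t, 0.0) - q.get(t, 0.0)) for t in support)

ngrams = ngram_predictions(corpus, n=2)
context = ("the",)
model_dist = {"cat": 0.5, "mat": 0.2, "hat": 0.2, "dog": 0.1}   # stand-in for a transformer
print("n-gram prediction:", ngrams[context])
print("variational distance:", round(variational_distance(ngrams[context], model_dist), 3))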
Timothy Nguyen earned his B.S. and Ph.D. in mathematics from Caltech and MIT, respectively. He held positions as Research Assistant Professor at the Simons Center for Geometry and Physics (2011-2014) and Visiting Assistant Professor at Michigan State University (2014-2017). During this time, his research expanded into high-energy physics, focusing on mathematical problems in quantum field theory. His work notably provided a simplified and corrected formulation of perturbative path integrals.
Since 2017, Nguyen has been working in industry, applying his expertise to machine learning. He is currently at DeepMind, where he contributes to both fundamental research and practical applications of deep learning to solve real-world problems.
Refs:
The Cartesian Cafe
https://www.youtube.com/@TimothyNguyen
Understanding Transformers via N-Gram Statistics
https://www.researchgate.net/publication/382204056_Understanding_Transformers_via_N-Gram_Statistics
TOC
00:00:00 Timothy Nguyen's background
00:02:50 Paper overview: transformers and n-gram statistics
00:04:55 Template matching and hash table approach
00:08:55 Comparing templates to transformer predictions
00:12:01 Describing vs explaining transformer behavior
00:15:36 Detecting overfitting without holdout sets
00:22:47 Curriculum learning in training
00:26:32 Distance measures in analysis
00:28:58 Model sizes and training dynamics
00:30:39 Future research directions
00:32:06 Conclusion and future topics
Jay Alammar, renowned AI educator and researcher at Cohere, discusses the latest developments in large language models (LLMs) and their applications in industry. Jay shares his expertise on retrieval augmented generation (RAG), semantic search, and the future of AI architectures.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Cohere Command R model series: https://cohere.com/command
Jay Alammar:
https://x.com/jayalammar
Buy Jay's new book here!
Hands-On Large Language Models: Language Understanding and Generation
https://amzn.to/4fzOUgh
TOC:
00:00:00 Introduction to Jay Alammar and AI Education
00:01:47 Cohere's Approach to RAG and AI Re-ranking
00:07:15 Implementing AI in Enterprise: Challenges and Solutions
00:09:26 Jay's Role at Cohere and the Importance of Learning in Public
00:15:16 The Evolution of AI in Industry: From Deep Learning to LLMs
00:26:12 Expert Advice for Newcomers in Machine Learning
00:32:39 The Power of Semantic Search and Embeddings in AI Systems
00:37:59 Jay Alammar's Journey as an AI Educator and Visualizer
00:43:36 Visual Learning in AI: Making Complex Concepts Accessible
00:47:38 Strategies for Keeping Up with Rapid AI Advancements
00:49:12 The Future of Transformer Models and AI Architectures
00:51:40 Evolution of the Transformer: From 2017 to Present
00:54:19 Preview of Jay's Upcoming Book on Large Language Models
Disclaimer: This is the fourth video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview. Note also that this combines several previously unpublished interviews from Jay into one, the earlier one at Tim's house was shot in Aug 2023, and the more recent one in Toronto in May 2024.
Refs:
The Illustrated Transformer
https://jalammar.github.io/illustrated-transformer/
Attention Is All You Need
https://arxiv.org/abs/1706.03762
The Unreasonable Effectiveness of Recurrent Neural Networks
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Neural Networks in 11 Lines of Code
https://iamtrask.github.io/2015/07/12/basic-python-network/
Understanding LSTM Networks (Chris Olah's blog post)
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Luis Serrano's YouTube Channel
https://www.youtube.com/channel/UCgBncpylJ1kiVaPyP-PZauQ
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
https://arxiv.org/abs/1908.10084
GPT (Generative Pre-trained Transformer) models
https://jalammar.github.io/illustrated-gpt2/
https://openai.com/research/gpt-4
BERT (Bidirectional Encoder Representations from Transformers)
https://jalammar.github.io/illustrated-bert/
https://arxiv.org/abs/1810.04805
RoPE (Rotary Positional Encoding)
https://arxiv.org/abs/2104.09864 (Linked paper discussing rotary embeddings)
Grouped Query Attention
https://arxiv.org/pdf/2305.13245
RLHF (Reinforcement Learning from Human Feedback)
https://openai.com/research/learning-from-human-preferences
https://arxiv.org/abs/1706.03741
DPO (Direct Preference Optimization)
https://arxiv.org/abs/2305.18290
Daniel Cahn, co-founder of Slingshot AI, discusses the potential of AI in therapy. Why are anxiety and depression affecting such a large share of the population? To what extent are these real categories? Why is mental health getting worse? How often do you want an AI to agree with you? What are the ethics of persuasive AI? We explore all of this in the conversation.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Daniel Cahn (who is also hiring ML engineers by the way!)
https://x.com/thecahnartist?lang=en
/ cahnd
https://thinkingmachinespodcast.com/
TOC:
00:00:00 Intro
00:01:56 Therapy effectiveness vs drugs and societal implications
00:04:02 Mental health categories: Iatrogenesis and social constructs
00:10:19 Psychiatric treatment models and cognitive perspectives
00:13:30 AI design and human-like interactions: Intentionality debates
00:20:04 AI in therapy: Ethics, anthropomorphism, and loneliness mitigation
00:28:13 Therapy efficacy: Neuroplasticity, suffering, and AI placebos
00:33:29 AI's impact on human agency and cognitive modeling
00:41:17 Social media's effects on brain structure and behavior
00:50:46 AI ethics: Altering values and free will considerations
01:00:00 Work value perception and personal identity formation
01:13:37 Free will, agency, and mutable personal identity in therapy
01:24:27 AI in healthcare: Challenges, ethics, and therapy improvements
01:53:25 AI development: Societal impacts and cultural implications
Full references in the YT video description: https://www.youtube.com/watch?v=7hwX6OZyNC0 (and baked into the mp3 metadata)
Prof. Subbarao Kambhampati argues that while LLMs are impressive and useful tools, especially for creative tasks, they have fundamental limitations in logical reasoning and cannot provide guarantees about the correctness of their outputs. He advocates for hybrid approaches that combine LLMs with external verification systems.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
TOC (sorry, the chapters baked into the MP3 were wrong due to LLM hallucination!)
[00:00:00] Intro
[00:02:06] Bio
[00:03:02] LLMs are n-gram models on steroids
[00:07:26] Is natural language a formal language?
[00:08:34] Natural language is formal?
[00:11:01] Do LLMs reason?
[00:19:13] Definition of reasoning
[00:31:40] Creativity in reasoning
[00:50:27] Chollet's ARC challenge
[01:01:31] Can we reason without verification?
[01:10:00] LLMs can't solve some tasks
[01:19:07] LLM Modulo framework
[01:29:26] Future trends of architecture
[01:34:48] Future research directions
Youtube version: https://www.youtube.com/watch?v=y1WnHpedi2A
Refs: (we didn't have space for URLs here, check YT video description instead)
How seriously should governments take the threat of existential risk from AI, given the lack of consensus among researchers? On the one hand, existential risks (x-risks) are necessarily somewhat speculative: by the time there is concrete evidence, it may be too late. On the other hand, governments must prioritize — after all, they don’t worry too much about x-risk from alien invasions.
MLST is sponsored by Brave:
The Brave Search API covers over 20 billion webpages, built from scratch without Big Tech biases or the recent extortionate price hikes on search API access. Perfect for AI model training and retrieval augmented generation. Try it now - get 2,000 free queries monthly at http://brave.com/api.
Sayash Kapoor is a computer science Ph.D. candidate at Princeton University's Center for Information Technology Policy. His research focuses on the societal impact of AI. Kapoor has previously worked on AI in both industry and academia, with experience at Facebook, Columbia University, and EPFL Switzerland. He is a recipient of a best paper award at ACM FAccT and an impact recognition award at ACM CSCW. Notably, Kapoor was included in TIME's inaugural list of the 100 most influential people in AI.
Sayash Kapoor
https://x.com/sayashk
https://www.cs.princeton.edu/~sayashk/
Arvind Narayanan (other half of the AI Snake Oil duo)
https://x.com/random_walker
AI existential risk probabilities are too unreliable to inform policy
https://www.aisnakeoil.com/p/ai-existential-risk-probabilities
Pre-order AI Snake Oil Book
https://amzn.to/4fq2HGb
AI Snake Oil blog
https://www.aisnakeoil.com/
AI Agents That Matter
https://arxiv.org/abs/2407.01502
Shortcut learning in deep neural networks
https://www.semanticscholar.org/paper/Shortcut-learning-in-deep-neural-networks-Geirhos-Jacobsen/1b04936c2599e59b120f743fbb30df2eed3fd782
77% Of Employees Report AI Has Increased Workloads And Hampered Productivity, Study Finds
https://www.forbes.com/sites/bryanrobinson/2024/07/23/employees-report-ai-increased-workload/
TOC:
00:00:00 Intro
00:01:57 How seriously should we take the x-risk threat?
00:02:55 Risk too unreliable to inform policy
00:10:20 Overinflated risks
00:12:05 Perils of utility maximisation
00:13:55 Scaling vs airplane speeds
00:17:31 Shift to smaller models?
00:19:08 Commercial LLM ecosystem
00:22:10 Synthetic data
00:24:09 Is AI complexifying our jobs?
00:25:50 Does ChatGPT make us dumber or smarter?
00:26:55 Are AI Agents overhyped?
00:28:12 Simple vs complex baselines
00:30:00 Cost tradeoff in agent design
00:32:30 Model eval vs downstream perf
00:36:49 Shortcuts in metrics
00:40:09 Standardisation of agent evals
00:41:21 Humans in the loop
00:43:54 Levels of agent generality
00:47:25 ARC challenge
Sara Hooker is VP of Research at Cohere and leader of Cohere for AI. We discuss her recent paper critiquing the use of compute thresholds, measured in FLOPs (floating point operations), as an AI governance strategy.
We explore why this approach, recently adopted in both US and EU AI policies, may be problematic and oversimplified. Sara explains the limitations of using raw computational power as a measure of AI capability or risk, and discusses the complex relationship between compute, data, and model architecture.
Equally important, we go into Sara's work on "The AI Language Gap." This research highlights the challenges and inequalities in developing AI systems that work across multiple languages. Sara discusses how current AI models, predominantly trained on English and a handful of high-resource languages, fail to serve the linguistic diversity of our global population. We explore the technical, ethical, and societal implications of this gap, and discuss potential solutions for creating more inclusive and representative AI systems.
We broadly discuss the relationship between language, culture, and AI capabilities, as well as the ethical considerations in AI development and deployment.
YT Version: https://youtu.be/dBZp47999Ko
TOC:
[00:00:00] Intro
[00:02:12] FLOPS paper
[00:26:42] Hardware lottery
[00:30:22] The Language gap
[00:33:25] Safety
[00:38:31] Emergent
[00:41:23] Creativity
[00:43:40] Long tail
[00:44:26] LLMs and society
[00:45:36] Model bias
[00:48:51] Language and capabilities
[00:52:27] Ethical frameworks and RLHF
Sara Hooker
https://www.sarahooker.me/
https://www.linkedin.com/in/sararosehooker/
https://scholar.google.com/citations?user=2xy6h3sAAAAJ&hl=en
https://x.com/sarahookr
Interviewer: Tim Scarfe
Refs
The AI Language gap
https://cohere.com/research/papers/the-AI-language-gap.pdf
On the Limitations of Compute Thresholds as a Governance Strategy.
https://arxiv.org/pdf/2407.05694v1
The Multilingual Alignment Prism: Aligning Global and Local Preferences to Reduce Harm
https://arxiv.org/pdf/2406.18682
Cohere Aya
https://cohere.com/research/aya
RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs
https://arxiv.org/pdf/2407.02552
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs
https://arxiv.org/pdf/2402.14740
Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence
https://www.whitehouse.gov/briefing-room/presidential-actions/2023/10/30/executive-order-on-the-safe-secure-and-trustworthy-development-and-use-of-artificial-intelligence/
EU AI Act
https://www.europarl.europa.eu/doceo/document/TA-9-2024-0138_EN.pdf
The bitter lesson
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Neel Nanda interview
https://www.youtube.com/watch?v=_Ygf0GnlwmY
Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet
https://transformer-circuits.pub/2024/scaling-monosemanticity/
Chollet's ARC challenge
https://github.com/fchollet/ARC-AGI
Ryan Greenblatt on ARC
https://www.youtube.com/watch?v=z9j3wB1RRGA
Disclaimer: This is the third video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview.
Murray Shanahan is a professor of Cognitive Robotics at Imperial College London and a senior research scientist at DeepMind. He challenges our assumptions about AI consciousness and urges us to rethink how we talk about machine intelligence.
We explore the dangers of anthropomorphizing AI, the limitations of current language in describing AI capabilities, and the fascinating intersection of philosophy and artificial intelligence.
Show notes and full references: https://docs.google.com/document/d/1ICtBI574W-xGi8Z2ZtUNeKWiOiGZ_DRsp9EnyYAISws/edit?usp=sharing
Prof Murray Shanahan:
https://www.doc.ic.ac.uk/~mpsha/ (look at his selected publications)
https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en
https://en.wikipedia.org/wiki/Murray_Shanahan
https://x.com/mpshanahan
Interviewer: Dr. Tim Scarfe
Refs (links in the Google doc linked above):
Role play with large language models
Waluigi effect
"Conscious Exotica" - Paper by Murray Shanahan (2016)
"Simulators" - Article by Janis from LessWrong
"Embodiment and the Inner Life" - Book by Murray Shanahan (2010)
"The Technological Singularity" - Book by Murray Shanahan (2015)
"Simulacra as Conscious Exotica" - Paper by Murray Shanahan (newer paper of the original focussed on LLMs)
A recent paper by Anthropic on using autoencoders to find features in language models (referring to the "Scaling Monosemanticity" paper)
Work by Peter Godfrey-Smith on octopus consciousness
"Metaphors We Live By" - Book by George Lakoff (1980s)
Work by Aaron Sloman on the concept of "space of possible minds" (1984 article mentioned)
Wittgenstein's "Philosophical Investigations" (posthumously published)
Daniel Dennett's work on the "intentional stance"
Alan Turing's original paper on the Turing Test (1950)
Thomas Nagel's paper "What is it like to be a bat?" (1974)
John Searle's Chinese Room Argument (mentioned but not detailed)
Work by Richard Evans on tackling reasoning problems
Claude Shannon's quote on knowledge and control
"Are We Bodies or Souls?" - Book by Richard Swinburne
Reference to work by Ethan Perez and others at Anthropic on potential deceptive behavior in language models
Reference to a paper by Murray Shanahan and Antonia Creswell on the "selection inference framework"
Mention of work by Francois Chollet, particularly the ARC (Abstraction and Reasoning Corpus) challenge
Reference to Elizabeth Spelke's work on core knowledge in infants
Mention of Karl Friston's work on planning as inference (active inference)
The film "Ex Machina" - Murray Shanahan was the scientific advisor
"The Waluigi Effect"
Anthropic's constitutional AI approach
Loom system by Laria Reynolds and Kyle McDonald for visualizing conversation trees
DeepMind's AlphaGo (mentioned multiple times as an example)
Mention of the "Golden Gate Claude" experiment
Reference to an interview Tim Scarfe conducted with University of Toronto students about self-attention controllability theorem
Mention of an interview with Irina Rish
Reference to an interview Tim Scarfe conducted with Daniel Dennett
Reference to an interview with Maria Santacaterina
Mention of an interview with Philip Goff
Nick Chater and Morten Christiansen's book ("The Language Game: How Improvisation Created Language and Changed the World")
Peter Singer's work from 1975 on ascribing moral status to conscious beings
Demis Hassabis' discussion on the "ladder of creativity"
Reference to B.F. Skinner and behaviorism
In the coming decades, the technology that enables virtual and augmented reality will improve beyond recognition. Within a century, world-renowned philosopher David J. Chalmers predicts, we will have virtual worlds that are impossible to distinguish from non-virtual worlds. But is virtual reality just escapism?
In a highly original work of 'technophilosophy', Chalmers argues categorically, no: virtual reality is genuine reality. Virtual worlds are not second-class worlds. We can live a meaningful life in virtual reality - and increasingly, we will.
What is reality, anyway? How can we lead a good life? Is there a god? How do we know there's an external world - and how do we know we're not living in a computer simulation? In Reality+, Chalmers conducts a grand tour of philosophy, using cutting-edge technology to provide invigorating new answers to age-old questions.
David J. Chalmers is an Australian philosopher and cognitive scientist specializing in the areas of philosophy of mind and philosophy of language. He is Professor of Philosophy and Neural Science at New York University, as well as co-director of NYU's Center for Mind, Brain, and Consciousness. Chalmers is best known for his work on consciousness, including his formulation of the "hard problem of consciousness."
Reality+: Virtual Worlds and the Problems of Philosophy
https://amzn.to/3RYyGD2
https://consc.net/
https://x.com/davidchalmers42
00:00:00 Reality+ Intro
00:12:02 GPT conscious? 10/10
00:14:19 The consciousness processor thought experiment (11/10)
00:20:34 Intelligence and Consciousness entangled? 10/10
00:22:44 Karl Friston / Meta Problem 10/10
00:29:05 Knowledge argument / subjective experience (6/10)
00:32:34 Emergence 11/10 (best chapter)
00:42:45 Working with Douglas Hofstadter 10/10
00:46:14 Intelligence is analogy making? 10/10
00:50:47 Intelligence explosion 8/10
00:58:44 Hypercomputation 10/10
01:09:44 Who designed the designer? (7/10)
01:13:57 Experience machine (7/10)
Ryan Greenblatt from Redwood Research recently published "Getting 50% on ARC-AGI with GPT-4o," where he used GPT-4o to reach state-of-the-art accuracy on Francois Chollet's ARC Challenge by generating many Python programs.
Sponsor:
Sign up to Kalshi here https://kalshi.onelink.me/1r91/mlst -- the first 500 traders who deposit $100 will get a free $20 credit! Important disclaimer - In case it's not obvious - this is basically gambling and a *high risk* activity - only trade what you can afford to lose.
We discuss:
- Ryan's unique approach to solving the ARC Challenge and achieving impressive results (a toy sketch of the idea follows this list).
- The strengths and weaknesses of current AI models.
- How AI and humans differ in learning and reasoning.
- Combining various techniques to create smarter AI systems.
- The potential risks and future advancements in AI, including the idea of agentic AI.
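As a toy illustration of the sample-and-filter idea referenced above (not Greenblatt's actual pipeline), the Python sketch below proposes a handful of candidate "programs", keeps only those that reproduce every demonstration pair, and applies a survivor to the test input. In the real approach the candidates are sampled in bulk from GPT-4o and the tasks are 2D grids; here the candidates are hard-coded and the "grids" are flat lists.

# Toy sketch of sample-and-filter program synthesis: propose many candidate programs,
# keep those that reproduce every demonstration pair, then apply a survivor to the test
# input. Real pipelines sample thousands of candidates from an LLM; here they are fixed.
demos = [([1, 2, 3], [2, 4, 6]), ([0, 5], [0, 10])]   # (input, output) stand-ins for ARC grids
test_input = [3, 3, 1]

candidates = [
    ("reverse",  lambda g: g[::-1]),
    ("double",   lambda g: [2 * x for x in g]),
    ("add one",  lambda g: [x + 1 for x in g]),
]

def consistent(fn):
    # A candidate survives only if it maps every demo input to the demo output.
    try:
        return all(fn(x) == y for x, y in demos)
    except Exception:
        return False

survivors = [(name, fn) for name, fn in candidates if consistent(fn)]
for name, fn in survivors:
    print(f"{name!r} passes the demos -> test prediction: {fn(test_input)}")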
https://x.com/RyanPGreenblatt
https://www.redwoodresearch.org/
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o [Ryan Greenblatt]
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
Software 2.0 [Andrej Karpathy]
https://karpathy.medium.com/software-2-0-a64152b37c35
Why Greatness Cannot Be Planned: The Myth of the Objective [Kenneth Stanley]
https://amzn.to/3Wfy2E0
Biographical account of Terence Tao’s mathematical development. [M.A.(KEN) CLEMENTS]
https://gwern.net/doc/iq/high/smpy/1984-clements.pdf
Model Evaluation and Threat Research (METR)
https://metr.org/
Why Tool AIs Want to Be Agent AIs
https://gwern.net/tool-ai
Simulators - Janus
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
AI Control: Improving Safety Despite Intentional Subversion
https://www.lesswrong.com/posts/d9FJHawgkiMSPjagR/ai-control-improving-safety-despite-intentional-subversion
https://arxiv.org/abs/2312.06942
What a Compute-Centric Framework Says About Takeoff Speeds
https://www.openphilanthropy.org/research/what-a-compute-centric-framework-says-about-takeoff-speeds/
Global GDP over the long run
https://ourworldindata.org/grapher/global-gdp-over-the-long-run?yScale=log
Safety Cases: How to Justify the Safety of Advanced AI Systems
https://arxiv.org/abs/2403.10462
The Danger of a “Safety Case"
http://sunnyday.mit.edu/The-Danger-of-a-Safety-Case.pdf
The Future Of Work Looks Like A UPS Truck (~02:15:50)
https://www.npr.org/sections/money/2014/05/02/308640135/episode-536-the-future-of-work-looks-like-a-ups-truck
SWE-bench
https://www.swebench.com/
Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model
https://arxiv.org/pdf/2201.11990
Algorithmic Progress in Language Models
https://epochai.org/blog/algorithmic-progress-in-language-models
Aidan Gomez, CEO of Cohere, reveals how they're tackling AI hallucinations and improving reasoning abilities. He also explains why Cohere doesn't use any output from GPT-4 for training their models.
Aidan shares his personal insights into the world of AI and LLMs and Cohere's unique approach to solving real-world business problems, and how their models are set apart from the competition. Aidan reveals how they are making major strides in AI technology, discussing everything from last mile customer engineering to the robustness of prompts and future architectures.
He also touches on the broader implications of AI for society, including potential risks and the role of regulation. He discusses Cohere's guiding principles and the health of the startup scene, with a particular focus on enterprise applications. Aidan provides a rare look into the internal workings of Cohere and their vision for driving productivity and innovation.
https://cohere.com/
https://x.com/aidangomez
Check out Cohere's amazing new Command R* models here
https://cohere.com/command
Disclaimer: This is the second video from our Cohere partnership. We were not told what to say in the interview, and didn't edit anything out from the interview.
The ARC Challenge, created by Francois Chollet, tests how well AI systems can generalize from a few examples in a grid-based intelligence test. We interview the current winners of the ARC Challenge, Jack Cole, Mohamed Osman and their collaborator Michael Hodel. They discuss how they tackled ARC (Abstraction and Reasoning Corpus) using language models. We also discuss the new "50%" public set approach announced today from Redwood Research (Ryan Greenblatt).
Jack and Mohamed explain their winning approach, which involves fine-tuning a language model on a large, specifically-generated dataset and then doing additional fine-tuning at test time, a technique known in this context as "active inference". They use various strategies to represent the data for the language model and believe that with further improvements the accuracy could reach above 50%. Michael talks about his work on generating new ARC-like tasks to help train the models. They also debate whether their methods stay true to the "spirit" of Chollet's measure of intelligence. Despite some concerns, they agree that their solutions are promising and adaptable for other similar problems.
Note: Jack's team is still the current official winner at 33% on the private set. Ryan's entry is not on the private leaderboard or eligible. Chollet invented ARC in 2019 (not 2017 as stated).
"Ryan's entry is not a new state of the art. We don't know exactly how well it does since it was only evaluated on 100 tasks from the evaluation set and does 50% on those, reportedly. Meanwhile Jack's team's (i.e. MindsAI's) solution does 54% on the entire eval set, and it is seemingly possible to do 60-70% with an ensemble."
Jack Cole:
https://x.com/Jcole75Cole
https://lab42.global/community-interview-jack-cole/
Mohamed Osman (Mohamed is looking to do a PhD in AI/ML - can you help him? Email: [email protected]):
https://www.linkedin.com/in/mohamedosman1905/
Michael Hodel:
https://arxiv.org/pdf/2404.07353v1
https://www.linkedin.com/in/michael-hodel/
https://x.com/bayesilicon
https://github.com/michaelhodel
Refs:
Getting 50% (SoTA) on ARC-AGI with GPT-4o - Ryan Greenblatt
https://redwoodresearch.substack.com/p/getting-50-sota-on-arc-agi-with-gpt
Neural networks for abstraction and reasoning: Towards broad generalization in machines [Mikel Bober-Irizar, Soumya Banerjee]
https://arxiv.org/pdf/2402.03507
Measure of intelligence
https://arxiv.org/abs/1911.01547
YT version: https://youtu.be/jSAT_RuJ_Cg
Nick Frosst, co-founder of Cohere, on the future of LLMs, and AGI. Learn how Cohere is solving real problems for business with their new AI models.
This is the first podcast from our new Cohere partnership!
Nick talks about his journey at Google Brain, working with AI legends like Geoff Hinton, and the amazing things his company, Cohere, is doing. From creating the most useful language models for businesses to making tools for developers, Nick shares a lot of interesting insights. He even talks about his band, Good Kid! Nick said that RAG is one of the best features of Cohere's new Command R* models. We are about to release a deep-dive on RAG with Patrick Lewis from Cohere, keep an eye out for that - he explains why their models are specifically optimised for RAG use cases.
Learn more about Cohere Command R* models here:
https://cohere.com/command
https://github.com/cohere-ai/cohere-toolkit
Nick's band Good Kid:
https://goodkidofficial.com/
Nick on Twitter:
https://x.com/nickfrosst
Disclaimer: We are in a partnership with Cohere to release content for them. We were not told what to say in the interview, and didn't edit anything out from the interview. We are currently planning to release 2 shows per month under the partnership about their AI platform, research and strategy.
These two scientists have mapped out the insides, or "reachable space", of a language model using control theory, and what they discovered was extremely surprising.
Please support us on Patreon to get access to the private Discord server, bi-weekly calls, early access and ad-free listening.
https://patreon.com/mlst
YT version: https://youtu.be/Bpgloy1dDn0
We are joined by Aman Bhargava from Caltech and Cameron Witkowski from the University of Toronto to discuss their groundbreaking paper, "What's the Magic Word? A Control Theory of LLM Prompting" (the main theorem on self-attention controllability was developed in collaboration with Dr. Shi-Zhuo Looi from Caltech).
They frame LLM systems as discrete stochastic dynamical systems. This means they look at LLMs in a structured way, similar to how we analyze control systems in engineering. They explore the “reachable set” of outputs for an LLM. Essentially, this is the range of possible outputs the model can generate from a given starting point when influenced by different prompts. The research highlights that prompt engineering, or optimizing the input tokens, can significantly influence LLM outputs. They show that even short prompts can drastically alter the likelihood of specific outputs. Aman and Cameron’s work might be a boon for understanding and improving LLMs. They suggest that a deeper exploration of control theory concepts could lead to more reliable and capable language models.
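To give a flavour of that framing (and only a flavour: the transition table, vocabulary and "prompt" scheme below are invented, not the paper's formal machinery), here is a self-contained Python toy that treats generation as a small discrete stochastic system and enumerates two-token control inputs to see which prompt makes each target output most likely.

# Toy mirror of the control-theoretic framing: generation as a discrete stochastic
# dynamical system, with the prompt acting as a control input that steers which outputs
# are likely. The transition table below is invented purely for illustration.
from itertools import product

vocab = ["good", "bad", "cat", "dog"]
# P(next | token): a hand-made stand-in for an LLM's next-token behaviour.
transition = {
    "good": {"good": 0.5, "bad": 0.1, "cat": 0.2, "dog": 0.2},
    "bad":  {"good": 0.1, "bad": 0.5, "cat": 0.2, "dog": 0.2},
    "cat":  {"good": 0.3, "bad": 0.2, "cat": 0.4, "dog": 0.1},
    "dog":  {"good": 0.2, "bad": 0.3, "cat": 0.1, "dog": 0.4},
}

def next_distribution(prompt):
    # Toy conditioning: average the rows for every prompt token (a real LLM uses the full context).
    dists = [transition[t] for t in prompt]
    return {w: sum(d[w] for d in dists) / len(dists) for w in vocab}

# For every target output, search the space of 2-token prompts for the one that
# maximises its probability - a brute-force peek at the "reachable set".
for target in vocab:
    best = max(product(vocab, repeat=2), key=lambda p: next_distribution(p)[target])
    print(f"target {target!r}: best prompt {best}, P = {next_distribution(best)[target]:.2f}")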
We dropped an additional, more technical video on the research on our Twitter account here: https://x.com/MLStreetTalk/status/1795093759471890606
Additional 20 minutes of unreleased footage on our Patreon here: https://www.patreon.com/posts/whats-magic-word-104922629
What's the Magic Word? A Control Theory of LLM Prompting (Aman Bhargava, Cameron Witkowski, Manav Shah, Matt Thomson)
https://arxiv.org/abs/2310.04444
LLM Control Theory Seminar (April 2024)
https://www.youtube.com/watch?v=9QtS9sVBFM0
Society for the pursuit of AGI (Cameron founded it)
https://agisociety.mydurable.com/
Roger Federer demo
http://conway.languagegame.io/inference
Neural Cellular Automata, Active Inference, and the Mystery of Biological Computation (Aman)
https://aman-bhargava.com/ai/neuro/neuromorphic/2024/03/25/nca-do-active-inference.html
Aman and Cameron also want to thank Dr. Shi-Zhuo Looi and Prof. Matt Thomson from Caltech for help and advice on their research. (https://thomsonlab.caltech.edu/ and https://pma.caltech.edu/people/looi-shi-zhuo)
https://x.com/ABhargava2000
https://x.com/witkowski_cam
Maria Santacaterina, with her background in the humanities, brings a critical perspective on the current state and future implications of AI technology, its impact on society, and the nature of human intelligence and creativity. She emphasizes that despite technological advancements, AI lacks fundamental human traits such as consciousness, empathy, intuition, and the ability to engage in genuine creative processes. Maria argues that AI, at its core, processes data but does not have the capability to understand or generate new, intrinsic meaning or ideas as humans do.
Throughout the conversation, Maria highlights her concern about the overreliance on AI in critical sectors such as healthcare, the justice system, and business. She stresses that while AI can serve as a tool, it should not replace human judgment and decision-making. Maria points out that AI systems often operate on past data, which may lead to outdated or incorrect decisions if not carefully managed.
The discussion also touches upon the concept of "adaptive resilience", which Maria describes in her book. She explains adaptive resilience as the capacity for individuals and enterprises to evolve and thrive amidst challenges by leveraging technology responsibly, without undermining human values and capabilities.
A significant portion of the conversation focussed on ethical considerations surrounding AI. Tim and Maria agree that there's a pressing need for strong governance and ethical frameworks to guide AI development and deployment. They discuss how AI, without proper ethical considerations, risks exacerbating issues like privacy invasion, misinformation, and unintended discrimination.
Maria is skeptical about claims of achieving Artificial General Intelligence (AGI) or a technological singularity where machines surpass human intelligence in all aspects. She argues that such scenarios neglect the complex, dynamic nature of human intelligence and consciousness, which cannot be fully replicated or replaced by machines.
Tim and Maria discuss the importance of keeping human agency and creativity at the forefront of technology development. Maria asserts that efforts to automate or standardize complex human actions and decisions are misguided and could lead to dehumanizing outcomes. They both advocate for using AI as an aid to enhance human capabilities rather than a substitute.
In closing, Maria encourages a balanced approach to AI adoption, urging stakeholders to prioritize human well-being, ethical standards, and societal benefit above mere technological advancement. The conversation ends with Maria pointing people to her book for more in-depth analysis and thoughts on the future interaction between humans and technology.
Buy Maria's book here: https://amzn.to/4avF6kq
https://www.linkedin.com/in/mariasantacaterina
TOC
00:00:00 - Intro to Book
00:03:23 - What Life Is
00:10:10 - Agency
00:18:04 - Tech and Society
00:21:51 - System 1 and 2
00:22:59 - We Are Being Pigeonholed
00:30:22 - Agency vs Autonomy
00:36:37 - Explanations
00:40:24 - AI Reductionism
00:49:50 - How Are Humans Intelligent
01:00:22 - Semantics
01:01:53 - Emotive AI and Pavlovian Dogs
01:04:05 - Technology, Social Media and Organisation
01:18:34 - Systems Are Not That Automated
01:19:33 - Hiring
01:22:34 - Subjectivity in Orgs
01:32:28 - The AGI Delusion
01:45:37 - GPT-laziness Syndrome
01:54:58 - Diversity Preservation
01:58:24 - Ethics
02:11:43 - Moral Realism
02:16:17 - Utopia
02:18:02 - Reciprocity
02:20:52 - Tyranny of Categorisation
Thomas Parr and his collaborators wrote a book titled "Active Inference: The Free Energy Principle in Mind, Brain and Behavior" which introduces Active Inference from both a high-level conceptual perspective and a low-level mechanistic, mathematical perspective.
Active inference - developed by the legendary neuroscientist Prof. Karl Friston - is a unifying mathematical framework which frames living systems as agents that minimize surprise and free energy in order to resist entropy and persist over time. It unifies various perspectives from physics, biology, statistics, and psychology, and allows us to explore deep questions about agency, biology, causality, modelling, and consciousness.
Buy Active Inference: The Free Energy Principle in Mind, Brain, and Behavior
https://amzn.to/4dj0iMj
YT version: https://youtu.be/lbb-Si5wa_o
Please support us on Patreon to get access to the private Discord server, bi-weekly calls, early access and ad-free listening.
https://patreon.com/mlst
Chapters should be embedded in the mp3 - let me know if there are any issues.
Connor is the CEO of Conjecture and one of the most famous names in the AI alignment movement. This is the "behind the scenes footage" and bonus Patreon interviews from the day of the Beff Jezos debate, including an interview with Daniel Clothiaux. It's a great insight into Connor's philosophy. At the end there is an unreleased additional interview with Beff.
Support MLST:
Please support us on Patreon. We are entirely funded from Patreon donations right now. Patreon supporters get private Discord access, biweekly calls, very early-access + exclusive content and lots more.
https://patreon.com/mlst
Donate: https://www.paypal.com/donate/?hosted_button_id=K2TYRVPBGXVNA
If you would like to sponsor us, so we can tell your story - reach out on mlstreettalk at gmail
Topics:
Externalized cognition and the role of society and culture in human intelligence
The potential for AI systems to develop agency and autonomy
The future of AGI as a complex mixture of various components
The concept of agency and its relationship to power
The importance of coherence in AI systems
The balance between coherence and variance in exploring potential upsides
The role of dynamic, competent, and incorruptible institutions in handling risks and developing technology
Concerns about AI widening the gap between the haves and have-nots
The concept of equal access to opportunity and maintaining dynamism in the system
Leahy's perspective on life as a process that "rides entropy"
The importance of distinguishing between epistemological, decision-theoretic, and aesthetic aspects of morality (inc ref to Hume's Guillotine)
The concept of continuous agency and the idea that the first AGI will be a messy admixture of various components
The potential for AI systems to become more physically embedded in the future
The challenges of aligning AI systems and the societal impacts of AI technologies like ChatGPT and Bing
The importance of humility in the face of complexity when considering the future of AI and its societal implications
Disclaimer: this video is not an endorsement of e/acc or AGI agential existential risk from us - the hosts of MLST consider both of these views to be quite extreme. We seek diverse views on the channel.
00:00:00 Intro
00:00:56 Connor's Philosophy
00:03:53 Office Skit
00:05:08 Connor on e/acc and Beff
00:07:28 Intro to Daniel's Philosophy
00:08:35 Connor on Entropy, Life, and Morality
00:19:10 Connor on London
00:20:21 Connor Office Interview
00:20:46 Friston Patreon Preview
00:21:48 Why Are We So Dumb?
00:23:52 The Voice of the People, the Voice of God / Populism
00:26:35 Mimetics
00:30:03 Governance
00:33:19 Agency
00:40:25 Daniel Interview - Externalised Cognition, Bing GPT, AGI
00:56:29 Beff + Connor Bonus Patreons Interview
Professor Chris Bishop is a Technical Fellow and Director at Microsoft Research AI4Science, in Cambridge. He is also Honorary Professor of Computer Science at the University of Edinburgh, and a Fellow of Darwin College, Cambridge. In 2004, he was elected Fellow of the Royal Academy of Engineering, in 2007 he was elected Fellow of the Royal Society of Edinburgh, and in 2017 he was elected Fellow of the Royal Society. Chris was a founding member of the UK AI Council, and in 2019 he was appointed to the Prime Minister’s Council for Science and Technology.
At Microsoft Research, Chris oversees a global portfolio of industrial research and development, with a strong focus on machine learning and the natural sciences.
Chris obtained a BA in Physics from Oxford, and a PhD in Theoretical Physics from the University of Edinburgh, with a thesis on quantum field theory.
Chris's contributions to the field of machine learning have been truly remarkable. He has authored what is arguably the definitive textbook in the field - 'Pattern Recognition and Machine Learning' (PRML) - which has served as an essential reference for countless students and researchers around the world. It was his second textbook, following his highly acclaimed Neural Networks for Pattern Recognition.
Recently, Chris has co-authored a new book with his son, Hugh, titled 'Deep Learning: Foundations and Concepts.' This book aims to provide a comprehensive understanding of the key ideas and techniques underpinning the rapidly evolving field of deep learning. It covers both the foundational concepts and the latest advances, making it an invaluable resource for newcomers and experienced practitioners alike.
Buy Chris' textbook here:
https://amzn.to/3vvLcCh
More about Prof. Chris Bishop:
https://en.wikipedia.org/wiki/Christopher_Bishop
https://www.microsoft.com/en-us/research/people/cmbishop/
Support MLST:
Please support us on Patreon. We are entirely funded from Patreon donations right now. Patreon supporters get private Discord access, biweekly calls, early-access + exclusive content and lots more.
https://patreon.com/mlst
Donate: https://www.paypal.com/donate/?hosted_button_id=K2TYRVPBGXVNA
If you would like to sponsor us, so we can tell your story - reach out on mlstreettalk at gmail
TOC:
00:00:00 - Intro to Chris
00:06:54 - Changing Landscape of AI
00:08:16 - Symbolism
00:09:32 - PRML
00:11:02 - Bayesian Approach
00:14:49 - Are NNs One Model or Many, Special vs General
00:20:04 - Can Language Models Be Creative
00:22:35 - Sparks of AGI
00:25:52 - Creativity Gap in LLMs
00:35:40 - New Deep Learning Book
00:39:01 - Favourite Chapters
00:44:11 - Probability Theory
00:45:42 - AI4Science
00:48:31 - Inductive Priors
00:58:52 - Drug Discovery
01:05:19 - Foundational Bias Models
01:07:46 - How Fundamental Is Our Physics Knowledge?
01:12:05 - Transformers
01:12:59 - Why Does Deep Learning Work?
01:16:59 - Inscrutability of NNs
01:18:01 - Example of Simulator
01:21:09 - Control
Dr. Philip Ball is a freelance science writer. He just wrote a book called "How Life Works", discussing how the science of Biology has advanced in the last 20 years. We focus on the concept of Agency in particular.
He trained as a chemist at the University of Oxford, and as a physicist at the University of Bristol. He worked previously at Nature for over 20 years, first as an editor for physical sciences and then as a consultant editor. His writings on science for the popular press have covered topical issues ranging from cosmology to the future of molecular biology.
YT: https://www.youtube.com/watch?v=n6nxUiqiz9I
Transcript link on YT description
Philip is the author of many popular books on science, including H2O: A Biography of Water, Bright Earth: The Invention of Colour, The Music Instinct and Curiosity: How Science Became Interested in Everything. His book Critical Mass won the 2005 Aventis Prize for Science Books, while Serving the Reich was shortlisted for the Royal Society Winton Science Book Prize in 2014.
This is one of Tim's personal favourite MLST shows, so we have designated it a special edition. Enjoy!
Buy Philip's book "How Life Works" here: https://amzn.to/3vSmNqp
Support MLST:
Please support us on Patreon. We are entirely funded from Patreon donations right now. Patreon supporters get private Discord access, biweekly calls, early-access + exclusive content and lots more.
https://patreon.com/mlst
Donate: https://www.paypal.com/donate/?hosted...
If you would like to sponsor us, so we can tell your story - reach out on mlstreettalk at gmail
Dr. Paul Lessard and his collaborators have written a paper on "Categorical Deep Learning and Algebraic Theory of Architectures". They aim to make neural networks more interpretable, composable and amenable to formal reasoning. The key is mathematical abstraction, as exemplified by category theory - using monads to develop a more principled, algebraic approach to structuring neural networks.
We also discussed the limitations of current neural network architectures in terms of their ability to generalise and reason in a human-like way - in particular, their inability to do unbounded computation equivalent to a Turing machine. Paul expressed optimism that this is not a fundamental limitation, but an artefact of current architectures and training procedures.
We discussed the power of abstraction - allowing us to focus on the essential structure while ignoring extraneous details - which can make certain problems more tractable to reason about. Paul sees category theory as providing a powerful "Lego set" for productively thinking about many practical problems.
Towards the end, Paul gave an accessible introduction to some core concepts in category theory like categories, morphisms, functors, monads etc. We explained how these abstract constructs can capture essential patterns that arise across different domains of mathematics.
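To make the monad idea concrete for programmers, here is a tiny generic sketch (ours, not from the paper): a monad is a type constructor equipped with unit (wrap a plain value) and bind (sequence computations that return wrapped values), subject to a few laws. The Maybe monad below threads possible failure through a pipeline without explicit None-checks; all names are illustrative.
```python
# A minimal Maybe monad: `unit` injects a value, `bind` sequences fallible steps.
from dataclasses import dataclass
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")
U = TypeVar("U")

@dataclass(frozen=True)
class Maybe(Generic[T]):
    value: Optional[T]

    @staticmethod
    def unit(x: T) -> "Maybe[T]":
        # the monad's "return": wrap a plain value
        return Maybe(x)

    def bind(self, f: Callable[[T], "Maybe[U]"]) -> "Maybe[U]":
        # monad laws (informally): bind(unit(x), f) == f(x); bind(m, unit) == m; bind is associative
        return Maybe(None) if self.value is None else f(self.value)

def safe_div(x: float, y: float) -> Maybe[float]:
    return Maybe(None) if y == 0 else Maybe.unit(x / y)

# Composing fallible steps; the failure in the last step propagates automatically.
result = Maybe.unit(10.0).bind(lambda x: safe_div(x, 2)).bind(lambda x: safe_div(x, 0))
print(result)  # Maybe(value=None)
```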
Paul is optimistic about the potential of category theory and related mathematical abstractions to put AI and neural networks on a more robust conceptual foundation to enable interpretability and reasoning. However, significant theoretical and engineering challenges remain in realising this vision.
Please support us on Patreon. We are entirely funded from Patreon donations right now.
https://patreon.com/mlst
If you would like to sponsor us, so we can tell your story - reach out on mlstreettalk at gmail
Links:
Categorical Deep Learning: An Algebraic Theory of Architectures
Bruno Gavranović, Paul Lessard, Andrew Dudzik,
Tamara von Glehn, João G. M. Araújo, Petar Veličković
Paper: https://categoricaldeeplearning.com/
Symbolica:
https://twitter.com/symbolica
https://www.symbolica.ai/
Dr. Paul Lessard (Principal Scientist - Symbolica)
https://www.linkedin.com/in/paul-roy-lessard/
Interviewer: Dr. Tim Scarfe
TOC:
00:00:00 - Intro
00:05:07 - What is the category paper all about
00:07:19 - Composition
00:10:42 - Abstract Algebra
00:23:01 - DSLs for machine learning
00:24:10 - Inscrutability
00:29:04 - Limitations with current NNs
00:30:41 - Generative code / NNs don't recurse
00:34:34 - NNs are not Turing machines (special edition)
00:53:09 - Abstraction
00:55:11 - Category theory objects
00:58:06 - Cat theory vs number theory
00:59:43 - Data and Code are one and the same
01:08:05 - Syntax and semantics
01:14:32 - Category DL elevator pitch
01:17:05 - Abstraction again
01:20:25 - Lego set for the universe
01:23:04 - Reasoning
01:28:05 - Category theory 101
01:37:42 - Monads
01:45:59 - Where to learn more cat theory
Dr. Minqi Jiang and Dr. Marc Rigter explain an innovative new method to make the intelligence of agents more general-purpose by training them to learn many worlds before their usual goal-directed training, which we call "reinforcement learning".
Their new paper is called "Reward-free curricula for training robust world models"
https://arxiv.org/pdf/2306.09205.pdf
https://twitter.com/MinqiJiang
https://twitter.com/MarcRigter
Interviewer: Dr. Tim Scarfe
Please support us on Patreon, Tim is now doing MLST full-time and taking a massive financial hit. If you love MLST and want this to continue, please show your support! In return you get access to shows very early and private discord and networking.
https://patreon.com/mlst
We are also looking for show sponsors, please get in touch if interested mlstreettalk at gmail.
MLST Discord: https://discord.gg/machine-learning-street-talk-mlst-937356144060530778
Nick Chater is Professor of Behavioural Science at Warwick Business School, who works on rationality and language using a range of theoretical and experimental approaches. We discuss his books The Mind is Flat, and the Language Game.
Please support me on Patreon (this is now my main job!) - https://patreon.com/mlst - Access the private Discord, networking, and early access to content.
MLST Discord: https://discord.gg/machine-learning-street-talk-mlst-937356144060530778
https://twitter.com/MLStreetTalk
Buy The Language Game:
https://amzn.to/3SRHjPm
Buy The Mind is Flat:
https://amzn.to/3P3BUUC
YT version: https://youtu.be/5cBS6COzLN4
https://www.wbs.ac.uk/about/person/nick-chater/
https://twitter.com/nickjchater?lang=en
See what Sam Altman advised Kenneth when he left OpenAI! Professor Kenneth Stanley has just launched a brand new type of social network, which he calls a "Serendipity network". The idea is that you follow interests, NOT people. It's a social network without the popularity contest. We discuss the philosophy and technology behind the venture in great detail. The main ideas came from Kenneth's famous book "Why Greatness Cannot Be Planned".
YT version: https://www.youtube.com/watch?v=pWIrXN-yy8g
Chapters should be baked into the MP3 file now
MLST public Discord: https://discord.gg/machine-learning-street-talk-mlst-937356144060530778
Please support our work on Patreon - get access to interviews months early, private Patreon, networking, exclusive content and regular calls with Tim and Keith.
https://patreon.com/mlst
Get Maven here: https://www.heymaven.com/
Kenneth:
https://twitter.com/kenneth0stanley
https://www.kenstanley.net/home
Host - Tim Scarfe:
https://www.linkedin.com/in/ecsquizor/
https://www.mlst.ai/
Original MLST show with Kenneth:
https://www.youtube.com/watch?v=lhYGXYeMq_E
Tim explains the book more here:
https://www.youtube.com/watch?v=wNhaz81OOqw
Brandon Rohrer, who obtained his Ph.D. from MIT, is driven by understanding algorithms ALL the way down to their nuts and bolts, so he can make them accessible to everyone by explaining them in the way HE himself would have wanted to learn!
Please support us on Patreon for loads of exclusive content and private Discord:
https://patreon.com/mlst
https://discord.gg/aNPkGUQtc5 (public Discord)
https://twitter.com/MLStreetTalk
Brandon Rohrer is a seasoned data science leader and educator with a rich background in creating robust, efficient machine learning algorithms and tools. With a Ph.D. in Mechanical Engineering from MIT, his expertise encompasses a broad spectrum of AI applications - from computer vision and natural language processing to reinforcement learning and robotics. Brandon's career has seen him in Principal-level roles at Microsoft and Facebook. An educator at heart, he also shares his knowledge through detailed tutorials, courses, and his forthcoming book, "How to Train Your Robot."
YT version: https://www.youtube.com/watch?v=4Ps7ahonRCY
Brandon's links:
https://github.com/brohrer
https://www.youtube.com/channel/UCsBKTrp45lTfHa_p49I2AEQ
https://www.linkedin.com/in/brohrer/
How transformers work:
https://e2eml.school/transformers
Brandon's End-to-End Machine Learning school courses, posts, and tutorials
https://e2eml.school
Free course:
https://end-to-end-machine-learning.teachable.com/p/complete-course-library-full-end-to-end-machine-learning-catalog
Blog: https://e2eml.school/blog.html
Ziptie: Learning Useful Features [Brandon Rohrer]
https://www.brandonrohrer.com/ziptie
TOC should be baked into the MP3 file now
00:00:00 - Intro to Brandon
00:00:36 - RLHF
00:01:09 - Limitations of transformers
00:07:23 - Agency - we are all GPTs
00:09:07 - BPE / representation bias
00:12:00 - LLM true believers
00:16:42 - Brandon's style of teaching
00:19:50 - ML vs real world = Robotics
00:29:59 - Reward shaping
00:37:08 - No true Scotsman - when do we accept capabilities as real
00:38:50 - Externalism
00:43:03 - Building flexible robots
00:45:37 - Is reward enough
00:54:30 - Optimization curse
00:58:15 - Collective intelligence
01:01:51 - Intelligence + creativity
01:13:35 - ChatGPT + Creativity
01:25:19 - Transformers Tutorial
The world's second-most famous AI doomer Connor Leahy sits down with Beff Jezos, the founder of the e/acc movement debating technology, AI policy, and human values. As the two discuss technology, AI safety, civilization advancement, and the future of institutions, they clash on their opposing perspectives on how we steer humanity towards a more optimal path.
Watch behind the scenes, get early access and join the private Discord by supporting us on Patreon. We have some amazing content going up there with Max Bennett and Kenneth Stanley this week!
https://patreon.com/mlst
https://discord.gg/aNPkGUQtc5 (public Discord)
https://twitter.com/MLStreetTalk
Post-interview with Beff and Connor: https://www.patreon.com/posts/97905213
Pre-interview with Connor and his colleague Dan Clothiaux: https://www.patreon.com/posts/connor-leahy-and-97631416
Leahy, known for his critical perspectives on AI and technology, challenges Jezos on a variety of assertions related to the accelerationist movement, market dynamics, and the need for regulation in the face of rapid technological advancements. Jezos, on the other hand, provides insights into the e/acc movement's core philosophies, emphasizing growth, adaptability, and the dangers of over-legislation and centralized control in current institutions.
Throughout the discussion, both speakers explore the concept of entropy, the role of competition in fostering innovation, and the balance needed to mediate order and chaos to ensure the prosperity and survival of civilization. They weigh up the risks and rewards of AI, the importance of maintaining a power equilibrium in society, and the significance of cultural and institutional dynamism.
Beff Jezos (Guillaume Verdon):
https://twitter.com/BasedBeffJezos
https://twitter.com/GillVerd
Connor Leahy:
https://twitter.com/npcollapse
YT: https://www.youtube.com/watch?v=0zxi0xSBOaQ
TOC:
00:00:00 - Intro
00:03:05 - Society library reference
00:03:35 - Debate starts
00:05:08 - Should any tech be banned?
00:20:39 - Leaded Gasoline
00:28:57 - False vacuum collapse method?
00:34:56 - What if there are dangerous aliens?
00:36:56 - Risk tolerances
00:39:26 - Optimizing for growth vs value
00:52:38 - Is vs ought
01:02:29 - AI discussion
01:07:38 - War / global competition
01:11:02 - Open source F16 designs
01:20:37 - Offense vs defense
01:28:49 - Morality / value
01:43:34 - What would Connor do
01:50:36 - Institutions/regulation
02:26:41 - Competition vs. Regulation Dilemma
02:32:50 - Existential Risks and Future Planning
02:41:46 - Conclusion and Reflection
Note from Tim: I baked the chapter metadata into the mp3 file this time - does that help the chapters show up in your app? Let me know. Also, I accidentally exported a few minutes of dead audio at the end of the file. Sorry about that, just skip ahead when the episode finishes.
Watch behind the scenes, get early access and join the private Discord by supporting us on Patreon:
https://patreon.com/mlst
https://discord.gg/aNPkGUQtc5 (public Discord)
https://twitter.com/MLStreetTalk
YT version: https://youtu.be/n8G50ynU0Vg
In this interview on MLST, Dr. Tim Scarfe interviews Mahault Albarracin, who is the director of product for R&D at VERSES and also a PhD student in cognitive computing at the University of Quebec in Montreal. They discuss a range of topics related to consciousness, cognition, and machine learning.
Throughout the conversation, they touch upon various philosophical and computational concepts such as panpsychism, computationalism, and materiality. They consider the "hard problem" of consciousness, which is the question of how and why we have subjective experiences.
Albarracin shares her views on the controversial Integrated Information Theory and the open letter of opposition it received from the scientific community. She reflects on the nature of scientific critique and rivalry, advising caution in declaring entire fields of study as pseudoscientific.
A substantial part of the discussion is dedicated to the topic of science itself, where Albarracin talks about thresholds between legitimate science and pseudoscience, the role of evidence, and the importance of validating scientific methods and claims.
They touch upon language models, discussing whether they can be considered as having a "theory of mind" and the implications of assigning such properties to AI systems. Albarracin challenges the idea that there is a pure form of intelligence independent of material constraints and emphasizes the role of sociality in the development of our cognitive abilities.
Albarracin offers her thoughts on scientific endeavors, the predictability of systems, the nature of intelligence, and the processes of learning and adaptation. She gives insights into the concept of using degeneracy as a way to increase resilience within systems and the role of maintaining a degree of redundancy or extra capacity as a buffer against unforeseen events.
The conversation concludes with her discussing the potential benefits of collective intelligence, likening the adaptability and resilience of interconnected agent systems to those found in natural ecosystems.
https://www.linkedin.com/in/mahault-albarracin-1742bb153/
00:00:00 - Intro / IIT scandal
00:05:54 - Gaydar paper / What makes good science
00:10:51 - Language
00:18:16 - Intelligence
00:29:06 - X-risk
00:40:49 - Self modelling
00:43:56 - Anthropomorphisation
00:46:41 - Mediation and subjectivity
00:51:03 - Understanding
00:56:33 - Resiliency
Technical topics:
1. Integrated Information Theory (IIT) - Giulio Tononi
2. The "hard problem" of consciousness - David Chalmers
3. Panpsychism and Computationalism in philosophy of mind
4. Active Inference Framework - Karl Friston
5. Theory of Mind and its computation in AI systems
6. Noam Chomsky's views on language models and linguistics
7. Daniel Dennett's Intentional Stance theory
8. Collective intelligence and system resilience
9. Redundancy and degeneracy in complex systems
10. Michael Levin's research on bioelectricity and pattern formation
11. The role of phenomenology in cognitive science
Chai AI is the leading platform for conversational chat artificial intelligence.
Note: this is a sponsored episode of MLST.
William Beauchamp is the founder of two $100M+ companies - Chai Research, an AI startup, and Seamless Capital, a hedge fund based in Cambridge, UK. Chaiverse is the Chai AI developer platform, where developers can train, submit and evaluate on millions of real users to win their share of $1,000,000.
https://www.chai-research.com
https://www.chaiverse.com
https://twitter.com/chai_research
https://facebook.com/chairesearch/
https://www.instagram.com/chairesearch/
Download the app on iOS and Android (https://onelink.to/kqzhy9)
#chai #chai_ai #chai_research #chaiverse #generative_ai #LLMs
Watch behind the scenes, get early access and join the private Discord by supporting us on Patreon:
https://patreon.com/mlst
https://discord.gg/aNPkGUQtc5 (public Discord)
https://twitter.com/MLStreetTalk
DOES AI HAVE AGENCY? With Prof. Karl Friston and Riddhi J. Pitliya
Agency in the context of cognitive science, particularly when considering the free energy principle, extends beyond just human decision-making and autonomy. It encompasses a broader understanding of how all living systems, including non-human entities, interact with their environment to maintain their existence by minimising sensory surprise.
According to the free energy principle, living organisms strive to minimize the difference between their predicted states and the actual sensory inputs they receive. This principle suggests that agency arises as a natural consequence of this process, particularly when organisms appear to plan ahead many steps in the future.
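For readers who want to see the quantity being minimised, one standard textbook way of writing the variational free energy is the following (our summary of the usual notation - o for observations, s for hidden states, q for the agent's approximate beliefs - not a formula taken from this episode):

$$ F[q] \;=\; \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] \;=\; D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] \;-\; \ln p(o) $$

Because the KL term is non-negative, F upper-bounds surprise (the negative log evidence, -ln p(o)): minimising F with respect to the beliefs q approximates perceptual inference, and choosing actions expected to keep F low yields the kind of goal-directed, anticipatory behaviour described above.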
Riddhi J. Pitliya is based in the computational psychopathology lab doing her Ph.D at the University of Oxford and works with Professor Karl Friston at VERSES.
https://twitter.com/RiddhiJP
References:
THE FREE ENERGY PRINCIPLE—A PRECIS [Ramstead]
https://www.dialecticalsystems.eu/contributions/the-free-energy-principle-a-precis/
Active Inference: The Free Energy Principle in Mind, Brain, and Behavior [Thomas Parr, Giovanni Pezzulo, Karl J. Friston]
https://direct.mit.edu/books/oa-monograph/5299/Active-InferenceThe-Free-Energy-Principle-in-Mind
The beauty of collective intelligence, explained by a developmental biologist | Michael Levin
https://www.youtube.com/watch?v=U93x9AWeuOA
Growing Neural Cellular Automata
https://distill.pub/2020/growing-ca
Carcinisation
https://en.wikipedia.org/wiki/Carcinisation
Prof. KENNETH STANLEY - Why Greatness Cannot Be Planned
https://www.youtube.com/watch?v=lhYGXYeMq_E
On Defining Artificial Intelligence [Pei Wang]
https://sciendo.com/article/10.2478/jagi-2019-0002
Why? The Purpose of the Universe [Goff]
https://amzn.to/4aEqpfm
Umwelt
https://en.wikipedia.org/wiki/Umwelt
An Immense World: How Animal Senses Reveal the Hidden Realms [Yong]
https://amzn.to/3tzzTb7
What Is It Like to Be a Bat? [Nagel]
https://www.sas.upenn.edu/~cavitch/pdf-library/Nagel_Bat.pdf
COUNTERFEIT PEOPLE. DANIEL DENNETT. (SPECIAL EDITION)
https://www.youtube.com/watch?v=axJtywd9Tbo
We live in the infosphere [FLORIDI]
https://www.youtube.com/watch?v=YLNGvvgq3eg
Mark Zuckerberg: First Interview in the Metaverse | Lex Fridman Podcast #398
https://www.youtube.com/watch?v=MVYrJJNdrEg
Black Mirror: Rachel, Jack and Ashley Too | Official Trailer | Netflix
https://www.youtube.com/watch?v=-qIlCo9yqpY
Watch behind the scenes, get early access and join private Discord by supporting us on Patreon: https://patreon.com/mlst
https://discord.gg/aNPkGUQtc5
https://twitter.com/MLStreetTalk
In this comprehensive exploration of the field of deep learning with Professor Simon Prince, who has just authored an entire textbook on deep learning, we investigate the technical underpinnings that contribute to the field's unexpected success and confront the enduring conundrums that still perplex AI researchers.
Key points discussed include the surprising efficiency of deep learning models, where high-dimensional loss functions are optimized in ways which defy traditional statistical expectations. Professor Prince provides an exposition on the choice of activation functions, architecture design considerations, and overparameterization. We scrutinize the generalization capabilities of neural networks, addressing the seeming paradox of well-performing overparameterized models. Professor Prince challenges popular misconceptions, shedding light on the manifold hypothesis and the role of data geometry in informing the training process. Professor Prince speaks about how layers within neural networks collaborate, recursively reconfiguring instance representations that contribute to both the stability of learning and the emergence of hierarchical feature representations. In addition to the primary discussion on technical elements and learning dynamics, the conversation briefly turns to the ethical implications of these AI advancements.
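As a toy illustration of the overparameterisation point (our own sketch, not an example from the book or the episode), the snippet below fits a network with far more parameters than training points and compares train and test error; the model size, dataset and hyperparameters are arbitrary choices:
```python
# Toy illustration: an MLP with thousands of parameters fit to only 30 noisy samples of sin(x).
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, size=(30, 1))
y_train = np.sin(x_train).ravel() + 0.1 * rng.normal(size=30)
x_test = np.linspace(-3, 3, 200).reshape(-1, 1)
y_test = np.sin(x_test).ravel()

# Two hidden layers of 100 units: far more parameters than the 30 data points.
model = MLPRegressor(hidden_layer_sizes=(100, 100), max_iter=20000, tol=1e-6, random_state=0)
model.fit(x_train, y_train)

print("train MSE:", np.mean((model.predict(x_train) - y_train) ** 2))
print("test  MSE:", np.mean((model.predict(x_test) - y_test) ** 2))
# Despite having enough capacity to memorise the training noise exactly, the fitted
# function is typically smooth and the test error stays small - the kind of behaviour
# that classical bias-variance intuition does not predict.
```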
Follow Prof. Prince:
https://twitter.com/SimonPrinceAI
https://www.linkedin.com/in/simon-prince-615bb9165/
Get the book now!
https://mitpress.mit.edu/9780262048644/understanding-deep-learning/
https://udlbook.github.io/udlbook/
Panel: Dr. Tim Scarfe
https://www.linkedin.com/in/ecsquizor/
https://twitter.com/ecsquendor
TOC:
[00:00:00] Introduction
[00:11:03] General Book Discussion
[00:15:30] The Neural Metaphor
[00:17:56] Back to Book Discussion
[00:18:33] Emergence and the Mind
[00:29:10] Computation in Transformers
[00:31:12] Studio Interview with Prof. Simon Prince
[00:31:46] Why Deep Neural Networks Work: Spline Theory
[00:40:29] Overparameterization in Deep Learning
[00:43:42] Inductive Priors and the Manifold Hypothesis
[00:49:31] Universal Function Approximation and Deep Networks
[00:59:25] Training vs Inference: Model Bias
[01:03:43] Model Generalization Challenges
[01:11:47] Purple Segment: Unknown Topic
[01:12:45] Visualizations in Deep Learning
[01:18:03] Deep Learning Theories Overview
[01:24:29] Tricks in Neural Networks
[01:30:37] Critiques of ChatGPT
[01:42:45] Ethical Considerations in AI
References are in the YT version video description: https://youtu.be/sJXn4Cl4oww
Watch behind the scenes with Bert on Patreon: https://www.patreon.com/posts/bert-de-vries-93230722
https://discord.gg/aNPkGUQtc5
https://twitter.com/MLStreetTalk
Note, there is some mild background music on chapter 1 (Least Action), 3 (Friston) and 5 (Variational Methods) - please skip ahead if annoying. It's a tiny fraction of the overall podcast.
YT version: https://youtu.be/2wnJ6E6rQsU
Bert de Vries is Professor in the Signal Processing Systems group at Eindhoven University of Technology (TU/e). His research focuses on the development of intelligent autonomous agents that learn from in-situ interactions with their environment. His research draws inspiration from diverse fields including computational neuroscience, Bayesian machine learning, Active Inference and signal processing. Bert believes that the development of signal processing systems will in the future be largely automated by autonomously operating agents that learn purposefully from situated environmental interactions.
Bert received his M.Sc. (1986) and Ph.D. (1991) degrees in Electrical Engineering from Eindhoven University of Technology (TU/e) and the University of Florida, respectively. From 1992 to 1999, he worked as a research scientist at Sarnoff Research Center in Princeton (NJ, USA). Since 1999, he has been employed in the hearing aids industry, both in engineering and managerial positions. De Vries was appointed part-time professor in the Signal Processing Systems Group at TU/e in 2012.
Contact:
https://twitter.com/bertdv0
https://www.tue.nl/en/research/researchers/bert-de-vries
https://www.verses.ai/about-us
Panel: Dr. Tim Scarfe / Dr. Keith Duggar
TOC:
[00:00:00] Principle of Least Action
[00:05:10] Patreon Teaser
[00:05:46] On Friston
[00:07:34] Capm Peterson (VERSES)
[00:08:20] Variational Methods
[00:16:13] Dan Mapes (VERSES)
[00:17:12] Engineering with Active Inference
[00:20:23] Jason Fox (VERSES)
[00:20:51] Riddhi Jain Pitliya
[00:21:49] Hearing Aids as Adaptive Agents
[00:33:38] Steven Swanson (VERSES)
[00:35:46] Main Interview Kick Off, Engineering and Active Inference
[00:43:35] Actor / Streaming / Message Passing
[00:56:21] Do Agents Lose Flexibility with Maturity?
[01:00:50] Language Compression
[01:04:37] Marginalisation to Abstraction
[01:12:45] Online Structural Learning
[01:18:40] Efficiency in Active Inference
[01:26:25] SEs become Neuroscientists
[01:35:11] Building an Automated Engineer
[01:38:58] Robustness and Design vs Grow
[01:42:38] RXInfer
[01:51:12] Resistance to Active Inference?
[01:57:39] Diffusion of Responsibility in a System
[02:10:33] Chauvinism in "Understanding"
[02:20:08] On Becoming a Bayesian
Refs:
RXInfer: https://biaslab.github.io/rxinfer-website/
Prof. Ariel Caticha: https://www.albany.edu/physics/faculty/ariel-caticha
Pattern Recognition and Machine Learning (Bishop): https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
Data Analysis: A Bayesian Tutorial (Sivia): https://www.amazon.co.uk/Data-Analysis-Bayesian-Devinderjit-Sivia/dp/0198568320
Probability Theory: The Logic of Science (E. T. Jaynes): https://www.amazon.co.uk/Probability-Theory-Principles-Elementary-Applications/dp/0521592712/
#activeinference #artificialintelligence
Please support us https://www.patreon.com/mlst
https://discord.gg/aNPkGUQtc5
https://twitter.com/MLStreetTalk
Lance Da Costa aims to advance our understanding of intelligent systems by modelling cognitive systems and improving artificial systems.
He's a PhD candidate with Greg Pavliotis and Karl Friston jointly at Imperial College London and UCL, and a student in the Mathematics of Random Systems CDT run by Imperial College London and the University of Oxford. He completed an MRes in Brain Sciences at UCL with Karl Friston and Biswa Sengupta, an MASt in Pure Mathematics at the University of Cambridge with Oscar Randal-Williams, and a BSc in Mathematics at EPFL and the University of Toronto.
Summary:
Lance did pure math originally but became interested in the brain and AI. He started working with Karl Friston on the free energy principle, which claims all intelligent agents minimize free energy for perception, action, and decision-making. Lance has worked to provide mathematical foundations and proofs for why the free energy principle is true, starting from basic assumptions about agents interacting with their environment. This aims to justify the principle from first physics principles. Dr. Scarfe and Da Costa discuss different approaches to AI - the free energy/active inference approach focused on mimicking human intelligence vs approaches focused on maximizing capability like deep reinforcement learning. Lance argues active inference provides advantages for explainability and safety compared to black box AI systems. It provides a simple, sparse description of intelligence based on a generative model and free energy minimization. They discuss the need for structured learning and acquiring core knowledge to achieve more human-like intelligence. Lance highlights work from Josh Tenenbaum's lab that shows similar learning trajectories to humans in a simple Atari-like environment.
Incorporating core knowledge constrains the space of possible generative models the agent can use to represent the world, making learning more sample efficient. Lance argues active inference agents with core knowledge can match human learning capabilities.
They discuss how to make generative models interpretable, such as through factor graphs. The goal is to be able to understand the representations and message passing in the model that leads to decisions.
In summary, Lance argues active inference provides a principled approach to AI with advantages for explainability, safety, and human-like learning. Combining it with core knowledge and structural learning aims to achieve more human-like artificial intelligence.
https://www.lancelotdacosta.com/
https://twitter.com/lancelotdacosta
Interviewer: Dr. Tim Scarfe
TOC
00:00:00 - Start
00:09:27 - Intelligence
00:12:37 - Priors / structure learning
00:17:21 - Core knowledge
00:29:05 - Intelligence is specialised
00:33:21 - The magic of agents
00:39:30 - Intelligibility of structure learning
#artificialintelligence #activeinference
Please support us! https://www.patreon.com/mlst https://discord.gg/aNPkGUQtc5 https://twitter.com/MLStreetTalk
YT version (with intro not found here): https://youtu.be/6iaT-0Dvhnc
This is the epic special edition show you have been waiting for, with two of the most brilliant scientists alive today. Atoms, things, agents, ... observers. What even defines an "observer" and what properties must all observers share? How do objects persist in our universe given that their material composition changes over time? What does it mean for a thing to be a thing? And do things supervene on our lower-level physical reality? What does it mean for a thing to have agency? What's the difference between a complex dynamical system with and without agency? Could a rock or an AI catflap have agency? Can the universe be factorised into distinct agents, or is agency diffused? Have you ever pondered these deep questions about reality? Prof. Friston and Dr. Wolfram have spent their entire careers, some 40+ years each, thinking long and hard about these very questions and have developed significant frameworks of reference on their respective journeys (the Wolfram Physics Project and the Free Energy Principle).
Panel: MIT Ph.D Keith Duggar
Production: Dr. Tim Scarfe
Refs:
TED Talk with Stephen: https://www.ted.com/talks/stephen_wolfram_how_to_think_computationally_about_ai_the_universe_and_everything
https://writings.stephenwolfram.com/2023/10/how-to-think-computationally-about-ai-the-universe-and-everything/
TOC:
00:00:00 - Show kickoff
00:02:38 - Wolfram gets to grips with FEP
00:27:08 - How much control does an agent/observer have
00:34:52 - Observer persistence, what universe seems like to us
00:40:31 - Black holes
00:45:07 - Inside vs outside
00:52:20 - Moving away from the predictable path
00:55:26 - What can observers do
01:06:50 - Self modelling gives agency
01:11:26 - How do you know a thing has agency?
01:22:48 - Deep link between dynamics, ruliad and AI
01:25:52 - Does agency entail free will? Defining Agency
01:32:57 - Where do I probe for agency?
01:39:13 - Why is the universe the way we see it?
01:42:50 - Alien intelligence
01:43:40 - The hard problem of Observers
01:46:20 - Summary thoughts from Wolfram
01:49:35 - Factorisability of FEP
01:57:05 - Patreon interview teaser
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
YT version: https://www.youtube.com/watch?v=c4praCiy9qU
Dr. Jeff Beck is a computational neuroscientist studying probabilistic reasoning (decision making under uncertainty) in humans and animals with emphasis on neural representations of uncertainty and cortical implementations of probabilistic inference and learning. His line of research incorporates information theoretic and hierarchical statistical analysis of neural and behavioural data as well as reinforcement learning and active inference.
https://www.linkedin.com/in/jeff-beck...
https://scholar.google.com/citations?...
Interviewer: Dr. Tim Scarfe
TOC
00:00:00 Intro
00:00:51 Bayesian / Knowledge
00:14:57 Active inference
00:18:58 Mediation
00:23:44 Philosophy of mind / science
00:29:25 Optimisation
00:42:54 Emergence
00:56:38 Steering emergent systems
01:04:31 Work plan
01:06:06 Representations/Core knowledge
#activeinference
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Prof. Melanie Mitchell argues that the concept of "understanding" in AI is ill-defined and multidimensional - we can't simply say an AI system does or doesn't understand. She advocates for rigorously testing AI systems' capabilities using proper experimental methods from cognitive science. Popular benchmarks for intelligence often rely on the assumption that if a human can perform a task, an AI that performs the task must have human-like general intelligence. But benchmarks should evolve as capabilities improve.
Large language models show surprising skill on many human tasks but lack common sense and fail at simple things young children can do. Their knowledge comes from statistical relationships in text, not grounded concepts about the world. We don't know if their internal representations actually align with human-like concepts. More granular testing focused on generalization is needed. There are open questions around whether large models' abilities constitute a fundamentally different, non-human form of intelligence based on vast statistical correlations across text. Mitchell argues intelligence is situated, domain-specific and grounded in physical experience and evolution. The brain computes, but in a specialized way honed by evolution for controlling the body. Extracting "pure" intelligence may not work.
Other key points:
- Need more focus on proper experimental method in AI research. Developmental psychology offers examples for rigorous testing of cognition.
- Reporting instance-level failures rather than just aggregate accuracy can provide insights.
- Scaling laws and complex systems science are an interesting area of complexity theory, with applications to understanding cities.
- Concepts like "understanding" and "intelligence" in AI force refinement of fuzzy definitions.
- Human intelligence may be more collective and social than we realize. AI forces us to rethink concepts we apply anthropomorphically.
The overall emphasis is on rigorously building the science of machine cognition through proper experimentation and benchmarking as we assess emerging capabilities.
TOC:
[00:00:00] Introduction and Munk AI Risk Debate Highlights
[00:05:00] Douglas Hofstadter on AI Risk
[00:06:56] The Complexity of Defining Intelligence
[00:11:20] Examining Understanding in AI Models
[00:16:48] Melanie's Insights on AI Understanding Debate
[00:22:23] Unveiling the Concept Arc
[00:27:57] AI Goals: A Human vs Machine Perspective
[00:31:10] Addressing the Extrapolation Challenge in AI
[00:36:05] Brain Computation: The Human-AI Parallel
[00:38:20] The Arc Challenge: Implications and Insights
[00:43:20] The Need for Detailed AI Performance Reporting
[00:44:31] Exploring Scaling in Complexity Theory
Errata: Tim said around 39 mins that a recent Stanford/DM paper modelling ARC "on GPT-4 got around 60%". This is not correct and he misremembered. It was actually davinci3, and around 10%, which is still extremely good for a blank-slate approach with an LLM and no ARC-specific knowledge. Folks on our forum couldn't reproduce the result. See paper linked below.
Books (MUST READ):
Artificial Intelligence: A Guide for Thinking Humans (Melanie Mitchell)
https://www.amazon.co.uk/Artificial-Intelligence-Guide-Thinking-Humans/dp/B07YBHNM1C/?&_encoding=UTF8&tag=mlst00-21&linkCode=ur2&linkId=44ccac78973f47e59d745e94967c0f30&camp=1634&creative=6738
Complexity: A Guided Tour (Melanie Mitchell)
https://www.amazon.co.uk/Audible-Complexity-A-Guided-Tour?&_encoding=UTF8&tag=mlst00-21&linkCode=ur2&linkId=3f8bd505d86865c50c02dd7f10b27c05&camp=1634&creative=6738
Show notes (transcript, full references etc)
https://atlantic-papyrus-d68.notion.site/Melanie-Mitchell-2-0-15e212560e8e445d8b0131712bad3000?pvs=25
YT version: https://youtu.be/29gkDpR2orc
We explore connections between FEP and enactivism, including tensions raised in a paper critiquing FEP from an enactivist perspective.
Dr. Maxwell Ramstead provides background on enactivism emerging from autopoiesis, with a focus on embodied cognition and rejecting information processing/computational views of mind.
Chris shares his journey from robotics into FEP, starting as a skeptic but becoming convinced it's the right framework. He notes there are both "high road" and "low road" versions, ranging from embodied to more radically anti-representational stances. He doesn't see a definitive fork between dynamical systems and information theory as the source of conflict. Rather, the notion of operational closure in enactivism seems to be the main sticking point.
The group explores definitional issues around structure/organization, boundaries, and operational closure. Maxwell argues the generative model in FEP captures organizational dependencies akin to operational closure. The Markov blanket formalism models structural interfaces.
We discuss the concept of goals in cognitive systems - Chris advocates an intentional stance perspective - using notions of goals/intentions if they help explain system dynamics. Goals emerge from beliefs about dynamical trajectories. Prof Friston provides an elegant explanation of how goal-directed behavior naturally falls out of the FEP mathematics in a particular "goldilocks" regime of system scale/dynamics. The conversation explores the idea that many systems simply act "as if" they have goals or models, without necessarily possessing explicit representations. This helps resolve tensions between enactivist and computational perspectives.
Throughout the dialogue, Maxwell presses philosophical points about the FEP abolishing what he perceives as false dichotomies in cognitive science such as internalism/externalism. He is critical of enactivists' commitment to bright line divides between subject areas.
Prof. Karl Friston - Inventor of the free energy principle https://scholar.google.com/citations?user=q_4u0aoAAAAJ
Prof. Chris Buckley - Professor of Neural Computation at Sussex University https://scholar.google.co.uk/citations?user=nWuZ0XcAAAAJ&hl=en
Dr. Maxwell Ramstead - Director of Research at VERSES https://scholar.google.ca/citations?user=ILpGOMkAAAAJ&hl=fr
We address critique in this paper:
Laying down a forking path: Tensions between enaction and the free energy principle (Ezequiel A. Di Paolo, Evan Thompson, Randall D. Beer)
https://philosophymindscience.org/index.php/phimisci/article/download/9187/8975
Other refs:
Multiscale integration: beyond internalism and externalism (Maxwell J D Ramstead)
https://pubmed.ncbi.nlm.nih.gov/33627890/
MLST panel: Dr. Tim Scarfe and Dr. Keith Duggar
TOC (auto generated):
0:00 - Introduction
0:41 - Defining enactivism and its variants
6:58 - The source of the conflict between dynamical systems and information theory
8:56 - Operational closure in enactivism
10:03 - Goals and intentions
12:35 - The link between dynamical systems and information theory
15:02 - Path integrals and non-equilibrium dynamics
18:38 - Operational closure defined
21:52 - Structure vs. organization in enactivism
24:24 - Markov blankets as interfaces
28:48 - Operational closure in FEP
30:28 - Structure and organization again
31:08 - Dynamics vs. information theory
33:55 - Goals and intentions emerge in the FEP mathematics
36:58 - The Good Regulator Theorem
49:30 - Enactivism and its relation to ecological psychology
52:00 - Goals, intentions and beliefs
55:21 - Boundaries and meaning
58:55 - Enactivism's rejection of information theory
1:02:08 - Beliefs vs goals
1:05:06 - Ecological psychology and FEP
1:08:41 - The Good Regulator Theorem
1:18:38 - How goal-directed behavior emerges
1:23:13 - Ontological vs metaphysical boundaries
1:25:20 - Boundaries as maps
1:31:08 - Connections to the maximum entropy principle
1:33:45 - Relations to quantum and relational physics
Please check out Numerai - our sponsor @ http://numer.ai/mlst
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
The Second Law: Resolving the Mystery of the Second Law of Thermodynamics
Buy Stephen's book here - https://tinyurl.com/2jj2t9wa
The Language Game: How Improvisation Created Language and Changed the World by Morten H. Christiansen and Nick Chater
Buy here: https://tinyurl.com/35bvs8be
Stephen Wolfram starts by discussing the second law of thermodynamics - the idea that entropy, or disorder, tends to increase over time. He talks about how this law seems intuitively true, but has been difficult to prove. Wolfram outlines his decades-long quest to fully understand the second law, including failed early attempts to simulate particles mixing as a 12-year-old. He explains how irreversibility arises from the computational irreducibility of underlying physical processes coupled with our limited ability as observers to do the computations needed to "decrypt" the microscopic details.
The conversation then shifts to discussing language and how concepts allow us to communicate shared ideas between minds positioned in different parts of "rule space." Wolfram talks about the successes and limitations of using large language models to generate Wolfram Language code from natural language prompts. He sees it as a useful tool for getting started programming, but one still needs human refinement.
The final part of the conversation focuses on AI safety and governance. Wolfram notes uncontrolled actuation is where things can go wrong with AI systems. He discusses whether AI agents could have intrinsic experiences and goals, how we might build trust networks between AIs, and that managing a system of many AIs may be easier than a single AI. Wolfram emphasizes the need for more philosophical depth in thinking about AI aims, and draws connections between potential solutions and his work on computational irreducibility and physics.
Show notes: https://docs.google.com/document/d/1hXNHtvv8KDR7PxCfMh9xOiDFhU3SVDW8ijyxeTq9LHo/edit?usp=sharing
Pod version: TBA
https://twitter.com/stephen_wolfram
TOC:
00:00:00 - Introduction
00:02:34 - Second law book
00:14:01 - Reversibility / entropy / observers / equivalence
00:34:22 - Concepts/language in the ruliad
00:49:04 - Comparison to free energy principle
00:53:58 - ChatGPT / Wolfram / Language
01:00:17 - AI risk
Panel: Dr. Tim Scarfe @ecsquendor / Dr. Keith Duggar @DoctorDuggar
Please check out Numerai - our sponsor @ http://numer.ai/mlst
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Professor Jürgen Schmidhuber, the father of artificial intelligence, joins us today. Schmidhuber discussed the history of machine learning, the current state of AI, and his career researching recursive self-improvement, artificial general intelligence and its risks.
Schmidhuber pointed out the importance of studying the history of machine learning to properly assign credit for key breakthroughs. He discussed some of the earliest machine learning algorithms. He also highlighted the foundational work of Leibniz, who discovered the chain rule that enables training of deep neural networks, and the ancient Antikythera mechanism, the first known gear-based computer.
Schmidhuber discussed limits to recursive self-improvement and artificial general intelligence, including physical constraints like the speed of light and what can be computed. He noted we have no evidence the human brain can do more than traditional computing. Schmidhuber sees humankind as a potential stepping stone to more advanced, spacefaring machine life which may have little interest in humanity. However, he believes commercial incentives point AGI development towards being beneficial and that open-source innovation can help to achieve "AI for all", symbolised by his company's motto "AI∀".
Schmidhuber discussed approaches he believes will lead to more general AI, including meta-learning, reinforcement learning, building predictive world models, and curiosity-driven learning. His "fast weight programming" approach from the 1990s involved one network altering another network's connections. This was actually the first Transformer variant, now called an unnormalised linear Transformer. He also described the first GANs in 1990, to implement artificial curiosity.
Schmidhuber reflected on his career researching AI. He said his fondest memories were gaining insights that seemed to solve longstanding problems, though new challenges always arose: "then for a brief moment it looks like the greatest thing since sliced bread and then you get excited ... but then suddenly you realize, oh, it's still not finished. Something important is missing." Since 1985 he has worked on systems that can recursively improve themselves, constrained only by the limits of physics and computability. He believes continual progress, shaped by both competition and collaboration, will lead to increasingly advanced AI.
On AI risk, Schmidhuber said: "To me it's indeed weird. Now there are all these letters coming out warning of the dangers of AI. And I think some of the guys who are writing these letters, they are just seeking attention because they know that AI dystopia are attracting more attention than documentaries about the benefits of AI in healthcare."
Schmidhuber believes we should be more concerned with existing threats like nuclear weapons than speculative risks from advanced AI. He said: "As far as I can judge, all of this cannot be stopped but it can be channeled in a very natural way that is good for humankind...there is a tremendous bias towards good AI, meaning AI that is good for humans...I am much more worried about 60 year old technology that can wipe out civilization within two hours, without any AI."
[this is truncated, read show notes]
YT: https://youtu.be/q27XMPm5wg8
Show notes: https://docs.google.com/document/d/13-vIetOvhceZq5XZnELRbaazpQbxLbf5Yi7M25CixEE/edit?usp=sharing
Note: Interview was recorded 15th June 2023.
https://twitter.com/SchmidhuberAI
Panel: Dr. Tim Scarfe @ecsquendor / Dr. Keith Duggar @DoctorDuggar
Pod version: TBA
TOC:
[00:00:00] Intro / Numerai
[00:00:51] Show Kick Off
[00:02:24] Credit Assignment in ML
[00:12:51] XRisk
[00:20:45] First Transformer variant of 1991
[00:47:20] Which Current Approaches are Good
[00:52:42] Autonomy / Curiosity
[00:58:42] GANs of 1990
[01:11:29] OpenAI, Moats, Legislation
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB
George Hotz and Connor Leahy discuss the crucial challenge of developing beneficial AI that is aligned with human values. Hotz believes truly aligned AI is impossible, while Leahy argues it's a solvable technical challenge.
Hotz contends that AI will inevitably pursue power, but distributing AI widely would prevent any single AI from dominating. He advocates open-sourcing AI developments to democratize access. Leahy counters that alignment is necessary to ensure AIs respect human values. Without solving alignment, general AI could ignore or harm humans.
They discuss whether AI's tendency to seek power stems from optimization pressure or human-instilled goals. Leahy argues goal-seeking behavior naturally emerges while Hotz believes it reflects human values. Though agreeing on AI's potential dangers, they differ on solutions. Hotz favors accelerating AI progress and distributing capabilities while Leahy wants safeguards put in place.
While acknowledging risks like AI-enabled weapons, they debate whether broad access or restrictions better manage threats. Leahy suggests limiting dangerous knowledge, but Hotz insists openness checks government overreach. They concur that coordination and balance of power are key to navigating the AI revolution. Both eagerly anticipate seeing whose ideas prevail as AI progresses.
Transcript and notes: https://docs.google.com/document/d/1smkmBY7YqcrhejdbqJOoZHq-59LZVwu-DNdM57IgFcU/edit?usp=sharing
Note: this is not a normal episode, i.e. the hosts are not part of the debate (and, for the record, agree with neither Connor nor George).
TOC: [00:00:00] Introduction to George Hotz and Connor Leahy [00:03:10] George Hotz's Opening Statement: Intelligence and Power [00:08:50] Connor Leahy's Opening Statement: Technical Problem of Alignment and Coordination [00:15:18] George Hotz's Response: Nature of Cooperation and Individual Sovereignty [00:17:32] Discussion on individual sovereignty and defense [00:18:45] Debate on living conditions in America versus Somalia [00:21:57] Talk on the nature of freedom and the aesthetics of life [00:24:02] Discussion on the implications of coordination and conflict in politics [00:33:41] Views on the speed of AI development / hard takeoff [00:35:17] Discussion on potential dangers of AI [00:36:44] Discussion on the effectiveness of current AI [00:40:59] Exploration of potential risks in technology [00:45:01] Discussion on memetic mutation risk [00:52:36] AI alignment and exploitability [00:53:13] Superintelligent AIs and the assumption of good intentions [00:54:52] Humanity’s inconsistency and AI alignment [00:57:57] Stability of the world and the impact of superintelligent AIs [01:02:30] Personal utopia and the limitations of AI alignment [01:05:10] Proposed regulation on limiting the total number of flops [01:06:20] Having access to a powerful AI system [01:18:00] Power dynamics and coordination issues with AI [01:25:44] Humans vs AI in Optimization [01:27:05] The Impact of AI's Power Seeking Behavior [01:29:32] A Debate on the Future of AI
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB
Join us for a fascinating discussion of the free energy principle (FEP) with Dr. Maxwell Ramstead, a leading thinker exploring the intersection of math, physics, and philosophy, and Director of Research at VERSES. Proposed by the renowned neuroscientist Karl Friston, the FEP offers a unifying theory explaining how systems maintain order and their identity.
The free energy principle inverts traditional survival logic. Rather than asking what behaviors promote survival, it asks: given that things exist, what must they do? The answer: minimize free energy, or "surprise." Systems persist by constantly ensuring their internal states match the states anticipated by their model of the world. Failure to minimize surprise leads to chaos as systems dissolve into disorder. The free energy principle thus explains why lifeforms relentlessly model and predict their surroundings: it is an existential imperative counterbalancing entropy. Essentially, this principle describes the mind's pursuit of harmony between expectations and reality, and its relevance spans from cells to societies, underlying order wherever longevity is found.
Our discussion explores the technical details and philosophical implications of this paradigm-shifting theory. How does it further our understanding of cognition and intelligence? What insights does it offer about the fundamental patterns and properties of existence? Can it precipitate breakthroughs in disciplines like neuroscience and artificial intelligence?
Dr. Ramstead completed his Ph.D. at McGill University in Montreal, Canada in 2019, with frequent research visits to UCL in London, under the supervision of one of the world's most cited neuroscientists, Professor Karl Friston (UCL).
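For readers who want the formalism behind "minimizing surprise", the standard variational free energy from the broader active-inference literature (not a quantity derived in the episode itself), over hidden states s and observations o, is:

```latex
F[q] \;=\; \mathbb{E}_{q(s)}\!\left[\ln q(s) - \ln p(o, s)\right]
      \;=\; \underbrace{D_{\mathrm{KL}}\!\left[\, q(s)\,\|\,p(s \mid o)\,\right]}_{\ge\, 0} \;-\; \ln p(o)
```

Because the KL term is non-negative, F is an upper bound on the surprise -ln p(o) of what the system observes, so minimizing F both improves the internal model q(s) and keeps sensory states close to what that model anticipates.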
YT version: https://youtu.be/8qb28P7ksyE
https://scholar.google.ca/citations?user=ILpGOMkAAAAJ&hl=fr
https://spatialwebfoundation.org/team/maxwell-ramstead/
https://www.linkedin.com/in/maxwell-ramstead-43a1991b7/
https://twitter.com/mjdramstead
VERSES AI: https://www.verses.ai/
Intro: Tim Scarfe (Ph.D) Interviewer: Keith Duggar (Ph.D MIT)
TOC: 0:00:00 - Tim Intro 0:08:10 - Intro and philosophy 0:14:26 - Intro to Maxwell 0:18:00 - FEP 0:29:08 - Markov Blankets 0:51:15 - Verses AI / Applications of FEP 1:05:55 - Potential issues with deploying FEP 1:10:50 - Shared knowledge graphs 1:14:29 - XRisk / Ethics 1:24:57 - Strength of Verses 1:28:30 - Misconceptions about FEP, Physics vs philosophy/criticism 1:44:41 - Emergence / consciousness
References:
Principia Mathematica https://www.abebooks.co.uk/servlet/BookDetailsPL?bi=30567249049
Andy Clark's paper "Whatever Next? Predictive Brains, Situated Agents, and the Future of Cognitive Science" (Behavioral and Brain Sciences, 2013) https://pubmed.ncbi.nlm.nih.gov/23663408/
"Math Does Not Represent" by Erik Curiel https://www.youtube.com/watch?v=aA_T20HAzyY
A free energy principle for generic quantum systems (Chris Fields et al) https://arxiv.org/pdf/2112.15242.pdf
Designing explainable artificial intelligence with active inference https://arxiv.org/abs/2306.04025
Am I Self-Conscious? (Friston) https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00579/full
The Meta-Problem of Consciousness https://philarchive.org/archive/CHATMO-32v1
The Map-Territory Fallacy Fallacy https://arxiv.org/abs/2208.06924
A Technical Critique of Some Parts of the Free Energy Principle - Martin Biehl et al https://arxiv.org/abs/2001.06408
Weak Markov Blankets in High-Dimensional, Sparsely-Coupled Random Dynamical Systems - Dalton A R Sakthivadivel https://arxiv.org/pdf/2207.07620.pdf
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
The discussion between Tim Scarfe and David Foster provided an in-depth critique of the arguments made by panelists at the Munk AI Debate on whether artificial intelligence poses an existential threat to humanity. While the panelists made thought-provoking points, Scarfe and Foster found their arguments largely speculative, lacking crucial details and evidence to support claims of an impending existential threat.
Scarfe and Foster strongly disagreed with Max Tegmark’s position that AI has an unparalleled “blast radius” that could lead to human extinction. Tegmark failed to provide a credible mechanism for how this scenario would unfold in reality. His arguments relied more on speculation about advanced future technologies than on present capabilities and trends. As Foster argued, we cannot conclude AI poses a threat based on speculation alone. Evidence is needed to ground discussions of existential risks in science rather than science fiction fantasies or doomsday scenarios.
They found Yann LeCun’s statements too broad and high-level, critiquing him for not providing sufficiently strong arguments or specifics to back his position. While LeCun aptly noted AI remains narrow in scope and far from achieving human-level intelligence, his arguments lacked crucial details on current limitations and why we should not fear superintelligence emerging in the near future. As Scarfe argued, without these details the discussion descended into “philosophy” rather than focusing on evidence and data.
Scarfe and Foster also took issue with Yoshua Bengio’s unsubstantiated speculation that machines would necessarily develop a desire for self-preservation that threatens humanity. There is no evidence today’s AI systems are developing human-like general intelligence or desires, let alone that these attributes would manifest in ways dangerous to humans. The question is not whether machines will eventually surpass human intelligence, but how and when this might realistically unfold based on present technological capabilities. Bengio’s arguments relied more on speculation about advanced future technologies than on evidence from current systems and research.
In contrast, they strongly agreed with Melanie Mitchell’s view that scenarios of malevolent or misguided superintelligence are speculation, not backed by evidence from AI as it exists today. Claims of an impending “existential threat” from AI are overblown, harmful to progress, and inspire undue fear of technology rather than consideration of its benefits. Mitchell sensibly argued discussions of risks from emerging technologies must be grounded in science and data, not speculation, if we are to make balanced policy and development decisions.
Overall, while the debate raised thought-provoking questions about advanced technologies that could eventually transform our world, none of the speakers made a credible evidence-based case that today’s AI poses an existential threat. Scarfe and Foster argued the debate failed to discuss concrete details about current capabilities and limitations of technologies like language models, which remain narrow in scope. General human-level AI is still missing many components, including physical embodiment, emotions, and the "common sense" reasoning that underlies human thinking. Claims of existential threats require extraordinary evidence to justify policy or research restrictions, not speculation. By discussing possibilities rather than probabilities grounded in evidence, the debate failed to substantively advance our thinking on risks from AI and its plausible development in the coming decades.
David's new podcast: https://podcasts.apple.com/us/podcast/the-ai-canvas/id1692538973
Generative AI book: https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/
Sponsored Episode - YouAi
What if an AI truly knew you—your thoughts, values, aptitudes, and dreams? An AI that could enhance your life in profound ways by amplifying your strengths, augmenting your weaknesses, and connecting you with like-minded souls. That is the vision of YouAi. YouAi founder Dmitri Shapiro believes digitizing our inner lives could unlock tremendous benefits. But mapping the human psyche also poses deep questions. As technology mediates our self-understanding, what do we risk by rendering our minds in bits and algorithms? Could we gain a new means of flourishing or lose something intangible? There are no easy answers, but YouAi offers a vision balanced by hard thinking.
Shapiro discussed YouAi's app, which builds personalized AI assistants by learning how individuals think through interactive questions. As people share, YouAi develops a multidimensional model of their mind. Users get a tailored feed of prompts to continue engaging and teaching their AI. YouAi's vision provides a glimpse into a future that could unsettle or fulfill our hopes. As technology mediates understanding ourselves and others, will we risk losing what makes us human or find new means of flourishing? YouAi believes that together, we can build a future where our minds contain infinite potential—and their technology helps unlock it. But we must proceed thoughtfully, upholding human dignity above all else. Our minds shape who we are. And who we can become.
Digitise your mind today: YouAi - https://YouAi.ai
MindStudio – https://YouAi.ai/mindstudio
YouAi Mind Indexer - https://YouAi.ai/train
Join the MLST discord and register for the YouAi event on July 13th: https://discord.gg/ESrGqhf5CB
TOC: 0:00:00 - Introduction to Mind Digitization 0:09:31 - The YouAi Platform and Personal Applications 0:27:54 - The Potential of Group Alignment 0:30:28 - Applications in Human-to-Human Communication 0:35:43 - Applications in Interfacing with Digital Technology 0:43:41 - Introduction to the Project 0:44:51 - Brain digitization and mind vs. brain 0:49:55 - The Extended Mind and Neurofeedback 0:54:16 - Personalized Learning and the Future of Education 1:02:19 - Privacy and Data Security 1:14:20 - Ethical Considerations of Digitizing the Mind 1:19:49 - The Metaverse and the Future of Digital Identity 1:25:17 - Digital Immortality and Legacy 1:29:09 - The Nature of Consciousness 1:34:11 - Digitization of the Mind 1:35:06 - Potential Inequality in a Digital World 1:38:00 - The Role of Technology in Equalizing or Democratizing Society 1:40:51 - The Future of the Startup and Community Involvement
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk The first 10 minutes of audio from Joscha aren't great; it improves after that.
Transcript and longer summary: https://docs.google.com/document/d/1TUJhlSVbrHf2vWoe6p7xL5tlTK_BGZ140QqqTudF8UI/edit?usp=sharing Dr. Joscha Bach argued that general intelligence emerges from civilization, not individuals. Given our biological constraints, humans cannot achieve a high level of general intelligence on our own. Bach believes AGI may become integrated into all parts of the world, including human minds and bodies. He thinks a future where humans and AGI harmoniously coexist is possible if we develop a shared purpose and incentive to align. However, Bach is uncertain about how AI progress will unfold or which scenarios are most likely. Bach argued that global control and regulation of AI is unrealistic. While regulation may address some concerns, it cannot stop continued progress in AI. He believes individuals determine their own values, so "human values" cannot be formally specified and aligned across humanity. For Bach, the possibility of building beneficial AGI is exciting but much work is still needed to ensure a positive outcome. Connor Leahy believes we have more control over the future than the default outcome might suggest. With sufficient time and effort, humanity could develop the technology and coordination to build a beneficial AGI. However, the default outcome likely leads to an undesirable scenario if we do not actively work to build a better future. Leahy thinks finding values and priorities most humans endorse could help align AI, even if individuals disagree on some values. Leahy argued a future where humans and AGI harmoniously coexist is ideal but will require substantial work to achieve. While regulation faces challenges, it remains worth exploring. Leahy believes limits to progress in AI exist but we are unlikely to reach them before humanity is at risk. He worries even modestly superhuman intelligence could disrupt the status quo if misaligned with human values and priorities. Overall, Bach and Leahy expressed optimism about the possibility of building beneficial AGI but believe we must address risks and challenges proactively. They agreed substantial uncertainty remains around how AI will progress and what scenarios are most plausible. But developing a shared purpose between humans and AI, improving coordination and control, and finding human values to help guide progress could all improve the odds of a beneficial outcome. With openness to new ideas and willingness to consider multiple perspectives, continued discussions like this one could help ensure the future of AI is one that benefits and inspires humanity. TOC: 00:00:00 - Introduction and Background 00:02:54 - Different Perspectives on AGI 00:13:59 - The Importance of AGI 00:23:24 - Existential Risks and the Future of Humanity 00:36:21 - Coherence and Coordination in Society 00:40:53 - Possibilities and Future of AGI 00:44:08 - Coherence and alignment 01:08:32 - The role of values in AI alignment 01:18:33 - The future of AGI and merging with AI 01:22:14 - The limits of AI alignment 01:23:06 - The scalability of intelligence 01:26:15 - Closing statements and future prospects
In this wide-ranging conversation, Tim Scarfe interviews Neel Nanda, a researcher at DeepMind working on mechanistic interpretability, which aims to understand the algorithms and representations learned by machine learning models. Neel discusses how models can represent their thoughts using motifs, circuits, and linear directional features which are often communicated via a "residual stream", an information highway models use to pass information between layers.
Neel argues that "superposition", the ability for models to represent more features than they have neurons, is one of the biggest open problems in interpretability. This is because superposition thwarts our ability to understand models by decomposing them into individual units of analysis. Despite this, Neel remains optimistic that ambitious interpretability is possible, citing examples like his work reverse engineering how models do modular addition. However, Neel notes we must start small, build rigorous foundations, and not assume our theoretical frameworks perfectly match reality.
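To make "representing more features than neurons" concrete, here is a toy linear sketch (our own illustration, not code from the episode): 200 sparse features are packed into 50 dimensions along nearly orthogonal random directions, and the readout shows both the recovered features and the interference that superposition introduces.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 200, 50   # many more "features" than neurons

# Assign each feature a random unit direction in the smaller neuron space.
# Random directions are only nearly orthogonal, which is what lets many
# features share few neurons -- at the cost of interference.
W = rng.normal(size=(n_features, n_neurons))
W /= np.linalg.norm(W, axis=1, keepdims=True)

# A sparse input: only a handful of features are active at once.
x = np.zeros(n_features)
active = rng.choice(n_features, size=3, replace=False)
x[active] = 1.0

hidden = x @ W          # compress 200 feature activations into 50 neurons
readout = hidden @ W.T  # linear attempt to recover all 200 features

print("active features:", sorted(active))
print("readout on active features:", np.round(readout[active], 2))  # close to 1, plus noise
print("worst interference elsewhere:",
      round(float(np.max(np.abs(np.delete(readout, active)))), 2))  # nonzero crosstalk
```

The interference terms are exactly why superposition frustrates the "one neuron, one concept" style of analysis Neel describes.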
The conversation turns to whether models can have goals or agency, with Neel arguing they likely can based on heuristics like models executing long term plans towards some objective. However, we currently lack techniques to build models with specific goals, meaning any goals would likely be learned or emergent. Neel highlights how induction heads, circuits models use to track long range dependencies, seem crucial for phenomena like in-context learning to emerge.
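As a behavioural caricature only (the real mechanism is an attention circuit, and none of this code comes from the episode), the rule an induction head implements is roughly "find the last place the current token occurred and copy whatever came next":

```python
def induction_prediction(tokens):
    """Toy [A][B] ... [A] -> [B] rule: look back for the most recent
    earlier occurrence of the current token and predict its successor."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None  # no earlier occurrence: nothing to copy

# "Mr D ..." followed later by "Mr" -> predict "D", whatever "D" means.
print(induction_prediction(["Mr", "D", "urs", "ley", "was", "Mr"]))  # -> "D"
```

In a real transformer this behaviour arises from attention heads composing (a previous-token head feeding a head that attends back to the matched position), which is why induction heads are implicated in in-context learning.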
On the existential risks from AI, Neel believes we should avoid overly confident claims that models will or will not be dangerous, as we do not understand them enough to make confident theoretical assertions. However, models could pose risks through being misused, having undesirable emergent properties, or being imperfectly aligned. Neel argues we must pursue rigorous empirical work to better understand and ensure model safety, avoid "philosophizing" about definitions of intelligence, and focus on ensuring researchers have standards for what it means to decide a system is "safe" before deploying it. Overall, a thoughtful conversation on one of the most important issues of our time.
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Twitter: https://twitter.com/MLStreetTalk
Neel Nanda: https://www.neelnanda.io/
TOC
[00:00:00] Introduction and Neel Nanda's Interests (walk and talk)
[00:03:15] Mechanistic Interpretability: Reverse Engineering Neural Networks
[00:13:23] Discord questions
[00:21:16] Main interview kick-off in studio
[00:49:26] Grokking and Sudden Generalization
[00:53:18] The Debate on Systematicity and Compositionality
[01:19:16] How do ML models represent their thoughts
[01:25:51] Do Large Language Models Learn World Models?
[01:53:36] Superposition and Interference in Language Models
[02:43:15] Transformers discussion
[02:49:49] Emergence and In-Context Learning
[03:20:02] Superintelligence/XRisk discussion
Transcript: https://docs.google.com/document/d/1FK1OepdJMrqpFK-_1Q3LQN6QLyLBvBwWW_5z8WrS1RI/edit?usp=sharing
Refs: https://docs.google.com/document/d/115dAroX0PzSduKr5F1V4CWggYcqIoSXYBhcxYktCnqY/edit?usp=sharing
Please check out Numerai - our sponsor using our link @
http://numer.ai/mlst
Numerai is a groundbreaking platform which is taking the data science world by storm. Tim has been using Numerai to build state-of-the-art models which predict the stock market, all while being a part of an inspiring community of data scientists from around the globe. They host the Numerai Data Science Tournament, where data scientists like us use their financial dataset to predict future stock market performance.
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Twitter: https://twitter.com/MLStreetTalk
YT version: https://youtu.be/axJtywd9Tbo
In this fascinating interview, Dr. Tim Scarfe speaks with renowned philosopher Daniel Dennett about the potential dangers of AI and the concept of "Counterfeit People." Dennett raises concerns about AI being used to create artificial colleagues, and argues that preventing counterfeit AI individuals is crucial for societal trust and security.
They delve into Dennett's "Two Black Boxes" thought experiment, the Chinese Room Argument by John Searle, and discuss the implications of AI in terms of reversibility, reontologisation, and realism. Dr. Scarfe and Dennett also examine adversarial LLMs, mental trajectories, and the emergence of consciousness and semanticity in AI systems.
Throughout the conversation, they touch upon various philosophical perspectives, including Gilbert Ryle's Ghost in the Machine, Chomsky's work, and the importance of competition in academia. Dennett concludes by highlighting the need for legal and technological barriers to protect against the dangers of counterfeit AI creations.
Join Dr. Tim Scarfe and Daniel Dennett in this thought-provoking discussion about the future of AI and the potential challenges we face in preserving our civilization. Don't miss this insightful conversation!
TOC:
00:00:00 Intro
00:09:56 Main show kick off
00:12:04 Counterfeit People
00:16:03 Reversibility
00:20:55 Reontologisation
00:24:43 Realism
00:27:48 Adversarial LLMs are out to get us
00:32:34 Exploring mental trajectories and Chomsky
00:38:53 Gilbert Ryle's Ghost in the Machine and competition in academia
00:44:32 2 Black boxes thought experiment / intentional stance
01:00:11 Chinese room
01:04:49 Singularitarianism
01:07:22 Emergence of consciousness and semanticity
References:
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
https://arxiv.org/abs/2305.10601
The Problem With Counterfeit People (Daniel Dennett)
https://www.theatlantic.com/technology/archive/2023/05/problem-counterfeit-people/674075/
The knowledge argument
https://en.wikipedia.org/wiki/Knowledge_argument
The Intentional Stance
https://www.researchgate.net/publication/271180035_The_Intentional_Stance
Two Black Boxes: a Fable (Daniel Dennett)
https://www.researchgate.net/publication/28762339_Two_Black_Boxes_a_Fable
The Chinese Room Argument (John Searle)
https://plato.stanford.edu/entries/chinese-room/
https://web-archive.southampton.ac.uk/cogprints.org/7150/1/10.1.1.83.5248.pdf
From Bacteria to Bach and Back: The Evolution of Minds (Daniel Dennett)
https://www.amazon.co.uk/Bacteria-Bach-Back-Evolution-Minds/dp/014197804X
Consciousness Explained (Daniel Dennett)
https://www.amazon.co.uk/Consciousness-Explained-Penguin-Science-Dennett/dp/0140128670/
The Mind's I: Fantasies and Reflections on Self and Soul (Hofstadter, Douglas R; Dennett, Daniel C.)
https://www.abebooks.co.uk/servlet/BookDetailsPL?bi=31494476184
#DanielDennett #ArtificialIntelligence #CounterfeitPeople
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk In this eye-opening discussion between Tim Scarfe and Prof. Jim Hughes, a professor of gene regulation at Oxford University, they explore the intersection of creativity, genomics, and artificial intelligence. Prof. Hughes brings his expertise in genomics and insights from his interdisciplinary research group, which includes machine learning experts, mathematicians, and molecular biologists. The conversation begins with an overview of Prof. Hughes' background and the importance of creativity in scientific research. They delve into the challenges of unlocking the secrets of the human genome and how machine learning, specifically convolutional neural networks, can assist in decoding genome function. As they discuss validation and interpretability concerns in machine learning, they acknowledge the need for experimental tests and ponder the complex nature of understanding the basic code of life. They touch upon the fascinating world of morphogenesis and emergence, considering the potential crossovers into AI and their implications for self-repairing systems in medicine. Examining the ethical and regulatory aspects of genomics and AI, the duo explores the implications of having access to someone's genome, the potential to predict traits or diseases, and the role of AI in understanding complex genetic signals. They also consider the challenges of keeping up with the rapidly expanding body of scientific research and the pressures faced by researchers in academia. To wrap up the discussion, Tim and Prof. Hughes shed light on the significance of creativity and diversity in scientific research, emphasizing the need for divergent processes and diverse perspectives to foster innovation and avoid consensus-driven convergence.
Filmed at https://www.creativemachine.io/
Prof. Jim Hughes: https://www.rdm.ox.ac.uk/people/jim-hughes
Dr. Tim Scarfe: https://xrai.glass/
Table of Contents: 1. [0:00:00] Introduction and Prof. Jim Hughes' background 2. [0:02:48] Creativity and its role in science 3. [0:07:13] Challenges in understanding the human genome 4. [0:13:20] Using convolutional neural networks to decode genome function 5. [0:15:32] Validation and interpretability concerns in machine learning 6. [0:17:56] Challenges in understanding the basic code of life 7. [0:19:36] Morphogenesis, emergence, and potential crossovers into AI 8. [0:21:38] Ethics and regulation in genomics and AI 9. [0:23:30] The role of AI in understanding and managing genetic risks 10. [0:32:37] Creativity and diversity in scientific research
Please check out Numerai - our sponsor @
https://numerai.com/mlst
Numerai is a groundbreaking platform which is taking the data science world by storm. Tim has been using Numerai to build state-of-the-art models which predict the stock market, all while being a part of an inspiring community of data scientists from around the globe. They host the Numerai Data Science Tournament, where data scientists like us use their financial dataset to predict future stock market performance.
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Twitter: https://twitter.com/MLStreetTalk
Welcome to an exciting episode featuring an outstanding guest, Robert Miles! Renowned for his extraordinary contributions to understanding AI and its potential impacts on our lives, Robert is an artificial intelligence advocate, researcher, and YouTube sensation. He combines engaging discussions with entertaining content, captivating millions of viewers from around the world.
With a strong computer science background, Robert has been actively involved in AI safety projects, focusing on raising awareness about potential risks and benefits of advanced AI systems. His YouTube channel is celebrated for making AI safety discussions accessible to a diverse audience through breaking down complex topics into easy-to-understand nuggets of knowledge, and you might also recognise him from his appearances on Computerphile.
In this episode, join us as we dive deep into Robert's journey in the world of AI, exploring his insights on AI alignment, superintelligence, and the role of AI shaping our society and future. We'll discuss topics such as the limits of AI capabilities and physics, AI progress and timelines, human-machine hybrid intelligence, AI in conflict and cooperation with humans, and the convergence of AI communities.
Robert Miles:
@RobertMilesAI
https://twitter.com/robertskmiles
https://aisafety.info/
YT version: https://www.youtube.com/watch?v=kMLKbhY0ji0
Panel:
Dr. Tim Scarfe
Dr. Keith Duggar
Joint CTOs - https://xrai.glass/
Refs:
Are Emergent Abilities of Large Language Models a Mirage? (Rylan Schaeffer)
https://arxiv.org/abs/2304.15004
TOC:
Intro [00:00:00]
Numerai Sponsor Message [00:02:17]
AI Alignment [00:04:27]
Limits of AI Capabilities and Physics [00:18:00]
AI Progress and Timelines [00:23:52]
AI Arms Race and Innovation [00:31:11]
Human-Machine Hybrid Intelligence [00:38:30]
Understanding and Defining Intelligence [00:42:48]
AI in Conflict and Cooperation with Humans [00:50:13]
Interpretability and Mind Reading in AI [01:03:46]
Mechanistic Interpretability and Deconfusion Research [01:05:53]
Understanding the core concepts of AI [01:07:40]
Moon landing analogy and AI alignment [01:09:42]
Cognitive horizon and limits of human intelligence [01:11:42]
Funding and focus on AI alignment [01:16:18]
Regulating AI technology and potential risks [01:19:17]
Aligning AI with human values and its dynamic nature [01:27:04]
Cooperation and Allyship [01:29:33]
Orthogonality Thesis and Goal Preservation [01:33:15]
Anthropomorphic Language and Intelligent Agents [01:35:31]
Maintaining Variety and Open-ended Existence [01:36:27]
Emergent Abilities of Large Language Models [01:39:22]
Convergence vs Emergence [01:44:04]
Criticism of X-risk and Alignment Communities [01:49:40]
Fusion of AI communities and addressing biases [01:52:51]
AI systems integration into society and understanding them [01:53:29]
Changing opinions on AI topics and learning from past videos [01:54:23]
Utility functions and von Neumann-Morgenstern theorems [01:54:47]
AI Safety FAQ project [01:58:06]
Building a conversation agent using AI safety dataset [02:00:36]
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Twitter: https://twitter.com/MLStreetTalk
In a historic and candid Senate hearing, OpenAI CEO Sam Altman, Professor Gary Marcus, and IBM's Christina Montgomery discussed the regulatory landscape of AI in the US. The discussion was particularly interesting due to its timing, as it followed the recent release of the EU's proposed AI Act, which could potentially ban American companies like OpenAI and Google from providing API access to generative AI models and impose massive fines for non-compliance.
The speakers openly addressed potential risks of AI technology and emphasized the need for precision regulation. This was a unique approach, as historically, US companies have tried their hardest to avoid regulation. The hearing not only showcased the willingness of industry leaders to engage in discussions on regulation but also demonstrated the need for a balanced approach to avoid stifling innovation.
The EU AI Act, scheduled to come into force in 2026, is still just a proposal, but it has already raised concerns about its impact on the American tech ecosystem and potential conflicts between US and EU laws. With extraterritorial jurisdiction and provisions targeting open-source developers and software distributors like GitHub, the Act could create more problems than it solves by encouraging unsafe AI practices and limiting access to advanced AI technologies.
One core issue with the Act is the designation of foundation models in the highest risk category, primarily due to their open-ended nature. A significant risk theme revolves around users creating harmful content and determining who should be held accountable – the users or the platforms. The Senate hearing served as an essential platform to discuss these pressing concerns and work towards a regulatory framework that promotes both safety and innovation in AI.
00:00 Show
01:35 Legals
03:44 Intro
10:33 Altman intro
14:16 Christina Montgomery
18:20 Gary Marcus
23:15 Jobs
26:01 Scorecards
28:08 Harmful content
29:47 Startups
31:35 What meets the definition of harmful?
32:08 Moratorium
36:11 Social Media
46:17 Gary's take on BingGPT and pivot into policy
48:05 Democratisation
Generative Deep Learning, 2nd Edition [David Foster]
https://www.oreilly.com/library/view/generative-deep-learning/9781098134174/
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Twitter: https://twitter.com/MLStreetTalk
In this conversation, Tim Scarfe and David Foster, the author of 'Generative Deep Learning,' dive deep into the world of generative AI, discussing topics ranging from model families and autoregressive models to the democratization of AI technology and its potential impact on various industries. They explore the connection between language and true intelligence, as well as the limitations of GPT and other large language models. The discussion also covers the importance of task-independent world models, the concept of active inference, and the potential of combining these ideas with transformer and GPT-style models.
Ethics and regulation in AI development are also discussed, including the need for transparency in data used to train AI models and the responsibility of developers to ensure their creations are not destructive. The conversation touches on the challenges posed by AI-generated content on copyright laws and the diminishing role of effort and skill in copyright due to generative models.
The impact of AI on education and creativity is another key area of discussion, with Tim and David exploring the potential benefits and drawbacks of using AI in the classroom, the need for a balance between traditional learning methods and AI-assisted learning, and the importance of teaching students to use AI tools critically and responsibly.
Generative AI in music is also explored, with David and Tim discussing the potential for AI-generated music to change the way we create and consume art, as well as the challenges in training AI models to generate music that captures human emotions and experiences.
Throughout the conversation, Tim and David touch on the potential risks and consequences of AI becoming too powerful, the importance of maintaining control over the technology, and the possibility of government intervention and regulation. The discussion concludes with a thought experiment about AI predicting human actions and creating transient capabilities that could lead to doom.
TOC:
Introducing Generative Deep Learning [00:00:00]
Model Families in Generative Modeling [00:02:25]
Auto Regressive Models and Recurrence [00:06:26]
Language and True Intelligence [00:15:07]
Language, Reality, and World Models [00:19:10]
AI, Human Experience, and Understanding [00:23:09]
GPTs Limitations and World Modeling [00:27:52]
Task-Independent Modeling and Cybernetic Loop [00:33:55]
Collective Intelligence and Emergence [00:36:01]
Active Inference vs. Reinforcement Learning [00:38:02]
Combining Active Inference with Transformers [00:41:55]
Decentralized AI and Collective Intelligence [00:47:46]
Regulation and Ethics in AI Development [00:53:59]
AI-Generated Content and Copyright Laws [00:57:06]
Effort, Skill, and AI Models in Copyright [00:57:59]
AI Alignment and Scale of AI Models [00:59:51]
Democratization of AI: GPT-3 and GPT-4 [01:03:20]
Context Window Size and Vector Databases [01:10:31]
Attention Mechanisms and Hierarchies [01:15:04]
Benefits and Limitations of Language Models [01:16:04]
AI in Education: Risks and Benefits [01:19:41]
AI Tools and Critical Thinking in the Classroom [01:29:26]
Impact of Language Models on Assessment and Creativity [01:35:09]
Generative AI in Music and Creative Arts [01:47:55]
Challenges and Opportunities in Generative Music [01:52:11]
AI-Generated Music and Human Emotions [01:54:31]
Language Modeling vs. Music Modeling [02:01:58]
Democratization of AI and Industry Impact [02:07:38]
Recursive Self-Improving Superintelligence [02:12:48]
AI Technologies: Positive and Negative Impacts [02:14:44]
Runaway AGI and Control Over AI [02:20:35]
AI Dangers, Cybercrime, and Ethics [02:23:42]
https://www.perplexity.ai/
https://www.perplexity.ai/iphone
https://www.perplexity.ai/android Interview with Aravind Srinivas, CEO and Co-Founder of Perplexity AI – Revolutionizing Learning with Conversational Search Engines Dr. Tim Scarfe talks with Dr. Aravind Srinivas, CEO and Co-Founder of Perplexity AI, about his journey from studying AI and reinforcement learning at UC Berkeley to launching Perplexity – a startup that aims to revolutionize learning through the power of conversational search engines. By combining the strengths of large language models like GPT-* with search engines, Perplexity provides users with direct answers to their questions in a decluttered user interface, making the learning process not only more efficient but also enjoyable. Aravind shares his insights on how advertising can be made more relevant and less intrusive with the help of large language models, emphasizing the importance of transparency in relevance ranking to improve user experience. He also discusses the challenge of balancing the interests of users and advertisers for long-term success. The interview delves into the challenges of maintaining truthfulness and balancing opinions and facts in a world where algorithmic truth is difficult to achieve. Aravind believes that opinionated models can be useful as long as they don't spread misinformation and are transparent about being opinions. He also emphasizes the importance of allowing users to correct or update information, making the platform more adaptable and dynamic. Lastly, Aravind shares his thoughts on embracing a digital society with large language models, stressing the need for frequent and iterative deployments of these models to reduce fear of AI and misinformation. He envisions a future where using AI tools effectively requires clear thinking and first-principle reasoning, ultimately benefiting society as a whole. Education and transparency are crucial to counter potential misuse of AI for political or malicious purposes.
YT version: https://youtu.be/_vMOWw3uYvk Aravind Srinivas: https://www.linkedin.com/in/aravind-srinivas-16051987/
https://scholar.google.com/citations?user=GhrKC1gAAAAJ&hl=en
https://twitter.com/aravsrinivas?lang=en Interviewer: Dr. Tim Scarfe (CTO XRAI Glass) Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB TOC: Introduction and Background of Perplexity AI [00:00:00]
The Importance of a Decluttered UI and User Experience [00:04:19]
Advertising in Search Engines and Potential Improvements [00:09:02]
Challenges and Opportunities in this new Search Modality [00:18:17]
Benefits of Perplexity and Personalized Learning [00:21:27]
Objective Truth and Personalized Wikipedia [00:26:34]
Opinions and Truth in Answer Engines [00:30:53]
Embracing the Digital Society with Language Models [00:37:30]
Impact on Jobs and Future of Learning [00:40:13]
Educating users on when perplexity works and doesn't work [00:43:13]
Improving user experience and the possibilities of voice-to-voice interaction [00:45:04]
The future of language models and auto-regressive models [00:49:51]
Performance of GPT-4 and potential improvements [00:52:31]
Building the ultimate research and knowledge assistant [00:55:33]
Revolutionizing note-taking and personal knowledge stores [00:58:16] References: Evaluating Verifiability in Generative Search Engines (Nelson F. Liu et al, Stanford University) https://arxiv.org/pdf/2304.09848.pdf Note: this was a sponsored interview.
Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB Twitter: https://twitter.com/MLStreetTalk
In this exclusive interview, Dr. Tim Scarfe sits down with Minqi Jiang, a leading PhD student at University College London and Meta AI, as they delve into the fascinating world of deep reinforcement learning (RL) and its impact on technology, startups, and research. Discover how Minqi made the crucial decision to pursue a PhD in this exciting field, and learn from his valuable startup experiences and lessons.
Minqi shares his insights into balancing serendipity and planning in life and research, and explains the role of objectives and Goodhart's Law in decision-making. Get ready to explore the depths of robustness in RL, two-player zero-sum games, and the differences between RL and supervised learning.
As they discuss the role of environment in intelligence, emergence, and abstraction, prepare to be blown away by the possibilities of open-endedness and the intelligence explosion. Learn how language models generate their own training data, the limitations of RL, and the future of software 2.0 with interpretability concerns.
From robotics and open-ended learning applications to learning potential metrics and MDPs, this interview is a goldmine of information for anyone interested in AI, RL, and the cutting edge of technology. Don't miss out on this incredible opportunity to learn from a rising star in the AI world!
TOC
Tech & Startup Background [00:00:00]
Pursuing PhD in Deep RL [00:03:59]
Startup Lessons [00:11:33]
Serendipity vs Planning [00:12:30]
Objectives & Decision Making [00:19:19]
Minimax Regret & Uncertainty [00:22:57]
Robustness in RL & Zero-Sum Games [00:26:14]
RL vs Supervised Learning [00:34:04]
Exploration & Intelligence [00:41:27]
Environment, Emergence, Abstraction [00:46:31]
Open-endedness & Intelligence Explosion [00:54:28]
Language Models & Training Data [01:04:59]
RLHF & Language Models [01:16:37]
Creativity in Language Models [01:27:25]
Limitations of RL [01:40:58]
Software 2.0 & Interpretability [01:45:11]
Language Models & Code Reliability [01:48:23]
Robust Prioritized Level Replay [01:51:42]
Open-ended Learning [01:55:57]
Auto-curriculum & Deep RL [02:08:48]
Robotics & Open-ended Learning [02:31:05]
Learning Potential & MDPs [02:36:20]
Universal Function Space [02:42:02]
Goal-Directed Learning & Auto-Curricula [02:42:48]
Advice & Closing Thoughts [02:44:47]
References:
- Why Greatness Cannot Be Planned: The Myth of the Objective by Kenneth O. Stanley and Joel Lehman
https://www.springer.com/gp/book/9783319155234
- Rethinking Exploration: General Intelligence Requires Rethinking Exploration
https://arxiv.org/abs/2106.06860
- The Case for Strong Emergence (Sabine Hossenfelder)
https://arxiv.org/abs/2102.07740
- The Game of Life (Conway)
https://www.conwaylife.com/
- Toolformer: Teaching Language Models to Generate APIs (Meta AI)
https://arxiv.org/abs/2302.04761
- POET: Paired Open-Ended Trailblazer (Uber AI Labs)
https://arxiv.org/abs/1901.01753
- Schmidhuber's Artificial Curiosity
https://people.idsia.ch/~juergen/interest.html
- Gödel Machines
https://people.idsia.ch/~juergen/goedelmachine.html
- PowerPlay
https://arxiv.org/abs/1112.5309
- Robust Prioritized Level Replay: https://openreview.net/forum?id=NfZ6g2OmXEk
- Unsupervised Environment Design: https://arxiv.org/abs/2012.02096
- Excel: Evolving Curriculum Learning for Deep Reinforcement Learning
https://arxiv.org/abs/1901.05431
- Go-Explore: A New Approach for Hard-Exploration Problems
https://arxiv.org/abs/1901.10995
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals
https://www.researchgate.net/publication/342377312_Learning_with_AMIGo_Adversarially_Motivated_Intrinsic_Goals
PRML
https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
Sutton and Barto
https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Twitter: https://twitter.com/MLStreetTalk
Chris Eliasmith is a renowned interdisciplinary researcher, author, and professor at the University of Waterloo, where he holds the prestigious Canada Research Chair in Theoretical Neuroscience. As the Founding Director of the Centre for Theoretical Neuroscience, Eliasmith leads the Computational Neuroscience Research Group in exploring the mysteries of the brain and its complex functions. His groundbreaking work, including the Neural Engineering Framework, the Neural Engineering Objects (Nengo) software environment, and the Semantic Pointer Architecture, has led to the development of Spaun, the most advanced functional brain simulation to date. Among his numerous achievements, Eliasmith has received the 2015 NSERC John C. Polanyi Award and authored two influential books, "How to Build a Brain" and "Neural Engineering."
Chris' homepage:
http://arts.uwaterloo.ca/~celiasmi/
Interviewers: Dr. Tim Scarfe and Dr. Keith Duggar
TOC:
Intro to Chris [00:00:00]
Continuous Representation in Biologically Plausible Neural Networks [00:06:49]
Legendre Memory Unit and Spatial Semantic Pointer [00:14:36]
Large Contexts and Data in Language Models [00:20:30]
Spatial Semantic Pointers and Continuous Representations [00:24:38]
Auto Convolution [00:30:12]
Abstractions and the Continuity [00:36:33]
Compression, Sparsity, and Brain Representations [00:42:52]
Continual Learning and Real-World Interactions [00:48:05]
Robust Generalization in LLMs and Priors [00:56:11]
Chip design [01:00:41]
Chomsky + Computational Power of NNs and Recursion [01:04:02]
Spiking Neural Networks and Applications [01:13:07]
Limits of Empirical Learning [01:22:43]
Philosophy of Mind, Consciousness etc [01:25:35]
Future of human machine interaction [01:41:28]
Future research and advice to young researchers [01:45:06]
Refs:
http://compneuro.uwaterloo.ca/publications/dumont2023.html
http://compneuro.uwaterloo.ca/publications/voelker2019lmu.html
http://compneuro.uwaterloo.ca/publications/voelker2018.html
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 In this podcast with the legendary Connor Leahy (CEO Conjecture) recorded in Dec 2022, we discuss various topics related to artificial intelligence (AI), including AI alignment, the success of ChatGPT, the potential threats of artificial general intelligence (AGI), and the challenges of balancing research and product development at his company, Conjecture. He emphasizes the importance of empathy, dehumanizing our thinking to avoid anthropomorphic biases, and the value of real-world experiences in learning and personal growth. The conversation also covers the Orthogonality Thesis, AI preferences, the mystery of mode collapse, and the paradox of AI alignment. Connor Leahy expresses concern about the rapid development of AI and the potential dangers it poses, especially as AI systems become more powerful and integrated into society. He argues that we need a better understanding of AI systems to ensure their safe and beneficial development. The discussion also touches on the concept of "futuristic whack-a-mole," where futurists predict potential AGI threats, and others try to come up with solutions for those specific scenarios. However, the problem lies in the fact that there could be many more scenarios that neither party can think of, especially when dealing with a system that's smarter than humans.
https://www.linkedin.com/in/connor-j-leahy/
https://twitter.com/NPCollapse
Interviewer: Dr. Tim Scarfe (Innovation CTO @ XRAI Glass https://xrai.glass/) TOC: The success of ChatGPT and its impact on the AI field [00:00:00] Subjective experience [00:15:12] AI Architectural discussion including RLHF [00:18:04] The paradox of AI alignment and the future of AI in society [00:31:44] The impact of AI on society and politics [00:36:11] Future shock levels and the challenges of predicting the future [00:45:58] Long termism and existential risk [00:48:23] Consequentialism vs. deontology in rationalism [00:53:39] The Rationalist Community and its Challenges [01:07:37] AI Alignment and Conjecture [01:14:15] Orthogonality Thesis and AI Preferences [01:17:01] Challenges in AI Alignment [01:20:28] Mechanistic Interpretability in Neural Networks [01:24:54] Building Cleaner Neural Networks [01:31:36] Cognitive horizons / The problem with rapid AI development [01:34:52] Founding Conjecture and raising funds [01:39:36] Inefficiencies in the market and seizing opportunities [01:45:38] Charisma, authenticity, and leadership in startups [01:52:13] Autistic culture and empathy [01:55:26] Learning from real-world experiences [02:01:57] Technical empathy and transhumanism [02:07:18] Moral status and the limits of empathy [02:15:33] Anthropomorphic Thinking and Consequentialism [02:17:42] Conjecture: Balancing Research and Product Development [02:20:37] Epistemology Team at Conjecture [02:31:07] Interpretability and Deception in AGI [02:36:23] Futuristic whack-a-mole and predicting AGI threats [02:38:27] Refs: 1. OpenAI's ChatGPT: https://chat.openai.com/ 2. The Mystery of Mode Collapse (Article): https://www.lesswrong.com/posts/t9svvNPNmFf5Qa3TA/mysteries-of-mode-collapse 3. The Rationalist Guide to the Galaxy https://www.amazon.co.uk/Does-Not-Hate-You-Superintelligence/dp/1474608795 5. Alfred Korzybski: https://en.wikipedia.org/wiki/Alfred_Korzybski 6. Instrumental Convergence: https://en.wikipedia.org/wiki/Instrumental_convergence 7. Orthogonality Thesis: https://en.wikipedia.org/wiki/Orthogonality_thesis 8.
Brian Tomasik's Essays on Reducing Suffering: https://reducing-suffering.org/ 9. Epistemological Framing for AI Alignment Research: https://www.lesswrong.com/posts/Y4YHTBziAscS5WPN7/epistemological-framing-for-ai-alignment-research 10. How to Defeat Mind readers: https://www.alignmentforum.org/posts/EhAbh2pQoAXkm9yor/circumventing-interpretability-how-to-defeat-mind-readers 11. Society of mind: https://www.amazon.co.uk/Society-Mind-Marvin-Minsky/dp/0671607405
Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5
Send us a voice message which you want us to publish: https://podcasters.spotify.com/pod/show/machinelearningstreettalk/message In a recent open letter, over 1500 individuals called for a six-month pause on the development of advanced AI systems, expressing concerns over the potential risks AI poses to society and humanity. However, there are issues with this approach, including global competition, unstoppable progress, potential benefits, and the need to manage risks instead of avoiding them. Decision theorist Eliezer Yudkowsky took it a step further in a Time magazine article, calling for an indefinite and worldwide moratorium on Artificial General Intelligence (AGI) development, warning of potential catastrophe if AGI exceeds human intelligence. Yudkowsky urged for an immediate halt to all large AI training runs and the shutdown of major GPU clusters, calling for international cooperation to enforce these measures. However, several counterarguments question the validity of Yudkowsky's concerns:
1. Hard limits on AGI
2. Dismissing AI extinction risk
3. Collective action problem
4. Misplaced focus on AI threats
While the potential risks of AGI cannot be ignored, it is essential to consider various arguments and potential solutions before making drastic decisions. As AI continues to advance, it is crucial for researchers, policymakers, and society as a whole to engage in open and honest discussions about the potential consequences and the best path forward. With a balanced approach to AGI development, we may be able to harness its power for the betterment of humanity while mitigating its risks.
Eliezer Yudkowsky: https://en.wikipedia.org/wiki/Eliezer_Yudkowsky
Connor Leahy: https://twitter.com/NPCollapse (we will release that interview soon)
Gary Marcus: http://garymarcus.com/index.html
Tim Scarfe is the innovation CTO of XRAI Glass: https://xrai.glass/
Gary clip filmed at AIUK https://ai-uk.turing.ac.uk/programme/ and our appreciation to them for giving us a press pass. Check out their conference next year! WIRED clip from Gary came from here: https://www.youtube.com/watch?v=Puo3VkPkNZ4
Refs:
Statement from the listed authors of Stochastic Parrots on the "AI pause" letter (Timnit Gebru, Emily M. Bender, Angelina McMillan-Major, Margaret Mitchell)
https://www.dair-institute.org/blog/letter-statement-March2023 Eliezer Yudkowsky on Lex: https://www.youtube.com/watch?v=AaTRHFaaPG8 Pause Giant AI Experiments: An Open Letter https://futureoflife.org/open-letter/pause-giant-ai-experiments/ Pausing AI Developments Isn't Enough. We Need to Shut it All Down (Eliezer Yudkowsky) https://time.com/6266923/ai-eliezer-yudkowsky-open-letter-not-enough/
HUGE ANNOUNCEMENT, CHATGPT+WOLFRAM! You saw it HERE first! YT version: https://youtu.be/z5WZhCBRDpU Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5 Stephen's announcement post: https://writings.stephenwolfram.com/2023/03/chatgpt-gets-its-wolfram-superpowers/ OpenAI's announcement post: https://openai.com/blog/chatgpt-plugins In an era of technology and innovation, few individuals have left as indelible a mark on the fabric of modern science as our esteemed guest, Dr. Stephen Wolfram. Dr. Wolfram is a renowned polymath who has made significant contributions to the fields of physics, computer science, and mathematics. A prodigy, Wolfram earned a Ph.D. in theoretical physics from the California Institute of Technology by the age of 20. He became the youngest recipient of the prestigious MacArthur Fellowship at the age of 21. Wolfram's groundbreaking computational tool, Mathematica, was launched in 1988 and has become a cornerstone for researchers and innovators worldwide. In 2002, he published "A New Kind of Science," a paradigm-shifting work that explores the foundations of science through the lens of computational systems. In 2009, Wolfram created Wolfram Alpha, a computational knowledge engine utilized by millions of users worldwide. His current focus is on the Wolfram Language, a powerful programming language designed to democratize access to cutting-edge technology. Wolfram's numerous accolades include honorary doctorates and fellowships from prestigious institutions. As an influential thinker, Dr. Wolfram has dedicated his life to unraveling the mysteries of the universe and making computation accessible to all. First of all... we have an announcement to make, you heard it FIRST here on MLST! .... Intro [00:00:00] Big announcement! Wolfram + ChatGPT! [00:02:57] What does it mean to understand? [00:05:33] Feeding information back into the model [00:13:48] Semantics and cognitive categories [00:20:09] Navigating the ruliad [00:23:50] Computational irreducibility [00:31:39] Conceivability and interestingness [00:38:43] Human intelligible sciences [00:43:43]
YT version: https://youtu.be/P1j3VoKBxbc (references in pinned comment) Support us! https://www.patreon.com/mlst MLST Discord: https://discord.gg/aNPkGUQtc5 Dan McQuillan, a visionary in digital culture and social innovation, emphasizes the importance of understanding technology's complex relationship with society. As an academic at Goldsmiths, University of London, he fosters interdisciplinary collaboration and champions data-driven equity and ethical technology. Dan's career includes roles at Amnesty International and Social Innovation Camp, showcasing technology's potential to empower and bring about positive change. In this conversation, we discuss the challenges and opportunities at the intersection of technology and society, exploring the profound impact of our digital world. Interviewer: Dr. Tim Scarfe
[00:00:00] Dan's background and journey to academia
[00:03:30] Dan's background and journey to academia
[00:04:10] Writing the book "Resisting AI"
[00:08:30] Necropolitics and its relation to AI
[00:10:06] AI as a new form of colonization
[00:12:57] LLMs as a new form of neo-techno-imperialism
[00:15:47] Technology for good and AGI's skewed worldview
[00:17:49] Transhumanism, eugenics, and intelligence
[00:20:45] Valuing differences (disability) and challenging societal norms
[00:26:08] Re-ontologizing and the philosophy of information
[00:28:19] New materialism and the impact of technology on society
[00:30:32] Intelligence, meaning, and materiality
[00:31:43] The constraints of physical laws and the importance of science
[00:32:44] Exploring possibilities to reduce suffering and increase well-being
[00:33:29] The division between meaning and material in our experiences
[00:35:36] Machine learning, data science, and neoplatonic approach to understanding reality
[00:37:56] Different understandings of cognition, thought, and consciousness
[00:39:15] Enactivism and its variants in cognitive science
[00:40:58] Jordan Peterson
[00:44:47] Relationism, relativism, and finding the correct relational framework
[00:47:42] Recognizing privilege and its impact on social interactions
[00:49:10] Intersectionality / Feminist thinking and the concept of care in social structures
[00:51:46] Intersectionality and its role in understanding social inequalities
[00:54:26] The entanglement of history, technology, and politics
[00:57:39] ChatGPT article - we come to bury ChatGPT
[00:59:41] Statistical pattern learning and convincing patterns in AI
[01:01:27] Anthropomorphization and understanding in AI
[01:03:26] AI in education and critical thinking
[01:06:09] European Union policies and trustable AI
[01:07:52] AI reliability and the halo effect
[01:09:26] AI as a tool enmeshed in society
[01:13:49] Luddites
[01:15:16] AI is a scam
[01:15:31] AI and Social Relations
[01:16:49] Invisible Labor in AI and Machine Learning
[01:21:09] Exploitative AI / alignment
[01:23:50] Science fiction AI / moral frameworks
[01:27:22] Discussing Stochastic Parrots and Nihilism
[01:30:36] Human Intelligence vs. Language Models
[01:32:22] Image Recognition and Emulation vs. Experience
[01:34:32] Thought Experiments and Philosophy in AI Ethics (mimicry)
[01:41:23] Abstraction, reduction, and grounding in reality
[01:43:13] Process philosophy and the possibility of change
[01:49:55] Mental health, AI, and epistemic injustice
[01:50:30] Hermeneutic injustice and gendered techniques
[01:53:57] AI and politics
[01:59:24] Epistemic injustice and testimonial injustice
[02:11:46] Fascism and AI discussion
[02:13:24] Violence in various systems
[02:16:52] Recognizing systemic violence
[02:22:35] Fascism in Today's Society
[02:33:33] Pace and Scale of Technological Change
[02:37:38] Alternative approaches to AI and society
[02:44:09] Self-Organization at Successive Scales / cybernetics
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
We are honoured to welcome Dr. Joel Lehman, an eminent machine learning research scientist, whose work in AI safety, reinforcement learning, creative open-ended search algorithms, and indeed the philosophy of open-endedness and abandoning objectives has paved the way for innovative ideas that challenge our preconceptions and inspire new visions for the future.
Dr. Lehman's thought-provoking book, "Why Greatness Cannot Be Planned", penned with our MLST favourite Professor Kenneth Stanley, has left an indelible mark on the field and profoundly impacted the way we view innovation and the serendipitous nature of discovery. Those of you who haven't watched our special edition show on that should do so at your earliest convenience! Building upon this foundation, Dr. Lehman has ventured into the domain of AI systems that embody principles of love, care, responsibility, respect, and knowledge, drawing from the works of Maslow, Erich Fromm, and positive psychology.
YT version: https://youtu.be/23-TXgJEv-Q
http://joellehman.com/
https://twitter.com/joelbot3000
Interviewer: Dr. Tim Scarfe
TOC:
Intro [00:00:00]
Model [00:04:26]
Intro and Paper Intro [00:08:52]
Subjectivity [00:16:07]
Reflections on Greatness Book [00:19:30]
Representing Subjectivity [00:29:24]
Nagel's Bat [00:31:49]
Abstraction [00:38:58]
Love as Action Rather Than Feeling [00:42:58]
Reontologisation [00:57:38]
Self Help [01:04:15]
Meditation [01:09:02]
The Human Reward Function / Effective... [01:16:52]
Machine Hate [01:28:32]
Societal Harms [01:31:41]
Lenses We Use Obscuring Reality [01:56:36]
Meta Optimisation and Evolution [02:03:14]
Conclusion [02:07:06]
References:
What Is It Like to Be a Bat? (Thomas Nagel)
https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/humananimalstudies/lectures/32/nagel_bat.pdf
Why Greatness Cannot Be Planned: The Myth of the Objective (Kenneth O. Stanley and Joel Lehman)
https://link.springer.com/book/10.1007/978-3-319-15524-1
Machine Love (Joel Lehman)
https://arxiv.org/abs/2302.09248
How effective altruists ignored risk (Carla Cremer)
https://www.vox.com/future-perfect/23569519/effective-altrusim-sam-bankman-fried-will-macaskill-ea-risk-decentralization-philanthropy
Philosophy tube - The Rich Have Their Own Ethics: Effective Altruism
https://www.youtube.com/watch?v=Lm0vHQYKI-Y
Abandoning Objectives: Evolution through the Search for Novelty Alone (Joel Lehman and Kenneth O. Stanley)
https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehman_ecj11.pdf
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Dr. Raphaël Millière is the 2020 Robert A. Burt Presidential Scholar in Society and Neuroscience in the Center for Science and Society, and a Lecturer in the Philosophy Department at Columbia University. His research draws from his expertise in philosophy and cognitive science to explore the implications of recent progress in deep learning for models of human cognition, as well as various issues in ethics and aesthetics. He is also investigating what underlies the capacity to represent oneself as oneself at a fundamental level, in humans and non-human animals; as well as the role that self-representation plays in perception, action, and memory. In a world where technology is rapidly advancing, Dr. Millière is striving to gain a better understanding of how artificial neural networks work, and to establish fair and meaningful comparisons between humans and machines in various domains in order to shed light on the implications of artificial intelligence for our lives.
https://www.raphaelmilliere.com/
https://twitter.com/raphaelmilliere
Here is a version with hesitation sounds like "um" removed if you prefer (I didn't notice them personally): https://share.descript.com/view/aGelyTl2xpN
YT: https://www.youtube.com/watch?v=fhn6ZtD6XeE
TOC:
Intro to Raphael [00:00:00]
Intro: Moving Beyond Mimicry in Artificial Intelligence (Raphael Millière) [00:01:18]
Show Kick off [00:07:10]
LLMs [00:08:37]
Semantic Competence/Understanding [00:18:28]
Forming Analogies/JPG Compression Article [00:30:17]
Compositional Generalisation [00:37:28]
Systematicity [00:47:08]
Language of Thought [00:51:28]
Bigbench (Conceptual Combinations) [00:57:37]
Symbol Grounding [01:11:13]
World Models [01:26:43]
Theory of Mind [01:30:57]
Refs (this is truncated, full list on YT video description):
Moving Beyond Mimicry in Artificial Intelligence (Raphael Millière)
https://nautil.us/moving-beyond-mimicry-in-artificial-intelligence-238504/
On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜 (Bender et al)
https://dl.acm.org/doi/10.1145/3442188.3445922
ChatGPT Is a Blurry JPEG of the Web (Ted Chiang)
https://www.newyorker.com/tech/annals-of-technology/chatgpt-is-a-blurry-jpeg-of-the-web
The Debate Over Understanding in AI's Large Language Models (Melanie Mitchell)
https://arxiv.org/abs/2210.13966
Talking About Large Language Models (Murray Shanahan)
https://arxiv.org/abs/2212.03551
Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data (Bender)
https://aclanthology.org/2020.acl-main.463/
The symbol grounding problem (Stevan Harnad)
https://arxiv.org/html/cs/9906002
Why the Abstraction and Reasoning Corpus is interesting and important for AI (Mitchell)
https://aiguide.substack.com/p/why-the-abstraction-and-reasoning
Linguistic relativity (Sapir–Whorf hypothesis)
https://en.wikipedia.org/wiki/Linguistic_relativity
Cooperative principle (Grice's four maxims of conversation - quantity, quality, relation, and manner)
https://en.wikipedia.org/wiki/Cooperative_principle
This show is sponsored by Numerai, please visit them here with our sponsor link (we would really appreciate it) http://numer.ai/mlst
Prof. Karl Friston recently proposed a vision of artificial intelligence that goes beyond machines and algorithms, and embraces humans and nature as part of a cyber-physical ecosystem of intelligence. This vision is based on the principle of active inference, which states that intelligent systems can learn from their observations and act on their environment to reduce uncertainty and achieve their goals. This leads to a formal account of collective intelligence that rests on shared narratives and goals.
To realize this vision, Friston suggests developing a shared hyper-spatial modelling language and transaction protocol, as well as novel methods for measuring and optimizing collective intelligence. This could harness the power of artificial intelligence for the common good, without compromising human dignity or autonomy. It also challenges us to rethink our relationship with technology, nature, and each other, and invites us to join a global community of sense-makers who are curious about the world and eager to improve it.
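As a rough formal anchor for readers (our gloss, not a quote from Friston's paper), active inference is usually written as the minimisation of a variational free energy over beliefs about hidden states:

F(q) = \mathbb{E}_{q(s)}\big[\ln q(s) - \ln p(o, s)\big] = D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] - \ln p(o)

Here o are observations, s hidden states, p the agent's generative model and q its approximate posterior; both perception and action serve to drive F down, which is one way to read "reducing uncertainty to achieve goals".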
YT version: https://www.youtube.com/watch?v=V_VXOdf1NMw
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
TOC:
Intro [00:00:00]
Numerai (Sponsor segment) [00:07:10]
Designing Ecosystems of Intelligence from First Principles (Friston et al) [00:09:48]
Information / Infosphere and human agency [00:18:30]
Intelligence [00:31:38]
Reductionism [00:39:36]
Universalism [00:44:46]
Emergence [00:54:23]
Markov blankets [01:02:11]
Whole part relationships / structure learning [01:22:33]
Enactivism [01:29:23]
Knowledge and Language [01:43:53]
ChatGPT [01:50:56]
Ethics (is-ought) [02:07:55]
Can people be evil? [02:35:06]
Ethics in AI, subjectiveness [02:39:05]
Final thoughts [02:57:00]
References:
Designing Ecosystems of Intelligence from First Principles (Friston et al)
https://arxiv.org/abs/2212.01354
GLOM - How to represent part-whole hierarchies in a neural network (Hinton)
https://arxiv.org/pdf/2102.12627.pdf
Seven Brief Lessons on Physics (Carlo Rovelli)
https://www.amazon.co.uk/Seven-Brief-Lessons-Physics-Rovelli/dp/0141981725
How Emotions Are Made: The Secret Life of the Brain (Lisa Feldman Barrett)
https://www.amazon.co.uk/How-Emotions-Are-Made-Secret/dp/B01N3D4OON
Am I Self-Conscious? (Or Does Self-Organization Entail Self-Consciousness?) (Karl Friston)
https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00579/full
Integrated information theory (Giulio Tononi)
https://en.wikipedia.org/wiki/Integrated_information_theory
Access Numerai here: http://numer.ai/mlst
Michael Oliver is the Chief Scientist at Numerai, a hedge fund that crowdsources machine learning models from data scientists. He has a PhD in Computational Neuroscience from UC Berkeley and was a postdoctoral researcher at the Allen Institute for Brain Science before joining Numerai in 2020. He is also the host of Numerai Quant Club, a YouTube series where he discusses Numerai’s research, data and challenges.
YT version: https://youtu.be/61s8lLU7sFg
TOC:
[00:00:00] Introduction to Michael and Numerai
[00:02:03] Understanding / new Bing
[00:22:47] Quant vs Neuroscience
[00:36:43] Role of language in cognition and planning, and subjective...
[00:45:47] Boundaries in finance modelling
[00:48:00] Numerai
[00:57:37] Aggregation systems
[01:00:52] Getting started on Numerai
[01:03:21] What models are people using
[01:04:23] Numerai Problem Setup
[01:05:49] Regimes in financial data and quant talk
[01:11:18] Esoteric approaches used on Numerai?
[01:13:59] Curse of dimensionality
[01:16:32] Metrics
[01:19:10] Outro
References:
Growing Neural Cellular Automata (Alexander Mordvintsev)
https://distill.pub/2020/growing-ca/
A Thousand Brains: A New Theory of Intelligence (Jeff Hawkins)
https://www.amazon.fr/Thousand-Brains-New-Theory-Intelligence/dp/1541675819
Perceptual Neuroscience: The Cerebral Cortex (Vernon B. Mountcastle)
https://www.amazon.ca/Perceptual-Neuroscience-Cerebral-Vernon-Mountcastle/dp/0674661885
Numerai Quant Club with Michael Oliver
https://www.youtube.com/watch?v=eLIxarbDXuQ&list=PLz3D6SeXhT3tTu8rhZmjwDZpkKi-UPO1F
Numerai YT channel
https://www.youtube.com/@Numerai/featured
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Christopher Summerfield is a Professor of Cognitive Neuroscience in the Department of Experimental Psychology at the University of Oxford and a Research Scientist at DeepMind UK. His work focusses on the neural and computational mechanisms by which humans make decisions.
Chris has just released an incredible new book on AI called "Natural General Intelligence". It's my favourite book on AI I have read so far.
The book explores the algorithms and architectures that are driving progress in AI research, and discusses intelligence in the language of psychology and biology, using examples and analogies to be comprehensible to a wide audience. It also tackles longstanding theoretical questions about the nature of thought and knowledge.
With Chris' permission, I read out a summarised version of Chapter 2 of his book, which is on Intelligence, during the 30 minute MLST introduction.
Buy his book here:
https://global.oup.com/academic/product/natural-general-intelligence-9780192843883?cc=gb&lang=en&
YT version: https://youtu.be/31VRbxAl3t0
Interviewer: Dr. Tim Scarfe
TOC:
[00:00:00] Walk and talk with Chris on Knowledge and Abstractions
[00:04:08] Intro to Chris and his book
[00:05:55] (Intro) Tim reads Chapter 2: Intelligence
[00:09:28] Intro continued: Goodhart's law
[00:15:37] Intro continued: The "swiss cheese" situation
[00:20:23] Intro continued: On Human Knowledge
[00:23:37] Intro continued: Neats and Scruffies
[00:30:22] Interview kick off
[00:31:59] What does it mean to understand?
[00:36:18] Aligning our language models
[00:40:17] Creativity
[00:41:40] "Meta" AI and basins of attraction
[00:51:23] What can Neuroscience impart to AI
[00:54:43] Sutton, neats and scruffies and human alignment
[01:02:05] Reward is enough
[01:19:46] John von Neumann and Intelligence
[01:23:56] Compositionality
References:
The Language Game (Morten H. Christiansen, Nick Chater)
https://www.penguin.co.uk/books/441689/the-language-game-by-morten-h-christiansen-and--nick-chater/9781787633483
Theory of general factor (Spearman)
https://www.proquest.com/openview/7c2c7dd23910c89e1fc401e8bb37c3d0/1?pq-origsite=gscholar&cbl=1818401
Intelligence Reframed (Howard Gardner)
https://books.google.co.uk/books?hl=en&lr=&id=Qkw4DgAAQBAJ&oi=fnd&pg=PT6&dq=howard+gardner+multiple+intelligences&ots=ERUU0u5Usq&sig=XqiDgNUIkb3K9XBq0vNbFmXWKFs#v=onepage&q=howard%20gardner%20multiple%20intelligences&f=false
The master algorithm (Pedro Domingos)
https://www.amazon.co.uk/Master-Algorithm-Ultimate-Learning-Machine/dp/0241004543
A Thousand Brains: A New Theory of Intelligence (Jeff Hawkins)
https://www.amazon.co.uk/Thousand-Brains-New-Theory-Intelligence/dp/1541675819
The bitter lesson (Rich Sutton)
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
YT: https://youtu.be/i9VPPmQn9HQ
Edward Grefenstette is a Franco-American computer scientist who currently serves as Head of Machine Learning at Cohere and Honorary Professor at UCL. He has previously been a research scientist at Facebook AI Research and staff research scientist at DeepMind, and was also the CTO of Dark Blue Labs. Prior to his move to industry, Edward was a Fulford Junior Research Fellow at Somerville College, University of Oxford, and was lecturing at Hertford College. He obtained his BSc in Physics and Philosophy from the University of Sheffield and did graduate work in the philosophy department at the University of St Andrews. His research draws on topics and methods from Machine Learning, Computational Linguistics and Quantum Information Theory; he has done work implementing and evaluating compositional vector-based models of natural language semantics and empirical semantic knowledge discovery.
https://www.egrefen.com/
https://cohere.ai/
TOC:
[00:00:00] Introduction
[00:02:52] Differential Semantics
[00:06:56] Concepts
[00:10:20] Ontology
[00:14:02] Pragmatics
[00:16:55] Code helps with language
[00:19:02] Montague
[00:22:13] RLHF
[00:31:54] Swiss cheese problem / retrieval augmented
[00:37:06] Intelligence / Agency
[00:43:33] Creativity
[00:46:41] Common sense
[00:53:46] Thinking vs knowing
References:
Large language models are not zero-shot communicators (Laura Ruis)
https://arxiv.org/abs/2210.14986
Some remarks on Large Language Models (Yoav Goldberg)
https://gist.github.com/yoavg/59d174608e92e845c8994ac2e234c8a9
Quantum Natural Language Processing (Bob Coecke)
https://www.cs.ox.ac.uk/people/bob.coecke/QNLP-ACT.pdf
Constitutional AI: Harmlessness from AI Feedback
https://www.anthropic.com/constitutional.pdf
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Patrick Lewis)
https://www.patricklewis.io/publication/rag/
Natural General Intelligence (Prof. Christopher Summerfield)
https://global.oup.com/academic/product/natural-general-intelligence-9780192843883
ChatGPT with Rob Miles - Computerphile
https://www.youtube.com/watch?v=viJt_DXTfwA
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
YT: https://youtu.be/Vbi288CKgis
Michael Levin is a Distinguished Professor in the Biology department at Tufts University, and the holder of the Vannevar Bush endowed Chair. He is the Director of the Allen Discovery Center at Tufts and the Tufts Center for Regenerative and Developmental Biology. His research focuses on understanding the biophysical mechanisms of pattern regulation and harnessing endogenous bioelectric dynamics for rational control of growth and form.
The capacity to generate a complex, behaving organism from the single cell of a fertilized egg is one of the most amazing aspects of biology. Levin's lab integrates approaches from developmental biology, computer science, and cognitive science to investigate the emergence of form and function. Using biophysical and computational modeling approaches, they seek to understand the collective intelligence of cells, as they navigate physiological, transcriptional, morphogenetic, and behavioral spaces. They develop conceptual frameworks for basal cognition and diverse intelligence, including synthetic organisms and AI.
Also joining us this evening is Irina Rish. Irina is a Full Professor at the Université de Montréal's Computer Science and Operations Research department, a core member of Mila - Quebec AI Institute, as well as the holder of the Canada CIFAR AI Chair and the Canadian Excellence Research Chair in Autonomous AI. She has a PhD in AI from UC Irvine. Her research focuses on machine learning, neural data analysis, neuroscience-inspired AI, continual lifelong learning, optimization algorithms, sparse modelling, probabilistic inference, dialog generation, biologically plausible reinforcement learning, and dynamical systems approaches to brain imaging analysis.
Interviewer: Dr. Tim Scarfe
TOC:
[00:00:00] Introduction
[00:02:09] Emergence
[00:13:16] Scaling Laws
[00:23:12] Intelligence
[00:44:36] Transhumanism
Prof. Michael Levin
https://en.wikipedia.org/wiki/Michael_Levin_(biologist)
https://www.drmichaellevin.org/
https://twitter.com/drmichaellevin
Prof. Irina Rish
https://twitter.com/irinarish
https://irina-rish.com/
Dr. Patrick Lewis is a London-based AI and Natural Language Processing Research Scientist, working at co:here. Prior to this, Patrick worked as a research scientist at the Fundamental AI Research Lab (FAIR) at Meta AI. During his PhD, Patrick split his time between FAIR and University College London, working with Sebastian Riedel and Pontus Stenetorp.
Patrick’s research focuses on the intersection of information retrieval techniques (IR) and large language models (LLMs). He has done extensive work on Retrieval-Augmented Language Models. His current focus is on building more powerful, efficient, robust, and update-able models that can perform well on a wide range of NLP tasks, but also excel on knowledge-intensive NLP tasks such as Question Answering and Fact Checking.
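For readers new to retrieval-augmented generation, here is a minimal sketch of the general pattern Patrick works on (our illustration only; the retriever and generator below are placeholder callables, not his actual systems):

from typing import Callable, List

def rag_answer(question: str,
               retrieve: Callable[[str, int], List[str]],
               generate: Callable[[str], str],
               k: int = 5) -> str:
    # 1) Retrieve the k passages most relevant to the question from some index.
    passages = retrieve(question, k)
    # 2) Pack the retrieved evidence into the prompt so the answer can be grounded in it.
    context = "\n\n".join(passages)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    # 3) Let the language model generate an answer conditioned on that evidence.
    return generate(prompt)

The appeal for knowledge-intensive tasks is that the index can be updated without retraining the language model.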
YT version: https://youtu.be/Dm5sfALoL1Y
MLST Discord: https://discord.gg/aNPkGUQtc5
Support us! https://www.patreon.com/mlst
References:
Patrick Lewis (Natural Language Processing Research Scientist @ co:here)
https://www.patricklewis.io/
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Patrick Lewis et al)
https://arxiv.org/abs/2005.11401
Atlas: Few-shot Learning with Retrieval Augmented Language Models (Gautier Izacard, Patrick Lewis, et al)
https://arxiv.org/abs/2208.03299
Improving language models by retrieving from trillions of tokens (RETRO) (Sebastian Borgeaud et al)
https://arxiv.org/abs/2112.04426
YT version (with references): https://www.youtube.com/watch?v=lxaTinmKxs0
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
Carla Cremer and Igor Krawczuk argue that AI risk should be understood as an old problem of politics, power and control with known solutions, and that threat models should be driven by empirical work. The interaction between FTX and the Effective Altruism community has sparked a lot of discussion about the dangers of optimization, and Carla's Vox article highlights the need for an institutional turn when taking on a responsibility like risk management for humanity.
Carla's “Democratizing Risk” paper found that certain types of risks fall through the cracks if they are just categorized into climate change or biological risks. Deliberative democracy has been found to be a better way to make decisions, and AI tools can be used to scale this type of democracy and be used for good, but the transparency of these algorithms to the citizens using the platform must be taken into consideration.
Aggregating people's diverse ways of thinking about a problem and building a risk-averse procedure around that diversity makes it much more likely that the process converges on the best policy. There needs to be a good reason to trust any one organization with the risk management of humanity, and all the different ways of thinking about risk must be taken into account. AI tools can help to scale this type of deliberative democracy, but the transparency of these algorithms to the citizens using them must be taken into consideration.
The ambition of the EA community and Altruism Inc. is to protect and do risk management for the whole of humanity, and this requires an institutional turn in order to do it effectively. The dangers of optimization are real, and it is essential that the risk management of humanity is done properly and ethically.
Carla Zoe Cremer
https://carlacremer.github.io/
Igor Krawczuk
https://krawczuk.eu/
Interviewer: Dr. Tim Scarfe
TOC:
[00:00:00] Introduction: Vox article and effective altruism / FTX
[00:11:12] Luciano Floridi on Governance and Risk
[00:15:50] Connor Leahy on alignment
[00:21:08] Ethan Caballero on scaling
[00:23:23] Alignment, Values and politics
[00:30:50] Singularitarians vs AI-theists
[00:41:56] Consequentialism
[00:46:44] Does scale make a difference?
[00:51:53] Carla's Democratising risk paper
[01:04:03] Vox article - How effective altruists ignored risk
[01:20:18] Does diversity breed complexity?
[01:29:50] Collective rationality
[01:35:16] Closing statements
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
YT version: https://youtu.be/YLNGvvgq3eg
We are living in an age of rapid technological advancement, and with this growth comes a digital divide. Professor Luciano Floridi of the Oxford Internet Institute / Oxford University believes that this divide not only affects our understanding of the implications of this new age, but also the organization of a fair society.
The Information Revolution has been transforming the global economy, with the majority of global GDP now relying on intangible goods, such as information-related services. This in turn has led to the generation of immense amounts of data, more than humanity has ever seen in its history. With 95% of this data being generated by the current generation, Professor Floridi believes that we are becoming overwhelmed by this data, and that our agency as humans is being eroded as a result.
According to Professor Floridi, the digital divide has caused a lack of balance between technological growth and our understanding of this growth. He believes that the infosphere is becoming polluted and the manifold of the infosphere is increasingly determined by technology and AI. Identifying, anticipating and resolving these problems has become essential, and Professor Floridi has dedicated his research to the Philosophy of Information, Philosophy of Technology and Digital Ethics.
We must equip ourselves with a viable philosophy of information to help us better understand and address the risks of this new information age. Professor Floridi is leading the charge, and his research on Digital Ethics, the Philosophy of Information and the Philosophy of Technology is helping us to better anticipate, identify and resolve problems caused by the digital divide.
TOC:
[00:00:00] Introduction to Luciano and his ideas
[00:14:00] Chat GPT / language models
[00:28:45] AI risk / "Singularitarians"
[00:37:15] Forms of governance
[00:43:56] Re-ontologising the world
[00:55:56] It from bit and Computationalism and philosophy without purpose
[01:03:05] Getting into Digital Ethics
Interviewer: Dr. Tim Scarfe
References:
GPT‐3: Its Nature, Scope, Limits, and Consequences [Floridi]
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3827044
Ultraintelligent Machines, Singularity, and Other Sci-fi Distractions about AI [Floridi]
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4222347
The Philosophy of Information [Floridi]
https://www.amazon.co.uk/Philosophy-Information-Luciano-Floridi/dp/0199232393
Information: A Very Short Introduction [Floridi]
https://www.amazon.co.uk/Information-Very-Short-Introduction-Introductions/dp/0199551375
https://en.wikipedia.org/wiki/Luciano_Floridi
https://www.philosophyofinformation.net/
Support us! https://www.patreon.com/mlst
MLST Discord: https://discord.gg/aNPkGUQtc5
YT version: https://youtu.be/YLNGvvgq3eg
(If the music is annoying, skip to the main interview @ 14:14)
We are living in an age of rapid technological advancement, and with this growth comes a digital divide. Professor Luciano Floridi of the Oxford Internet Institute / Oxford University believes that this divide not only affects our understanding of the implications of this new age, but also the organization of a fair society.
The Information Revolution has been transforming the global economy, with the majority of global GDP now relying on intangible goods, such as information-related services. This in turn has led to the generation of immense amounts of data, more than humanity has ever seen in its history. With 95% of this data being generated by the current generation, Professor Floridi believes that we are becoming overwhelmed by this data, and that our agency as humans is being eroded as a result.
According to Professor Floridi, the digital divide has caused a lack of balance between technological growth and our understanding of this growth. He believes that the infosphere is becoming polluted and the manifold of the infosphere is increasingly determined by technology and AI. Identifying, anticipating and resolving these problems has become essential, and Professor Floridi has dedicated his research to the Philosophy of Information, Philosophy of Technology and Digital Ethics.
We must equip ourselves with a viable philosophy of information to help us better understand and address the risks of this new information age. Professor Floridi is leading the charge, and his research on Digital Ethics, the Philosophy of Information and the Philosophy of Technology is helping us to better anticipate, identify and resolve problems caused by the digital divide.
TOC:
[00:00:00] Introduction to Luciano and his ideas
[00:14:40] Chat GPT / language models
[00:29:24] AI risk / "Singularitarians"
[00:30:34] Re-ontologising the world
[00:56:35] It from bit and Computationalism and philosophy without purpose
[01:03:43] Getting into Digital Ethics
References:
GPT‐3: Its Nature, Scope, Limits, and Consequences [Floridi]
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3827044
Ultraintelligent Machines, Singularity, and Other Sci-fi Distractions about AI [Floridi]
https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4222347
The Philosophy of Information [Floridi]
https://www.amazon.co.uk/Philosophy-Information-Luciano-Floridi/dp/0199232393
Information: A Very Short Introduction [Floridi]
https://www.amazon.co.uk/Information-Very-Short-Introduction-Introductions/dp/0199551375
https://en.wikipedia.org/wiki/Luciano_Floridi
https://www.philosophyofinformation.net/
Research has shown that humans possess strong inductive biases which enable them to quickly learn and generalize. To instill these same useful human inductive biases into machines, Sreejan Kumar and collaborators presented a paper at the NeurIPS conference, which won the Outstanding Paper of the Year award. The paper is called Using Natural Language and Program Abstractions to Instill Human Inductive Biases in Machines.
This paper focuses on using a controlled stimulus space of two-dimensional binary grids to define the space of abstract concepts that humans have and a feedback loop of collaboration between humans and machines to understand the differences in human and machine inductive biases.
It is important to make machines more human-like so we can collaborate with them and understand their behavior. Synthesised discrete programs running on a Turing-machine computational model, instead of a neural network substrate, offer promise for the future of artificial intelligence. Neural networks and program induction should both be explored to get a well-rounded view of intelligence which works in multiple domains and computational substrates, and which can acquire a diverse set of capabilities.
Natural language understanding in models can also be improved by instilling human language biases and programs into AI models. Sreejan used an experimental framework consisting of two dual task distributions, one generated from human priors and one from machine priors, to understand the differences in human and machine inductive biases. Furthermore, he demonstrated that compressive abstractions can be used to capture the essential structure of the environment for more human-like behavior. This means that emergent language-based inductive priors can be distilled into artificial neural networks, and AI models can be aligned to us, the world, and indeed our values.
Humans possess strong inductive biases which enable them to quickly learn to perform various tasks. This is in contrast to neural networks, which lack the same inductive biases and struggle to learn them empirically from observational data, thus, they have difficulty generalizing to novel environments due to their lack of prior knowledge.
Sreejan's results showed that when guided with representations from language and programs, the meta-learning agent not only improved performance on task distributions humans are adept at, but also decreased performance on control task distributions where humans perform poorly. This indicates that the abstraction supported by these representations, in the substrate of language or indeed a program, is key in the development of aligned artificial agents with human-like generalization capabilities, aligned values and behaviour.
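To make the "dual task distributions" idea concrete, here is a toy sketch (entirely our own illustration, not the paper's actual stimulus generator) of two-dimensional binary grid tasks drawn from a structured, human-friendly prior versus an unstructured control prior:

import numpy as np

def human_prior_grid(n: int = 7, rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    # Hypothetical "human prior": grids with a simple abstract regularity
    # (left-right symmetry), standing in for the compressible concepts humans find easy.
    half = rng.integers(0, 2, size=(n, (n + 1) // 2))
    return np.concatenate([half, half[:, : n // 2][:, ::-1]], axis=1)

def machine_prior_grid(n: int = 7, rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    # Control "machine prior": matched size and pixel statistics, but no such regularity.
    return rng.integers(0, 2, size=(n, n))

Comparing how easily a meta-learner picks up tasks from each distribution is one way to make the gap between human and machine inductive biases measurable.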
References
Using natural language and program abstractions to instill human inductive biases in machines [Kumar et al/NEURIPS]
https://openreview.net/pdf?id=buXZ7nIqiwE
Core Knowledge [Elizabeth S. Spelke / Harvard]
https://www.harvardlds.org/wp-content/uploads/2017/01/SpelkeKinzler07-1.pdf
The Debate Over Understanding in AI's Large Language Models [Melanie Mitchell]
https://arxiv.org/abs/2210.13966
On the Measure of Intelligence [Francois Chollet]
https://arxiv.org/abs/1911.01547
ARC challenge [Chollet]
https://github.com/fchollet/ARC
Pedro Domingos, Professor Emeritus of Computer Science and Engineering at the University of Washington, is renowned for his research in machine learning, particularly for his work on Markov logic networks that allow for uncertain inference. He is also the author of the acclaimed book "The Master Algorithm".
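For context on Markov logic (a standard formulation from that literature, not a quote from the episode): a Markov logic network attaches a weight w_i to each first-order formula and defines a distribution over possible worlds x as

P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big)

where n_i(x) counts the true groundings of formula i in world x and Z is a normalising constant, so violating a weighted rule makes a world less probable rather than impossible.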
Panel: Dr. Tim Scarfe
TOC:
[00:00:00] Introduction
[00:01:34] Galactica / misinformation / gatekeeping
[00:12:31] Is there a master algorithm?
[00:16:29] Limits of our understanding
[00:21:57] Intentionality, Agency, Creativity
[00:27:56] Compositionality
[00:29:30] Digital Physics / It from bit / Wolfram
[00:35:17] Alignment / Utility functions
[00:43:36] Meritocracy
[00:45:53] Game theory
[01:00:00] EA/consequentialism/Utility
[01:11:09] Emergence / relationalism
[01:19:26] Markov logic
[01:25:38] Moving away from anthropocentrism
[01:28:57] Neurosymbolic / infinity / tensor algebra
[01:53:45] Abstraction
[01:57:26] Symmetries / Geometric DL
[02:02:46] Bias variance trade off
[02:05:49] What we saw at NeurIPS
[02:12:58] Chalmers talk on LLMs
[02:28:32] Definition of intelligence
[02:32:40] LLMs
[02:35:14] On experts in different fields
[02:40:15] Back to intelligence
[02:41:37] Spline theory / extrapolation
YT version: https://www.youtube.com/watch?v=C9BH3F2c0vQ
References;
The Master Algorithm [Domingos]
https://www.amazon.co.uk/s?k=master+algorithm&i=stripbooks&crid=3CJ67DCY96DE8&sprefix=master+algorith%2Cstripbooks%2C82&ref=nb_sb_noss_2
INFORMATION, PHYSICS, QUANTUM: THE SEARCH FOR LINKS [John Wheeler/It from Bit]
https://philpapers.org/archive/WHEIPQ.pdf
A New Kind Of Science [Wolfram]
https://www.amazon.co.uk/New-Kind-Science-Stephen-Wolfram/dp/1579550088
The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future [Tom Chivers]
https://www.amazon.co.uk/Does-Not-Hate-You-Superintelligence/dp/1474608795
The Status Game: On Social Position and How We Use It [Will Storr]
https://www.goodreads.com/book/show/60598238-the-status-game
Newcomb's paradox
https://en.wikipedia.org/wiki/Newcomb%27s_paradox
The Case for Strong Emergence [Sabine Hossenfelder]
https://philpapers.org/rec/HOSTCF-3
Markov Logic: An Interface Layer for Artificial Intelligence [Domingos]
https://www.morganclaypool.com/doi/abs/10.2200/S00206ED1V01Y200907AIM007
Note; Pedro discussed “Tensor Logic” - I was not able to find a reference
Neural Networks and the Chomsky Hierarchy [Grégoire Delétang/DeepMind]
https://arxiv.org/abs/2207.02098
Connectionism and Cognitive Architecture: A Critical Analysis [Jerry A. Fodor and Zenon W. Pylyshyn]
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine [Pedro Domingos]
https://arxiv.org/abs/2012.00152
A Path Towards Autonomous Machine Intelligence Version 0.9.2, 2022-06-27 [LeCun]
https://openreview.net/pdf?id=BZ5a1r-kVsf
Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges [Michael M. Bronstein, Joan Bruna, Taco Cohen, Petar Veličković]
https://arxiv.org/abs/2104.13478
The Algebraic Mind: Integrating Connectionism and Cognitive Science [Gary Marcus]
https://www.amazon.co.uk/Algebraic-Mind-Integrating-Connectionism-D
Irina Rish is a Full Professor at the Université de Montréal's Computer Science and Operations Research department, a core member of Mila - Quebec AI Institute, and the holder of the Canada CIFAR AI Chair and the Canadian Excellence Research Chair in Autonomous AI. Irina holds an MSc and PhD in AI from the University of California, Irvine as well as an MSc in Applied Mathematics from the Moscow Gubkin Institute. Her research focuses on machine learning, neural data analysis, and neuroscience-inspired AI. In particular, she is exploring continual lifelong learning, optimization algorithms for deep neural networks, sparse modelling and probabilistic inference, dialog generation, biologically plausible reinforcement learning, and dynamical systems approaches to brain imaging analysis. Prof. Rish holds 64 patents, has published over 80 research papers, several book chapters, three edited books, and a monograph on Sparse Modelling. She has served as a Senior Area Chair for NeurIPS and ICML. Irina's research is focussed on taking us closer to the holy grail of Artificial General Intelligence. She continues to push the boundaries of machine learning, continually striving to make advancements in neuroscience-inspired AI.
In a conversation about artificial intelligence (AI), Irina and Tim discussed the idea of transhumanism and the potential for AI to improve human flourishing. Irina suggested that instead of looking at AI as something to be controlled and regulated, people should view it as a tool to augment human capabilities. She argued that attempting to create an AI that is smarter than humans is not the best approach, and that a hybrid of human and AI intelligence is much more beneficial. As an example, she mentioned how technology can be used as an extension of the human mind, to track mental states and improve self-understanding. Ultimately, Irina concluded that transhumanism is about having a symbiotic relationship with technology, which can have a positive effect on both parties.
Tim then discussed the contrasting types of intelligence and how this could lead to something interesting emerging from the combination. He brought up the Trolley Problem and how difficult moral quandaries could be programmed into an AI. Irina then referenced The Garden of Forking Paths, a story which explores the idea of how different paths in life can be taken and how decisions from the past can have an effect on the present.
To better understand AI and intelligence, Irina suggested looking at it from multiple perspectives and understanding the importance of complex systems science in programming and understanding dynamical systems. She discussed the work of Michael Levin, who is looking into reprogramming biological computers with chemical interventions, and Tim mentioned Alexander Mordvintsev, who is looking into the self-healing and repair of these systems. Ultimately, Irina argued that the key to understanding AI and intelligence is to recognize the complexity of the systems and to create hybrid models of human and AI intelligence.
Find Irina;
https://mila.quebec/en/person/irina-rish/
https://twitter.com/irinarish
YT version: https://youtu.be/8-ilcF0R7mI
MLST Discord: https://discord.gg/aNPkGUQtc5
References;
The Garden of Forking Paths: Jorge Luis Borges [Jorge Luis Borges]
https://www.amazon.co.uk/Garden-Forking-Paths-Penguin-Modern/dp/0241339057
The Brain from Inside Out [György Buzsáki]
https://www.amazon.co.uk/Brain-Inside-Out-Gy%C3%B6rgy-Buzs%C3%A1ki/dp/0190905387
Growing Isotropic Neural Cellular Automata [Alexander Mordvintsev]
https://arxiv.org/abs/2205.01681
The Extended Mind [Andy Clark and David Chalmers]
https://www.jstor.org/stable/3328150
The Gentle Seduction [Marc Stiegler]
https://www.amazon.co.uk/Gentle-Seduction-Marc-Stiegler/dp/0671698877
Support us! https://www.patreon.com/mlst
Alan Chan is a PhD student at Mila, the Montreal Institute for Learning Algorithms, supervised by Nicolas Le Roux. Before joining Mila, Alan was a Masters student at the Alberta Machine Intelligence Institute and the University of Alberta, where he worked with Martha White. Alan's expertise and research interests encompass value alignment and AI governance. He is currently exploring the measurement of harms from language models and the incentives that agents have to impact the world. Alan's research focuses on understanding and controlling the values expressed by machine learning models. His projects have examined the regulation of explainability in algorithmic systems, scoring rules for performative binary prediction, the effects of global exclusion in AI development, and the role of a graduate student in approaching ethical impacts in AI research. In addition, Alan has conducted research into inverse policy evaluation for value-based sequential decision-making, and the concept of "normal accidents" and AI systems. Alan's research is motivated by the need to align AI systems with human values and by his passion for scientific and governance work in this field. Alan's energy and enthusiasm for his field are infectious.
This was a discussion at NeurIPS. It was in quite a loud environment so the audio quality could have been better.
References:
The Rationalist's Guide to the Galaxy: Superintelligent AI and the Geeks Who Are Trying to Save Humanity's Future [Tom Chivers]
https://www.amazon.co.uk/Does-Not-Hate-You-Superintelligence/dp/1474608795
The implausibility of intelligence explosion [Chollet]
https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec
Superintelligence: Paths, Dangers, Strategies [Bostrom]
https://www.amazon.co.uk/Superintelligence-Dangers-Strategies-Nick-Bostrom/dp/0199678111
A Theory of Universal Artificial Intelligence based on Algorithmic Complexity [Hutter]
https://arxiv.org/abs/cs/0004001
YT version: https://youtu.be/XBMnOsv9_pk
MLST Discord: https://discord.gg/aNPkGUQtc5
Support us! https://www.patreon.com/mlst
Professor Murray Shanahan is a renowned researcher on sophisticated cognition and its implications for artificial intelligence. His 2016 article ‘Conscious Exotica’ explores the Space of Possible Minds, a concept first proposed by philosopher Aaron Sloman in 1984, which includes all the different forms of minds from those of other animals to those of artificial intelligence. Shanahan rejects the idea of an impenetrable realm of subjective experience and argues that the majority of the space of possible minds may be occupied by non-natural variants, such as the ‘conscious exotica’ of which he speaks. In his paper ‘Talking About Large Language Models’, Shanahan discusses the capabilities and limitations of large language models (LLMs). He argues that prompt engineering is a key element for advanced AI systems, as it involves exploiting prompt prefixes to adjust LLMs to various tasks. However, Shanahan cautions against ascribing human-like characteristics to these systems, as they are fundamentally different and lack a shared comprehension with humans. Even though LLMs can be integrated into embodied systems, it does not mean that they possess human-like language abilities. Ultimately, Shanahan concludes that although LLMs are formidable and versatile, we must be wary of over-simplifying their capacities and limitations.
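As a toy illustration of the "prompt prefix" point (our example, not Shanahan's), the same base model can be steered to quite different tasks purely by what is prepended to its input:

def with_prefix(prefix: str, query: str) -> str:
    # Prompt engineering in its simplest form: task demonstrations go in the prefix.
    return prefix + "\n" + query

translation_prompt = with_prefix(
    "Translate English to French.\nsea otter => loutre de mer", "cheese =>"
)
sentiment_prompt = with_prefix(
    "Label the review as positive or negative.\n'Great film, loved it.' => positive",
    "'Dull and far too long.' =>",
)
# Feeding either string to the same LLM elicits the task implied by its prefix,
# with no change to the model's weights.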
YT version: https://youtu.be/BqkWpP3uMMU
Full references on the YT description.
[00:00:00] Introduction
[00:08:51] Consciousness and Consciousness Exotica
[00:34:59] Slightly Conscious LLMs
[00:38:05] Embodiment
[00:51:32] Symbol Grounding
[00:54:13] Emergence
[00:57:09] Reasoning
[01:03:16] Intentional Stance
[01:07:06] Digression on Chomsky show and Andrew Lampinen
[01:10:31] Prompt Engineering
Find Murray online:
https://www.doc.ic.ac.uk/~mpsha/
https://twitter.com/mpshanahan?lang=en
https://scholar.google.co.uk/citations?user=00bnGpAAAAAJ&hl=en
MLST Discord: https://discord.gg/aNPkGUQtc5
Support us! https://www.patreon.com/mlst
Sara Hooker is an exceptionally talented and accomplished leader and research scientist in the field of machine learning. She is the founder of Cohere For AI, a non-profit research lab that seeks to solve complex machine learning problems. She is passionate about creating more points of entry into machine learning research and has dedicated her efforts to understanding how progress in this field can be translated into reliable and accessible machine learning in the real-world.
Sara is also the co-founder of the Trustworthy ML Initiative, a forum and seminar series related to Trustworthy ML. She is on the advisory board of Patterns and is an active member of the MLC research group, which has a focus on making participation in machine learning research more accessible.
Before starting Cohere For AI, Sara worked as a research scientist at Google Brain. She has written several influential research papers, including "The Hardware Lottery", "The Low-Resource Double Bind: An Empirical Study of Pruning for Low-Resource Machine Translation", "Moving Beyond “Algorithmic Bias is a Data Problem”" and "Characterizing and Mitigating Bias in Compact Models".
In addition to her research work, Sara is also the founder of the local Bay Area non-profit Delta Analytics, which works with non-profits and communities all over the world to build technical capacity and empower others to use data. She regularly gives tutorials on machine learning fundamentals, interpretability, model compression and deep neural networks and is dedicated to collaborating with independent researchers around the world.
Sara Hooker is famous for writing a paper introducing the concept of the 'hardware lottery', in which the success of a research idea is determined not by its inherent superiority, but by its compatibility with available software and hardware. She argued that choices about software and hardware have had a substantial impact in deciding the outcomes of early computer science history, and that with the increasing heterogeneity of the hardware landscape, gains from advances in computing may become increasingly disparate. Sara proposed that an interim goal should be to create better feedback mechanisms for researchers to understand how their algorithms interact with the hardware they use. She suggested that domain-specific languages, auto-tuning of algorithmic parameters, and better profiling tools may help to alleviate this issue, as well as provide researchers with more informed opinions about how hardware and software should progress. Ultimately, Sara encouraged researchers to be mindful of the implications of the hardware lottery, as it could mean that progress on some research directions is further obstructed. If you want to learn more about that paper, watch our previous interview with Sara.
YT version: https://youtu.be/7oJui4eSCoY
MLST Discord: https://discord.gg/aNPkGUQtc5
TOC:
[00:00:00] Intro
[00:02:53] Interpretability / Fairness
[00:35:29] LLMs
Find Sara:
https://www.sarahooker.me/
https://twitter.com/sarahookr
Support us! https://www.patreon.com/mlst
Hattie Zhou, a PhD student at Université de Montréal and Mila, has set out to understand and explain the performance of modern neural networks, believing it a key factor in building better, more trusted models. Having previously worked as a data scientist at Uber, a private equity analyst at Radar Capital, and an economic consultant at Cornerstone Research, she has recently released a paper in collaboration with the Google Brain team, titled ‘Teaching Algorithmic Reasoning via In-context Learning’. In this work, Hattie identifies and examines four key stages for successfully teaching algorithmic reasoning to large language models (LLMs): formulating algorithms as skills, teaching multiple skills simultaneously, teaching how to combine skills, and teaching how to use skills as tools. Through the application of algorithmic prompting, Hattie has achieved remarkable results, with an order of magnitude error reduction on some tasks compared to the best available baselines. This breakthrough demonstrates algorithmic prompting’s viability as an approach for teaching algorithmic reasoning to LLMs, and may have implications for other tasks requiring similar reasoning capabilities.
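To give a flavour of what algorithmic prompting looks like (a minimal sketch of our own, not the paper's actual prompts), the idea is to demonstrate every intermediate step of an algorithm in-context rather than just input-output pairs:

ALGORITHMIC_PROMPT = """\
Problem: 128 + 367
Step 1: ones digits: 8 + 7 = 15, write 5, carry 1.
Step 2: tens digits plus carry: 2 + 6 + 1 = 9, write 9, carry 0.
Step 3: hundreds digits plus carry: 1 + 3 + 0 = 4, write 4.
Answer: 495

Problem: 254 + 189
"""

def solve_with_algorithmic_prompt(model, problem_prompt: str) -> str:
    # `model` is any text-completion callable; the worked example above is the
    # in-context "skill" the LLM is expected to imitate step by step.
    return model(problem_prompt)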
TOC
[00:00:00] Hattie Zhou
[00:19:49] Markus Rabe [Google Brain]
Hattie's Twitter - https://twitter.com/oh_that_hat
Website - http://hattiezhou.com/
Teaching Algorithmic Reasoning via In-context Learning [Hattie Zhou, Azade Nova, Hugo Larochelle, Aaron Courville, Behnam Neyshabur, and Hanie Sedghi]
https://arxiv.org/pdf/2211.09066.pdf
Markus Rabe [Google Brain]:
https://twitter.com/markusnrabe
https://research.google/people/106335/
https://www.linkedin.com/in/markusnrabe
Autoformalization with Large Language Models [Albert Jiang Charles Edgar Staats Christian Szegedy Markus Rabe Mateja Jamnik Wenda Li Yuhuai Tony Wu]
https://research.google/pubs/pub51691/
Discord: https://discord.gg/aNPkGUQtc5
YT: https://youtu.be/80i6D2TJdQ4
Support us! https://www.patreon.com/mlst
(On the main version we released, the music was a tiny bit too loud in places, and some pieces had percussion which was a bit distracting -- here is a version with all music removed so you have the option!)
David Chalmers is a professor of philosophy and neural science at New York University, and an honorary professor of philosophy at the Australian National University. He is the co-director of the Center for Mind, Brain, and Consciousness, as well as the PhilPapers Foundation. His research focuses on the philosophy of mind, especially consciousness, and its connection to fields such as cognitive science, physics, and technology. He also investigates areas such as the philosophy of language, metaphysics, and epistemology. With his impressive breadth of knowledge and experience, David Chalmers is a leader in the philosophical community.
The central challenge for consciousness studies is to explain how something immaterial, subjective, and personal can arise out of something material, objective, and impersonal. This is illustrated by the example of a bat, whose sensory experience is much different from ours, making it difficult to imagine what it's like to be one. Thomas Nagel's "inconceivability argument" has its advantages and disadvantages, but ultimately it is impossible to solve the mind-body problem due to the subjective nature of experience. This is further explored by examining the concept of philosophical zombies, which are physically and behaviorally indistinguishable from conscious humans yet lack conscious experience. This has implications for the Hard Problem of Consciousness, which is the attempt to explain how mental states are linked to neurophysiological activity. The Chinese Room Argument is used as a thought experiment to explain why physicality may be insufficient to be the source of the subjective, coherent experience we call consciousness. Despite much debate, the Hard Problem of Consciousness remains unsolved. Chalmers has been working on a functional approach to decide whether large language models are, or could be conscious.
Filmed at #neurips22
Discord: https://discord.gg/aNPkGUQtc5
Pod: https://anchor.fm/machinelearningstreettalk/episodes/90---Prof--DAVID-CHALMERS---Slightly-Conscious-LLMs-e1sej50
TOC;
[00:00:00] Introduction
[00:00:40] LLMs consciousness pitch
[00:06:33] Philosophical Zombies
[00:09:26] The hard problem of consciousness
[00:11:40] Nagel's bat and intelligibility
[00:21:04] LLM intro clip from NeurIPS
[00:22:55] Connor Leahy on self-awareness in LLMs
[00:23:30] Sneak peek from unreleased show - could consciousness be a submodule?
[00:33:44] SeppH
[00:36:15] Tim interviews David at NeurIPS (functionalism / panpsychism / Searle)
[00:45:20] Peter Hase interviews Chalmers (focus on interpretability/safety)
Panel:
Dr. Tim Scarfe
Dr. Keith Duggar
Contact David;
https://mobile.twitter.com/davidchalmers42
https://consc.net/
References;
Could a Large Language Model Be Conscious? [Chalmers NeurIPS22 talk]
https://nips.cc/media/neurips-2022/Slides/55867.pdf
What Is It Like to Be a Bat? [Nagel]
https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/humananimalstudies/lectures/32/nagel_bat.pdf
Zombies
https://plato.stanford.edu/entries/zombies/
zombies on the web [Chalmers]
https://consc.net/zombies-on-the-web/
The hard problem of consciousness [Chalmers]
https://psycnet.apa.org/record/2007-00485-017
David Chalmers, "Are Large Language Models Sentient?" [NYU talk, same as at NeurIPS]
https://www.youtube.com/watch?v=-BcuCmf00_Y
Support us! https://www.patreon.com/mlst
David Chalmers is a professor of philosophy and neural science at New York University, and an honorary professor of philosophy at the Australian National University. He is the co-director of the Center for Mind, Brain, and Consciousness, as well as the PhilPapers Foundation. His research focuses on the philosophy of mind, especially consciousness, and its connection to fields such as cognitive science, physics, and technology. He also investigates areas such as the philosophy of language, metaphysics, and epistemology. With his impressive breadth of knowledge and experience, David Chalmers is a leader in the philosophical community.
The central challenge for consciousness studies is to explain how something immaterial, subjective, and personal can arise out of something material, objective, and impersonal. This is illustrated by the example of a bat, whose sensory experience is much different from ours, making it difficult to imagine what it's like to be one. Thomas Nagel's "inconceivability argument" has its advantages and disadvantages, but ultimately it is impossible to solve the mind-body problem due to the subjective nature of experience. This is further explored by examining the concept of philosophical zombies, which are physically and behaviorally indistinguishable from conscious humans yet lack conscious experience. This has implications for the Hard Problem of Consciousness, which is the attempt to explain how mental states are linked to neurophysiological activity. The Chinese Room Argument is used as a thought experiment to explain why physicality may be insufficient to be the source of the subjective, coherent experience we call consciousness. Despite much debate, the Hard Problem of Consciousness remains unsolved. Chalmers has been working on a functional approach to decide whether large language models are, or could be conscious.
Filmed at #neurips22
Discord: https://discord.gg/aNPkGUQtc5
YT: https://youtu.be/T7aIxncLuWk
TOC;
[00:00:00] Introduction
[00:00:40] LLMs consciousness pitch
[00:06:33] Philosophical Zombies
[00:09:26] The hard problem of consciousness
[00:11:40] Nagel's bat and intelligibility
[00:21:04] LLM intro clip from NeurIPS
[00:22:55] Connor Leahy on self-awareness in LLMs
[00:23:30] Sneak peek from unreleased show - could consciousness be a submodule?
[00:33:44] SeppH
[00:36:15] Tim interviews David at NeurIPS (functionalism / panpsychism / Searle)
[00:45:20] Peter Hase interviews Chalmers (focus on interpretability/safety)
Panel:
Dr. Tim Scarfe
Dr. Keith Duggar
Contact David;
https://mobile.twitter.com/davidchalmers42
https://consc.net/
References;
Could a Large Language Model Be Conscious? [Chalmers NeurIPS22 talk]
https://nips.cc/media/neurips-2022/Slides/55867.pdf
What Is It Like to Be a Bat? [Nagel]
https://warwick.ac.uk/fac/cross_fac/iatl/study/ugmodules/humananimalstudies/lectures/32/nagel_bat.pdf
Zombies
https://plato.stanford.edu/entries/zombies/
zombies on the web [Chalmers]
https://consc.net/zombies-on-the-web/
The hard problem of consciousness [Chalmers]
https://psycnet.apa.org/record/2007-00485-017
David Chalmers, "Are Large Language Models Sentient?" [NYU talk, same as at NeurIPS]
https://www.youtube.com/watch?v=-BcuCmf00_Y
Support us! https://www.patreon.com/mlst
Dr. Walid Saba recently reviewed the book Machines Will Never Rule The World, which argues that strong AI is impossible. He acknowledges the complexity of modeling mental processes and language, as well as interactive dialogues, and questions the authors' use of "never." Despite his skepticism, he is impressed with recent developments in large language models, though he questions the extent of their success.
We then discussed the successes of cognitive science. Walid believes that something has been achieved which many cognitive scientists would never accept, namely the ability to learn from data empirically. Keith agrees that this is a huge step, but notes that there is still much work to be done to get to the "other 5%" of accuracy. They both agree that the current models are too brittle and require much more data and parameters to get to the desired level of accuracy.
Walid then expresses admiration for deep learning systems' ability to learn non-trivial aspects of language from ingesting text only. He argues that this is an "existential proof" of language competency and that it would be impossible for a group of luminaries such as Montague, Marvin Minsky, John McCarthy, and a thousand other bright engineers to replicate the same level of competency as we have now with LLMs. He then discusses the problem of semantics and pragmatics, as well as symbol grounding, and expresses skepticism about grounded meaning and embodiment. He believes that artificial intelligence should be used to solve real-world problems which require human intelligence, but does not believe that robots should be built to understand love or other subjective feelings.
We discussed the unique properties of natural human language. Walid believes that the core unique property is the ability to do abductive reasoning, which is the process of reasoning to the best explanation or understanding. Keith adds that there are two types of abduction - one for generating hypotheses and one for justifying them. In both cases, abductive reasoning involves choosing from a set of plausible possibilities.
Finally, we discussed the book "Machines Will Never Rule The World" and its argument that current mathematics and technology are not enough to model complex systems. Walid agrees with the book's argument but is still optimistic that a new mathematics can be discovered. Keith suggests the possibility of an AGI discovering the mathematics to create itself. They also discussed how the book could serve as a reminder to temper the hype surrounding AI and to focus on exploration, creativity, and daring ideas. Walid ended by stressing the importance of science, noting that engineers should play within the Venn diagrams drawn by scientists, rather than trying to hack their way through them.
Transcript: https://share.descript.com/view/BFQb5iaegJC
Discord: https://discord.gg/aNPkGUQtc5
YT: https://youtu.be/IMnWAuoucjo
TOC:
[00:00:00] Intro
[00:06:52] Walid's change of heart on DL/LLMs and on the skeptics like Gary Marcus
[00:22:52] Symbol Grounding
[00:32:26] On Montague
[00:40:41] On Abduction
[00:50:54] Language of thought
[00:56:08] Why machines will never rule the world book review
[01:20:06] Engineers should play in the scientists' Venn Diagram!
Panel;
Dr. Tim Scarfe
Dr. Keith Duggar
Mark Mcguill
Yann LeCun is a French computer scientist known for his pioneering work on convolutional neural networks, optical character recognition and computer vision. He is a Silver Professor at New York University and Vice President, Chief AI Scientist at Meta. Along with Yoshua Bengio and Geoffrey Hinton, he was awarded the 2018 Turing Award for their work on deep learning, earning them the nickname of the "Godfathers of Deep Learning".
Dr. Randall Balestriero has been researching learnable signal processing since 2013, with a focus on learnable parametrized wavelets and deep wavelet transforms. His research has been used by NASA, leading to applications such as Marsquake detection. During his PhD at Rice University, Randall explored deep networks from a theoretical perspective and improved state-of-the-art methods such as batch-normalization and generative networks. Later, when joining Meta AI Research (FAIR) as a postdoc with Prof. Yann LeCun, Randall further broadened his research interests to include self-supervised learning and the biases emerging from data-augmentation and regularization, resulting in numerous publications.
Episode recorded live at NeurIPS.
YT: https://youtu.be/9dLd6n9yT8U (references are there)
Support us! https://www.patreon.com/mlst
Host: Dr. Tim Scarfe
TOC:
[00:00:00] LeCun interview
[00:18:25] Randall Balestriero interview (mostly on spectral SSL paper, first ref)
Dr. Petar Veličković is a Staff Research Scientist at DeepMind; he has firmly established himself as one of the most significant up-and-coming researchers in the deep learning space. He invented Graph Attention Networks in 2017 and has been a leading light in the field ever since, pioneering research in Graph Neural Networks, Geometric Deep Learning and Neural Algorithmic Reasoning. If you haven't already, you should check out our video on the Geometric Deep Learning blueprint, featuring Petar. I caught up with him last week at NeurIPS. In this show, from NeurIPS 2022, we discussed his recent work on category theory and graph neural networks.
https://petar-v.com/
https://twitter.com/PetarV_93/
TOC:
Categories (Cats for AI) [00:00:00]
Reasoning [00:14:44]
Extrapolation [00:19:09]
Ishan Misra Skit [00:27:50]
Graphs (Expander Graph Propagation) [00:29:18]
YT: https://youtu.be/1lkdWduuN14
MLST Discord: https://discord.gg/V25vQeFwhS
Support us! https://www.patreon.com/mlst
References on YT description, lots of them!
Host: Dr. Tim Scarfe
In this NeurIPS interview, we speak with Laura Ruis about her research on the ability of language models to interpret language in context. She has designed a simple task to evaluate the performance of widely used state-of-the-art language models and has found that they struggle to make pragmatic inferences (implicatures). Tune in to learn more about her findings and what they mean for the future of conversational AI.
Laura Ruis
https://www.lauraruis.com/
https://twitter.com/LauraRuis
BLOOM
https://bigscience.huggingface.co/blog/bloom
Large language models are not zero-shot communicators [Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette]
https://arxiv.org/abs/2210.14986
[Zhang et al] OPT: Open Pre-trained Transformer Language Models
https://arxiv.org/pdf/2205.01068.pdf
[Lampinen] Can language models handle recursively nested grammatical structures? A case study on comparing models and humans
https://arxiv.org/pdf/2210.15303.pdf
[Gary Marcus] Horse rides astronaut
https://garymarcus.substack.com/p/horse-rides-astronaut
[Gary Marcus] GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about
https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/
[Bender et al] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
https://dl.acm.org/doi/10.1145/3442188.3445922
[janus] Simulators (Less Wrong)
https://www.lesswrong.com/posts/vJFdjigzmcXMhNTsx/simulators
First in our unplugged series live from #NeurIPS2022
We discuss natural language understanding, symbol meaning and grounding and Chomsky with Dr. Andrew Lampinen from DeepMind.
We recorded a LOT of material from NeurIPS, keep an eye out for the uploads.
YT version: https://youtu.be/46A-BcBbMnA
References
[Paul Cisek] Beyond the computer metaphor: Behaviour as interaction
https://philpapers.org/rec/CISBTC
Linguistic Competence (Chomsky reference)
https://en.wikipedia.org/wiki/Linguistic_competence
[Andrew Lampinen] Can language models handle recursively nested grammatical structures? A case study on comparing models and humans
https://arxiv.org/abs/2210.15303
[Fodor et al] Connectionism and Cognitive Architecture: A Critical Analysis
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/proseminars/Proseminar13/ConnectionistArchitecture.pdf
[Melanie Mitchell et al] The Debate Over Understanding in AI's Large Language Models
https://arxiv.org/abs/2210.13966
[Gary Marcus] GPT-3, Bloviator: OpenAI’s language generator has no idea what it’s talking about
https://www.technologyreview.com/2020/08/22/1007539/gpt3-openai-language-generator-artificial-intelligence-ai-opinion/
[Bender et al] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?
https://dl.acm.org/doi/10.1145/3442188.3445922
[Adam Santoro, Andrew Lampinen et al] Symbolic Behaviour in Artificial Intelligence
https://arxiv.org/abs/2102.03406
[Ishita Dasgupta, Lampinen et al] Language models show human-like content effects on reasoning
https://arxiv.org/abs/2207.07051
REACT - Synergizing Reasoning and Acting in Language Models
https://arxiv.org/pdf/2210.03629.pdf
https://ai.googleblog.com/2022/11/react-synergizing-reasoning-and-acting.html
[Fabian Paischer] HELM - History Compression via Language Models in Reinforcement Learning
https://ml-jku.github.io/blog/2022/helm/
https://arxiv.org/abs/2205.12258
[Laura Ruis] Large language models are not zero-shot communicators
https://arxiv.org/pdf/2210.14986.pdf
[Kumar] Using natural language and program abstractions to instill human inductive biases in machines
https://arxiv.org/pdf/2205.11558.pdf
Juho Kim
https://juhokim.com/
AI Helps Ukraine - Charity Conference
A charity conference on AI to raise funds for medical and humanitarian aid for Ukraine
https://aihelpsukraine.cc/
YT version: https://youtu.be/LgwjcqhkOA4
Support us!
https://www.patreon.com/mlst
Dr. Joscha Bach (born 1973 in Weimar, Germany) is a German artificial intelligence researcher and cognitive scientist focusing on cognitive architectures, mental representation, emotion, social modelling, and multi-agent systems.
http://bach.ai/
https://twitter.com/plinz
TOC:
[00:00:00] Ukraine Charity Conference and NeurIPS 2022
[00:03:40] Theory of computation, Godel, Penrose
[00:11:44] Modelling physical reality
[00:15:19] Is our universe infinite?
[00:24:30] Large language models, and on DL / is Gary Marcus hitting a wall?
[00:45:17] Generative models / Codex / Language of thought
[00:58:46] Consciousness (with Friston references)
References:
Am I Self-Conscious? (Or Does Self-Organization Entail Self-Consciousness?) [Friston]
https://www.frontiersin.org/articles/10.3389/fpsyg.2018.00579/full
Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Yasaman Razeghi]
https://arxiv.org/abs/2202.07206
Deep Learning Is Hitting a Wall [Gary Marcus]
https://nautil.us/deep-learning-is-hitting-a-wall-238440/
Turing machines
https://en.wikipedia.org/wiki/Turing_machine
Lambda Calculus
https://en.wikipedia.org/wiki/Lambda_calculus
Gödel's incompleteness theorems
https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems
Oracle machine
https://en.wikipedia.org/wiki/Oracle_machine
Support us (and please rate on podcast app)
https://www.patreon.com/mlst
In this show tonight with Prof. Julian Togelius (NYU) and Prof. Ken Stanley we discuss open-endedness, AGI, game AI and reinforcement learning.
[Prof Julian Togelius]
https://engineering.nyu.edu/faculty/julian-togelius
https://twitter.com/togelius
[Prof Ken Stanley]
https://www.cs.ucf.edu/~kstanley/
https://twitter.com/kenneth0stanley
TOC:
[00:00:00] Introduction
[00:01:07] AI and computer games
[00:12:23] Intelligence
[00:21:27] Intelligence Explosion
[00:25:37] What should we be aspiring towards?
[00:29:14] Should AI contribute to culture?
[00:32:12] On creativity and open-endedness
[00:36:11] RL overfitting
[00:44:02] Diversity preservation
[00:51:18] Empiricism vs rationalism, in gradient descent the data pushes you around
[00:55:49] Creativity and interestingness (does complexity / information increase)
[01:03:20] What does a population give us?
[01:05:58] Emergence / generalisation snobbery
References:
[Hutter/Legg] Universal Intelligence: A Definition of Machine Intelligence
https://arxiv.org/abs/0712.3329
https://en.wikipedia.org/wiki/Artificial_general_intelligence
https://en.wikipedia.org/wiki/I._J._Good
https://en.wikipedia.org/wiki/G%C3%B6del_machine
[Chollet] Impossibility of intelligence explosion
https://medium.com/@francois.chollet/the-impossibility-of-intelligence-explosion-5be4a9eda6ec
[Alex Irpan] - RL is hard
https://www.alexirpan.com/2018/02/14/rl-hard.html
https://nethackchallenge.com/
Map elites
https://arxiv.org/abs/1504.04909
Covariance Matrix Adaptation for the Rapid Illumination of Behavior Space
https://arxiv.org/abs/1912.02400
[Stanley] - Why greatness cannot be planned
https://www.amazon.com/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
[Lehman/Stanley] Abandoning Objectives: Evolution through the Search for Novelty Alone
https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehman_ecj11.pdf
We had a conversation with Aidan Gomez, the CEO of language-based AI platform Cohere. Cohere is a startup which uses artificial intelligence to help users build the next generation of language-based applications. It's headquartered in Toronto. The company has raised $175 million in funding so far.
Language may well become a key new substrate for software building, both in its representation and how we build the software. It may democratise software building so that more people can build software, and we can build new types of software. Aidan and I discuss this in detail in this episode of MLST.
Check out Cohere -- https://dashboard.cohere.ai/welcome/register?utm_source=influencer&utm_medium=social&utm_campaign=mlst
Support us!
https://www.patreon.com/mlst
YT version: https://youtu.be/ooBt_di8DLs
TOC:
[00:00:00] Aidan Gomez intro
[00:02:12] What's it like being a CEO?
[00:02:52] Transformers
[00:09:33] Deepmind Chomsky Hierarchy
[00:14:58] Cohere roadmap
[00:18:18] Friction using LLMs for startups
[00:25:31] How different from OpenAI / GPT-3
[00:29:31] Engineering questions on Cohere
[00:35:13] Francois Chollet says that LLMs are like databases
[00:38:34] Next frontier of language models
[00:42:04] Different modes of understanding in LLMs
[00:47:04] LLMs are the new extended mind
[00:50:03] Is language the next interface, and why might that be bad?
References:
[Balestriero] Spine theory of NNs
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
[Delétang et al] Neural Networks and the Chomsky Hierarchy
https://arxiv.org/abs/2207.02098
[Fodor, Pylyshyn] Connectionism and Cognitive Architecture: A Critical Analysis
https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/docs/jaf.pdf
[Chalmers, Clark] The extended mind
https://icds.uoregon.edu/wp-content/uploads/2014/06/Clark-and-Chalmers-The-Extended-Mind.pdf
[Melanie Mitchell et al] The Debate Over Understanding in AI's Large Language Models
https://arxiv.org/abs/2210.13966
[Jay Alammar]
Illustrated stable diffusion
https://jalammar.github.io/illustrated-stable-diffusion/
Illustrated transformer
https://jalammar.github.io/illustrated-transformer/
https://www.youtube.com/channel/UCmOwsoHty5PrmE-3QhUBfPQ
[Sandra Kublik] (works at Cohere!)
https://www.youtube.com/channel/UCjG6QzmabZrBEeGh3vi-wDQ
This video is demonetised due to a music copyright claim, so we would appreciate support on our Patreon! https://www.patreon.com/mlst
We would also appreciate it if you rated us on your podcast platform.
YT: https://youtu.be/_KVAzAzO5HU
Panel: Dr. Tim Scarfe, Dr. Keith Duggar
Guests: Prof. J. Mark Bishop, Francois Chollet, Prof. David Chalmers, Dr. Joscha Bach, Prof. Karl Friston, Alexander Mattick, Sam Roffey
The Chinese Room Argument was first proposed by philosopher John Searle in 1980. It is an argument against the possibility of artificial intelligence (AI) – that is, the idea that a machine could ever be truly intelligent, as opposed to just imitating intelligence.
The argument goes like this:
Imagine a room in which a person sits at a desk, with a book of rules in front of them. This person does not understand Chinese.
Someone outside the room passes a piece of paper through a slot in the door. On this paper is a Chinese character. The person in the room consults the book of rules and, following these rules, writes down another Chinese character and passes it back out through the slot.
To someone outside the room, it appears that the person in the room is engaging in a conversation in Chinese. In reality, they have no idea what they are doing – they are just following the rules in the book.
The argument concludes that a machine could never be truly intelligent: genuine intelligence requires understanding, and following rules is not the same as understanding.
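To make the "rule book" intuition concrete, here is a deliberately trivial Python sketch (our own illustration, not anything discussed in the episode): the program maps incoming symbols to outgoing symbols by pure lookup, with no representation of meaning anywhere. The rules and strings are invented.

```python
# A literal "rule book": map incoming symbols to outgoing symbols by lookup.
# The strings and rules below are invented purely for illustration.

RULE_BOOK = {
    "你好": "你好，你怎么样？",      # "hello" -> "hello, how are you?"
    "你怎么样？": "我很好，谢谢。",  # "how are you?" -> "I'm fine, thanks."
}

def person_in_the_room(symbol_slip: str) -> str:
    # The operator matches the incoming characters against the book and
    # copies out the prescribed response; no understanding is involved.
    return RULE_BOOK.get(symbol_slip, "对不起，我不明白。")  # "sorry, I don't understand."

if __name__ == "__main__":
    for slip in ["你好", "你怎么样？", "天气如何？"]:
        print(slip, "->", person_in_the_room(slip))
```

Whether a vastly larger "rule book" would amount to understanding is precisely what the episode's guests dispute.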
In this detailed investigation into the Chinese Room, consciousness, and syntax vs semantics, we interview luminaries J. Mark Bishop and Francois Chollet and use unreleased footage from our interviews with David Chalmers, Joscha Bach and Karl Friston. We also cover material from Walid Saba and interview Alex Mattick from Yannic's Discord.
This is probably my favourite ever episode of MLST. I hope you enjoy it! With Keith Duggar.
Note that we are using clips from our unreleased interviews with David Chalmers and Joscha Bach -- we will release those shows properly in the coming weeks. We apologise for the delay in releasing our backlog; we have been busy building a startup company in the background.
TOC:
[00:00:00] Kick off
[00:00:46] Searle
[00:05:09] Bishop introduces CRA
[00:00:00] Stevan Harnad's take on CRA
[00:14:03] Francois Chollet dissects CRA
[00:34:16] Chalmers on consciousness
[00:36:27] Joscha Bach on consciousness
[00:42:01] Bishop introduction
[00:51:51] Karl Friston on consciousness
[00:55:19] Bishop on consciousness and comments on Chalmers
[01:21:37] Private language games (including clip with Sam Roffey)
[01:27:27] Dr. Walid Saba on the Chinese Room (GOFAI/systematicity take)
[00:34:36] Bishop: on agency / teleology
[01:36:38] Bishop: back to CRA
[01:40:53] Noam Chomsky on mysteries
[01:45:56] Eric Curiel on math does not represent
[01:48:14] Alexander Mattick on syntax vs semantics
Thanks to: Mark MC on Discord for stimulating conversation, Alexander Mattick, Dr. Keith Duggar, Sam Roffey. Sam's YouTube channel is https://www.youtube.com/channel/UCjRNMsglFYFwNsnOWIOgt1Q
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
In this special edition episode, we have a conversation with Prof. Noam Chomsky, the father of modern linguistics and the most important intellectual of the 20th century.
With a career spanning the better part of a century, we took the chance to ask Prof. Chomsky his thoughts not only on the progress of linguistics and cognitive science but also the deepest enduring mysteries of science and philosophy as a whole - exploring what may lie beyond our limits of understanding. We also discuss the rise of connectionism and large language models, our quest to discover an intelligible world, and the boundaries between silicon and biology.
We explore some of the profound misunderstandings of linguistics in general, and Chomsky's own work specifically, which have persisted at the highest levels of academia for over sixty years.
We have produced a significant introduction section where we discuss in detail Yann LeCun’s recent position paper on AGI, a recent paper on emergence in LLMs, empiricism related to cognitive science, cognitive templates, “the ghost in the machine” and language.
Panel:
Dr. Tim Scarfe
Dr. Keith Duggar
Dr. Walid Saba
YT version: https://youtu.be/-9I4SgkHpcA
00:00:00 Kick off
00:02:24 C1: LeCun's recent position paper on AI, JEPA, Schmidhuber, EBMs
00:48:38 C2: Emergent abilities in LLMs paper
00:51:32 C3: Empiricism
01:25:33 C4: Cognitive Templates
01:35:47 C5: The Ghost in the Machine
01:59:21 C6: Connectionism and Cognitive Architecture: A Critical Analysis by Fodor and Pylyshyn
02:19:25 C7: We deep-faked Chomsky
02:29:11 C8: Language
02:34:41 C9: Chomsky interview kick-off!
02:35:39 Large Language Models such as GPT-3
02:39:14 Connectionism and radical empiricism
02:44:44 Hybrid systems such as neurosymbolic
02:48:47 Computationalism silicon vs biological
02:53:28 Limits of human understanding
03:00:46 Semantics state-of-the-art
03:06:43 Universal grammar, I-Language, and language of thought
03:16:27 Profound and enduring misunderstandings
03:25:41 Greatest remaining mysteries science and philosophy
03:33:10 Debrief and 'Chuckles' from Chomsky
Vitaliy Chiley is a Machine Learning Research Engineer at the next-generation computing hardware company Cerebras Systems. We spoke about how DL workloads, including sparse workloads, can run faster on Cerebras hardware.
[00:00:00] Housekeeping
[00:01:08] Preamble
[00:01:50] Vitaliy Chiley Introduction
[00:03:11] Cerebras architecture
[00:08:12] Memory management and FLOP utilisation
[00:18:01] Centralised vs decentralised compute architecture
[00:21:12] Sparsity
[00:23:47] Does Sparse NN imply Heterogeneous compute?
[00:29:21] Cost of distributed memory stores?
[00:31:01] Activation vs weight sparsity
[00:37:52] What constitutes a dead weight to be pruned?
[00:39:02] Is it still a saving if we have to choose between weight and activation sparsity?
[00:41:02] Cerebras is a cool place to work
[00:44:05] What is sparsity? Why do we need to start dense?
[00:46:36] Evolutionary algorithms on Cerebras?
[00:47:57] How can we start sparse? Google RIGL
[00:51:44] Inductive priors, why do we need them if we can start sparse?
[00:56:02] Why anthropomorphise inductive priors?
[01:02:13] Could Cerebras run a cyclic computational graph?
[01:03:16] Are NNs locality sensitive hashing tables?
References:
Rigging the Lottery: Making All Tickets Winners [RIGL]
https://arxiv.org/pdf/1911.11134.pdf
[D] DanNet, the CUDA CNN of Dan Ciresan in Jurgen Schmidhuber's team, won 4 image recognition challenges prior to AlexNet
https://www.reddit.com/r/MachineLearning/comments/dwnuwh/d_dannet_the_cuda_cnn_of_dan_ciresan_in_jurgen/
A Spline Theory of Deep Learning [Balestriero]
https://proceedings.mlr.press/v80/balestriero18b.html
Check out Weights and Biases here!
https://wandb.me/MLST
Lukas Biewald is an entrepreneur living in San Francisco. He was the founder and CEO of Figure Eight, an Internet company that collects training data for machine learning. In 2018, he founded Weights and Biases, a company that creates developer tools for machine learning. Recently WandB got a cash injection of 15 million dollars in its second funding round.
Lukas has a bachelor's in mathematics and a master's in computer science from Stanford University. He was a research student under the tutelage of the legendary Daphne Koller.
Lukas Biewald
https://twitter.com/l2k
[00:00:00] Preamble
[00:01:27] Intro to Lukas
[00:02:46] How did Lukas build 2 successful startups?
[00:05:49] Rebalancing games with ML
[00:08:14] Elevator pitch for WandB
[00:10:38] Science vs Engineering divide in ML DevOps
[00:14:11] Too much focus on the minutiae?
[00:18:03] Vertical information sharing in large enterprises (metrics)
[00:20:37] Centralised vs Decentralised topology
[00:24:02] Generalisation vs specialisation
[00:28:59] Enhancing explainability
[00:33:14] Should we try and understand "the machine" or is testing / behaviourism enough?
[00:36:55] WandB roadmap
[00:39:06] WandB / ML Ops competitor space?
[00:44:10] How is WandB differentiated over Sagemaker / AzureML
[00:46:02] WandB Sponsorship of ML YT channels
[00:48:43] Alternatives to deep learning?
[00:53:47] How to build a business like WandB
Panel: Tim Scarfe Ph.D and Keith Duggar Ph.D
Note we didn't get paid by Weights and Biases to conduct this interview.
An emergent behaviour or emergent property can appear when a number of simple entities operate in an environment, forming more complex behaviours as a collective. If emergence happens over disparate size scales, the reason is usually a causal relation across different scales. Weak emergence describes new properties arising in systems as a result of low-level interactions; these might be interactions between components of the system, or between the components and their environment.
In our epic introduction we focus a lot on the concept of self-organisation, complex systems, cellular automata and strong vs weak emergence. In the main show we discuss this more in detail with Dr. Daniele Grattarola and cover his recent NeurIPS paper on learning graph cellular automata.
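For readers who want a concrete picture of a simple local rule producing complex collective behaviour, here is a minimal sketch (our own, not from Daniele's paper, which concerns learned graph cellular automata) of an elementary one-dimensional cellular automaton. Rule 110 is a classic example of rich structure emerging from a three-cell update rule.

```python
import numpy as np

def step(cells: np.ndarray, rule: int = 110) -> np.ndarray:
    """One update of an elementary cellular automaton (wrap-around boundary)."""
    left = np.roll(cells, 1)
    right = np.roll(cells, -1)
    # Each cell's new value is one bit of `rule`, indexed by its 3-cell neighbourhood.
    idx = (left << 2) | (cells << 1) | right
    return (rule >> idx) & 1

def run(width: int = 64, steps: int = 32, rule: int = 110) -> None:
    cells = np.zeros(width, dtype=int)
    cells[width // 2] = 1  # single live cell in the middle
    for _ in range(steps):
        print("".join("#" if c else "." for c in cells))
        cells = step(cells, rule)

if __name__ == "__main__":
    run()
```

The global patterns that scroll past are not written anywhere in the three-line update rule, which is the intuition behind weak emergence discussed above.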
YT version: https://youtu.be/MDt2e8XtUcA
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Featuring:
Dr. Daniele Grattarola
Dr. Tim Scarfe
Dr. Keith Duggar
Prof. David Chalmers
Prof. Ken Stanley
Prof. Julian Togelius
Dr. Joscha Bach
David Ha
Dr. Pei Wang
[00:00:00] Special Edition Intro: Emergence and Cellular Automata
[00:49:02] Intro to Daniele and CAs
[00:57:23] Numerical analysis link with CA (PDEs)
[00:59:50] The representational dichotomy of discrete and continuous at different scales
[01:05:21] Universal computation in CAs
[01:10:27] Computational irreducibility
[01:16:33] Is the universe discrete?
[01:20:49] Emergence but with the same computational principle
[01:23:10] How do you formalise the emergent phenomenon
[01:25:44] Growing cellular automata
[01:33:53] Open-ended and unbounded computation is required for this kind of behaviour
[01:37:31] Graph cellular automata
[01:43:40] Connection to protein folding
[01:46:24] Are CAs the best tool for the job?
[01:49:37] Where to go to find more information
Please note that in this interview Dr. Lampinen was expressing his personal opinions and they do not necessarily represent those of DeepMind.
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/yPMtSXXn4OY
Dr. Andrew Lampinen is a Senior Research Scientist at DeepMind, and he thinks that symbols are subjective in the relativistic sense. Dr. Lampinen completed his PhD in Cognitive Psychology at Stanford University. His background is in mathematics, physics, and machine learning. Andrew has said that his research interests are in cognitive flexibility and generalization, and how these abilities are enabled by factors like language, memory, and embodiment. Andrew and his coauthors have just released a paper called Symbolic Behaviour in Artificial Intelligence. Andrew leads in the paper by saying that the human ability to use symbols has yet to be replicated in machines. He thinks that one of the key areas to bridge the gap is considering how symbol meaning is established. He strongly believes it is the symbol users themselves who agree upon the symbol meaning, and that the use of symbols entails behaviours which coalesce agreement about their meaning. In plain English, this means that symbols are defined by behaviours rather than by their content.
[00:00:00] Intro to Andrew and Symbolic Behaviour paper
[00:07:01] Semantics underpins the unreasonable effectiveness of symbols
[00:12:56] The Depth of Subjectivity
[00:21:03] Walid Saba - universal cognitive templates
[00:27:47] Insufficiently Darwinian
[00:30:52] Discovered vs invented
[00:34:19] Does language have primacy
[00:35:59] Research directions
[00:39:43] Comparison to BenG OpenCog and human compatible AI
[00:42:53] Aligning AI with our culture
[00:47:55] Do we need to model the worst aspects of human behaviour?
[00:50:57] Fairness
[00:54:24] Memorisation in LLMs
[01:00:38] Wason selection task
[01:03:45] Would an Andrew hashtable robot be intelligent?
Dr. Andrew Lampinen
https://lampinen.github.io/
https://twitter.com/AndrewLampinen
Symbolic Behaviour in Artificial Intelligence
https://arxiv.org/abs/2102.03406
Imitating Interactive Intelligence
https://arxiv.org/abs/2012.05672
https://www.deepmind.com/publications/imitating-interactive-intelligence
Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Yasaman Razeghi]
https://arxiv.org/abs/2202.07206
Big bench dataset
https://github.com/google/BIG-bench
Teaching Autoregressive Language Models Complex Tasks By Demonstration [Recchia]
https://arxiv.org/pdf/2109.02102.pdf
Wason selection task
https://en.wikipedia.org/wiki/Wason_selection_task
Gary Lupyan
https://psych.wisc.edu/staff/lupyan-gary/
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/RzGaI7vXrkk
This week we speak with Yasaman Razeghi and Prof. Sameer Singh from UC Irvine. Yasaman recently published a paper called Impact of Pretraining Term Frequencies on Few-Shot Reasoning, where she demonstrated comprehensively that large language models only perform well on reasoning tasks because they memorise the dataset. For the first time she showed that accuracy was linearly correlated with the occurrence rate in the training corpus, something which OpenAI should have done in the first place!
We also speak with Sameer, who has been a pioneering force in the area of machine learning interpretability for many years now; he created LIME with Marco Ribeiro and also had his hands all over the famous CheckList paper and many others.
We also get into the metric obsession in the NLP world and whether metrics are one of the principal reasons why we are failing to make any progress in NLU.
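To make the term-frequency analysis described above concrete, here is a toy sketch of the style of correlation it reports. The frequencies and accuracies below are fabricated for illustration, and we correlate against log frequency in this sketch; see the paper for the actual methodology and data.

```python
import numpy as np

# Fabricated illustration of the analysis style described above: for each
# arithmetic operand (e.g. "23", "24", ...), pair how often it appears in the
# pretraining corpus with the model's accuracy on questions involving it.
term_frequencies = np.array([1e3, 5e3, 2e4, 8e4, 3e5, 1e6])         # invented counts
accuracies       = np.array([0.12, 0.21, 0.34, 0.52, 0.68, 0.81])   # invented accuracies

# The headline observation: accuracy tracks (log) frequency almost linearly.
log_freq = np.log10(term_frequencies)
r = np.corrcoef(log_freq, accuracies)[0, 1]
slope, intercept = np.polyfit(log_freq, accuracies, 1)

print(f"Pearson r between log10(frequency) and accuracy: {r:.3f}")
print(f"Fitted line: accuracy ~= {slope:.2f} * log10(freq) + {intercept:.2f}")
```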
[00:00:00] Impact of Pretraining Term Frequencies on Few-Shot Reasoning
[00:14:59] Metrics
[00:18:55] Definition of reasoning
[00:25:12] Metrics (again)
[00:28:52] On true believers
[00:33:04] Sameer's work on model explainability / LIME
[00:36:58] Computational irreducibility
[00:41:07] ML DevOps and Checklist
[00:45:58] Future of ML devops
[00:49:34] Thinking about future
Prof. Sameer Singh
https://sameersingh.org/
Yasaman Razeghi
https://yasamanrazeghi.com/
References:
Impact of Pretraining Term Frequencies on Few-Shot Reasoning [Razeghi et al with Singh]
https://arxiv.org/pdf/2202.07206.pdf
Beyond Accuracy: Behavioral Testing of NLP Models with CheckList [Ribeiro et al with Singh]
https://arxiv.org/pdf/2005.04118.pdf
“Why Should I Trust You?” Explaining the Predictions of Any Classifier (LIME) [Ribeiro et al with Singh]
https://arxiv.org/abs/1602.04938
Tim interviewing LIME Creator Marco Ribeiro in 2019
https://www.youtube.com/watch?v=6aUU-Ob4a8I
Tim video on LIME/SHAP on his other channel
https://www.youtube.com/watch?v=jhopjN08lTM
Our interview with Christoph Molnar
https://www.youtube.com/watch?v=0LIACHcxpHU
Interpretable Machine Learning book @ChristophMolnar
https://christophm.github.io/interpretable-ml-book/
Machine Teaching: A New Paradigm for Building Machine Learning Systems [Simard]
https://arxiv.org/abs/1707.06742
Whimsical notes on machine teaching
https://whimsical.com/machine-teaching-Ntke9EHHSR25yHnsypHnth
Gopher paper (Deepmind)
https://www.deepmind.com/blog/language-modelling-at-scale-gopher-ethical-considerations-and-retrieval
https://arxiv.org/pdf/2112.11446.pdf
EleutherAI
https://www.eleuther.ai/
https://github.com/kingoflolz/mesh-transformer-jax/
https://pile.eleuther.ai/
A Theory of Universal Artificial Intelligence based on Algorithmic Complexity [Hutter]
https://arxiv.org/pdf/cs/0004001.pdf
YT version: https://youtu.be/DxBZORM9F-8
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
Prof. Ken Stanley argued in his book that our world has become saturated with objectives. The process of setting an objective, attempting to achieve it, and measuring progress along the way has become the primary route to achievement in our culture. He’s not saying that objectives are bad per se, especially if they’re modest, but he thinks that when goals are ambitious then the search space becomes deceptive.
Is the key to artificial intelligence really related to intelligence? Does taking a job with a higher salary really bring you closer to being a millionaire? The problem is that the stepping stones which lead to ambitious objectives tend to be pretty strange; they don't resemble the final end state at all. Vacuum tubes led to computers, for example, and YouTube started as a dating website.
What fascinated us about this conversation with Ken is that we got a much deeper understanding of his philosophy. He led by saying that he thought it was worth questioning whether artificial intelligence is even a science or not. Ken thinks that the secret to future progress is for us to embrace more subjectivity.
[00:00:00] Tim Intro
[00:12:54] Intro
[00:17:08] Seeing ideas everywhere - AI and art are highly connected
[00:28:40] Creativity in Mathematics
[00:30:14] Where is the intelligence in art?
[00:38:49] Is AI disappointingly simple to mechanise?
[00:42:48] Slightly conscious
[00:46:27] Do we have subjective experience?
[00:50:23] Fear of the unknown
[00:51:48] Free Will
[00:54:22] Chalmers
[00:55:08] What's happening now in open-endedness
[00:58:31] Generalisation
[01:06:34] Representation primitives and what it means to understand
[01:12:37] Appeal to definitions, knowledge itself blocks discovery
Make sure you buy Kenneth's book!
Why Greatness Cannot Be Planned: The Myth of the Objective [Stanley, Lehman]
https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
Abandoning Objectives: Evolution through the
Search for Novelty Alone [Lehman, Stanley]
https://www.cs.swarthmore.edu/~meeden/DevelopmentalRobotics/lehman_ecj11.pdf
https://twitter.com/kenneth0stanley
Special discount link for Zak's GNN course - https://bit.ly/3uqmYVq
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT version: https://youtu.be/jAGIuobLp60 (there are lots of helper graphics there, recommended if possible)
Want to sponsor MLST!? Let us know on Linkedin / Twitter.
[00:00:00] Preamble
[00:03:12] Geometric deep learning
[00:10:04] Message passing
[00:20:42] Top down vs bottom up
[00:24:59] All NN architectures are different forms of information diffusion processes (squashing and smoothing problem)
[00:29:51] Graph rewiring
[00:31:38] Back to information diffusion
[00:42:43] Transformers vs GNNs
[00:47:10] Equivariant subgraph aggregation networks + WL test
[00:55:36] Do equivariant layers aggregate too?
[00:57:49] Zak's GNN course
Exhaustive list of references on the YT show URL (https://youtu.be/jAGIuobLp60)
Today we are having a discussion with Letitia Parcalabescu from the AI Coffee Break youtube channel! We discuss linguistics, symbolic AI and our respective Youtube channels. Make sure you subscribe to her channel! In the first 15 minutes Tim dissects the recent article from Gary Marcus "Deep learning has hit a wall".
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/ESrGqhf5CB
YT: https://youtu.be/p2D2duT-R2E
[00:00:00] Comments on Gary Marcus Article / Symbolic AI
[00:14:57] Greetings
[00:17:40] Introduction
[00:18:48] A shared journey towards computation
[00:22:10] A linguistics outsider
[00:24:11] Is computational linguistics AI?
[00:28:23] swinging pendulums of dogma and resource allocation
[00:31:16] the road less travelled
[00:34:35] pitching grants with multimodality ... and then the truth
[00:40:50] some aspects of language are statistically learnable
[00:44:58] ... and some aspects of language are dimensionally cursed
[00:48:24] it's good to have both approaches to machine intelligence
[00:51:14] the world runs on symbols
[00:54:28] there is much more to learn biology
[00:59:26] Letitia's creation process
[01:02:23] don't overfit content, instead publish and iterate
[01:07:48] merging the big picture arrow from the small direction arrows
[01:11:02] use passion to drive through failure to success
[01:12:56] stay positive
[01:16:02] closing remarks
Today we are speaking with Dr. Thomas Lux, a research scientist at Meta in Silicon Valley.
In some sense, all of supervised machine learning can be framed through the lens of geometry. All training data exists as points in Euclidean space, and we want to predict the value of a function at all those points. Neural networks appear to be the modus operandi these days for many domains of prediction. In that light, we might ask ourselves: what makes neural networks better than classical techniques like k-nearest neighbours from a geometric perspective? Our guest today has done research on exactly that problem, trying to define error bounds for approximations in terms of directions, distances, and derivatives.
The insights from Thomas's work point to why neural networks are so good at problems which everything else fails at, like image recognition. The key is in their ability to ignore parts of the input space, do nonlinear dimension reduction, and concentrate their approximation power on the important parts of the function.
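As a rough illustration of that geometric framing (our own toy, not Thomas's experiments), the sketch below fits k-nearest neighbours and a small MLP to the same 2D problem. It assumes scikit-learn is available, and the numbers mean nothing beyond illustration.

```python
# Toy comparison of a classical geometric method (k-NN) with an MLP on the
# same 2D problem. Requires scikit-learn; numbers here are illustrative only.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=1000, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
mlp = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

# Both are function approximators over points in Euclidean space; they differ
# in how they partition that space and where they spend approximation power.
print("k-NN test accuracy:", knn.score(X_test, y_test))
print("MLP  test accuracy:", mlp.score(X_test, y_test))
```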
[00:00:00] Intro to Show
[00:04:11] Intro to Thomas (Main show kick off)
[00:04:56] Interpolation of Sparse High-Dimensional Data
[00:12:19] Where does one place the basis functions to partition the space, the perennial question
[00:16:20] The sampling phenomenon -- where did all those dimensions come from?
[00:17:40] The placement of the MLP basis functions, they are not where you think they are
[00:23:15] NNs only extrapolate when given explicit priors to do so, CNNs in the translation domain
[00:25:31] Transformers extrapolate in the permutation domain
[00:28:26] NN priors work by creating space junk everywhere
[00:36:44] Are vector spaces the way to go? On discrete problems
[00:40:23] Activation functions
[00:45:57] What can we prove about NNs? Gradients without backprop
Interpolation of Sparse High-Dimensional Data [Lux]
https://tchlux.github.io/papers/tchlux-2020-NUMA.pdf
A Spline Theory of Deep Learning [_Balestriero_]
https://proceedings.mlr.press/v80/balestriero18b.html
Gradients without Backpropagation ‘22
https://arxiv.org/pdf/2202.08587.pdf
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
YT version: https://youtu.be/pMtk-iUaEuQ
Dr. Walid Saba is an old-school polymath. He has a background in cognitive psychology, linguistics, philosophy, computer science and logic, and he is now a Senior Scientist at Sorcero.
Walid is perhaps the most outspoken critic of BERTOLOGY, which is to say trying to solve the problem of natural language understanding by applying large statistical language models. Walid thinks this approach is doomed to failure because it's analogous to memorising infinity with a large hashtable. Walid thinks that the various appeals to infinity by some deep learning researchers are risible.
[00:00:00] MLST Housekeeping
[00:08:03] Dr. Walid Saba Intro
[00:11:56] AI Cannot Ignore Symbolic Logic, and Here’s Why
[00:23:39] Main show - Proposition: Statistical learning doesn't work
[01:04:44] Discovering a sorting algorithm bottom-up is hard
[01:17:36] The axioms of nature (universal cognitive templates)
[01:31:06] MLPs are locality sensitive hashing tables
References:
The Missing Text Phenomenon, Again: the case of Compound Nominals
https://ontologik.medium.com/the-missing-text-phenomenon-again-the-case-of-compound-nominals-abb6ece3e205
A Spline Theory of Deep Networks
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
The Defeat of the Winograd Schema Challenge
https://arxiv.org/pdf/2201.02387.pdf
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
https://twitter.com/yasaman_razeghi/status/1495112604854882304?s=21
https://arxiv.org/abs/2202.07206
AI Cannot Ignore Symbolic Logic, and Here’s Why
https://medium.com/ontologik/ai-cannot-ignore-symbolic-logic-and-heres-why-1f896713525b
Learnability can be undecidable
http://gtts.ehu.es/German/Docencia/1819/AC/extras/s42256-018-0002-3.pdf
Scaling Language Models: Methods, Analysis & Insights from Training Gopher
https://arxiv.org/pdf/2112.11446.pdf
DreamCoder: Growing generalizable, interpretable knowledge with wake-sleep Bayesian program learning
https://arxiv.org/abs/2006.08381
On the Measure of Intelligence [Chollet]
https://arxiv.org/abs/1911.01547
A Formal Theory of Commonsense Psychology: How People Think People Think
https://www.amazon.co.uk/Formal-Theory-Commonsense-Psychology-People/dp/1107151007
Continuum hypothesis
https://en.wikipedia.org/wiki/Continuum_hypothesis
Gödel numbering + completeness theorems
https://en.wikipedia.org/wiki/G%C3%B6del_numbering
https://en.wikipedia.org/wiki/G%C3%B6del%27s_incompleteness_theorems
Concepts: Where Cognitive Science Went Wrong [Jerry A. Fodor]
https://oxford.universitypressscholarship.com/view/10.1093/0198236360.001.0001/acprof-9780198236368
We engage in a bit of epistemic foraging with Prof. Karl Friston! In this show we discuss the free energy principle in detail, as well as emergence, cognition, consciousness and Karl's burden of knowledge!
YT: https://youtu.be/xKQ-F2-o8uM
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
[00:00:00] Introduction to FEP/Friston
[00:06:53] Cheers to Epistemic Foraging!
[00:09:17] The Burden of Knowledge Across Disciplines
[00:12:55] On-show introduction to Friston
[00:14:23] Simple does NOT mean Easy
[00:21:25] Searching for a Mathematics of Cognition
[00:26:44] The Low Road and The High Road to the Principle
[00:28:27] What's changed for the FEP in the last year
[00:39:36] FEP as stochastic systems with a pullback attractor
[00:44:03] An attracting set at multiple time scales and time infinity
[00:53:56] What about fuzzy Markov boundaries?
[00:59:17] Is reality densely or sparsely coupled?
[01:07:00] Is a Strong and Weak Emergence distinction useful?
[01:13:25] a Philosopher, a Zombie, and a Sentient Consciousness walk into a bar ...
[01:24:28] Can we recreate consciousness in silico? Will it have qualia?
[01:28:29] Subjectivity and building hypotheses
[01:34:17] Subject specific realizations to minimize free energy
[01:37:21] Free will in a deterministic Universe
The free energy principle made simpler but not too simple
https://arxiv.org/abs/2201.06387
We have a chat with Alexander Mattick aka ZickZack from Yannic's Discord community. Alex is one of the leading voices in that community and has an impressive technical depth. Don't forget MLST has now started its own Discord server too, come and join us! We are going to run regular events; our first big event is on Wednesday 9th, 1700-1900 UK time.
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
YT version: https://youtu.be/rGOOLC8cIO4
[00:00:00] Introduction to Alex
[00:02:16] Spline theory of NNs
[00:05:19] Do NNs abstract?
[00:08:27] Tim's exposition of spline theory of NNs
[00:11:11] Semantics in NNs
[00:13:37] Continuous vs discrete
[00:19:00] Open-ended Search
[00:22:54] Inductive logic programming
[00:25:00] Control to gain knowledge and knowledge to gain control
[00:30:22] Being a generalist with a breadth of knowledge and knowledge transfer
[00:36:29] Causality
[00:43:14] Discrete program synthesis + theorem solvers
Note: there are no politics discussed in this show and please do not interpret this show as any kind of a political statement from us. We have decided not to discuss politics on MLST anymore due to its divisive nature.
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
[00:00:00] Intro
[00:01:36] What we all need to understand about machine learning
[00:06:05] The Master Algorithm Target Audience
[00:09:50] Deeply Connected Algorithms seen from Divergent Frames of Reference
[00:12:49] There is a Master Algorithm; and it's mine!
[00:14:59] The Tribe of Evolution
[00:17:17] Biological Inspirations and Predictive Coding
[00:22:09] Shoe-Horning Gradient Descent
[00:27:12] Sparsity at Training Time vs Prediction Time
[00:30:00] World Models and Predictive Coding
[00:33:24] The Cartoons of System 1 and System 2
[00:40:37] AlphaGo Searching vs Learning
[00:45:56] Discriminative Models evolve into Generative Models
[00:50:36] Generative Models, Predictive Coding, GFlowNets
[00:55:50] Sympathy for a Thousand Brains
[00:59:05] A Spectrum of Tribes
[01:04:29] Causal Structure and Modelling
[01:09:39] Entropy and The Duality of Past vs Future, Knowledge vs Control
[01:16:14] A Discrete Universe?
[01:19:49] And yet continuous models work so well
[01:23:31] Finding a Discretised Theory of Everything
Patreon: https://www.patreon.com/mlst
Discord: https://discord.gg/HNnAwSduud
YT: https://www.youtube.com/watch?v=ZDY2nhkPZxw
We have a chat with Prof. Gary Marcus about everything which is currently top of mind for him, starting with consciousness.
[00:00:00] Gary intro
[00:01:25] Slightly conscious
[00:24:59] Abstract, compositional models
[00:32:46] Spline theory of NNs
[00:36:17] Self driving cars / algebraic reasoning
[00:39:43] Extrapolation
[00:44:15] Scaling laws
[00:49:50] Maximum likelihood estimation
References:
Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets
https://arxiv.org/abs/2201.02177
Deep Double Descent: Where Bigger Models and More Data Hurt
https://arxiv.org/pdf/1912.02292.pdf
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
https://arxiv.org/pdf/2002.08791.pdf
We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST
Patreon: https://www.patreon.com/mlst
For Yoshua Bengio, GFlowNets are the most exciting thing on the horizon of Machine Learning today. He believes they can solve previously intractable problems and hold the key to unlocking machine abstract reasoning itself. This discussion explores the promise of GFlowNets and the personal journey Prof. Bengio traveled to reach them.
Panel:
Dr. Tim Scarfe
Dr. Keith Duggar
Dr. Yannic Kilcher
Our special thanks to:
- Alexander Mattick (Zickzack)
References:
Yoshua Bengio @ MILA (https://mila.quebec/en/person/bengio-yoshua/)
GFlowNet Foundations (https://arxiv.org/pdf/2111.09266.pdf)
Flow Network based Generative Models for Non-Iterative Diverse Candidate Generation (https://arxiv.org/pdf/2106.04399.pdf)
Interpolation Consistency Training for Semi-Supervised Learning (https://arxiv.org/pdf/1903.03825.pdf)
Towards Causal Representation Learning (https://arxiv.org/pdf/2102.11107.pdf)
Causal inference using invariant prediction: identification and confidence intervals (https://arxiv.org/pdf/1501.01332.pdf)
Dr. Guy Emerson is a computational linguist who obtained his Ph.D from Cambridge University, where he is now a research fellow and lecturer. On the panel we also have myself, Dr. Tim Scarfe, as well as Dr. Keith Duggar and the veritable Dr. Walid Saba. We dive into distributional semantics, probability theory, fuzzy logic, grounding, vagueness and the grammar/cognition connection.
The aim of distributional semantics is to design computational techniques that can automatically learn the meanings of words from a body of text. The twin challenges are: how do we represent meaning, and how do we learn these representations? We want to learn the meanings of words from a corpus by exploiting the fact that the context of a word tells us something about its meaning. This is known as the distributional hypothesis. In his Ph.D thesis, Dr. Guy Emerson presented a distributional model which can learn truth-conditional semantics grounded in objects in the real world.
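As a back-of-the-envelope illustration of the distributional hypothesis (a plain count-based toy, not Guy's probabilistic, truth-conditional model), the sketch below builds co-occurrence vectors from a tiny invented corpus and compares words by cosine similarity.

```python
import math
from collections import defaultdict

# Tiny invented corpus; real distributional models use far larger corpora.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
    "a dog chased a cat",
]

# Build co-occurrence vectors: context = words within +/- 2 positions.
vectors = defaultdict(lambda: defaultdict(int))
for sentence in corpus:
    words = sentence.split()
    for i, w in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if i != j:
                vectors[w][words[j]] += 1

def cosine(u, v):
    # Compare two sparse count vectors by the angle between them.
    keys = set(u) | set(v)
    dot = sum(u[k] * v[k] for k in keys)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

# Words appearing in similar contexts end up with similar vectors.
print("cat ~ dog:", round(cosine(vectors["cat"], vectors["dog"]), 3))
print("cat ~ mat:", round(cosine(vectors["cat"], vectors["mat"]), 3))
```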
Hope you enjoy the show!
https://www.cai.cam.ac.uk/people/dr-guy-emerson
https://www.repository.cam.ac.uk/handle/1810/284882?show=full
Patreon: https://www.patreon.com/mlst
We are now sponsored by Weights and Biases! Please visit our sponsor link: http://wandb.me/MLST
Patreon: https://www.patreon.com/mlst
Yann LeCun thinks that it's specious to say neural network models are interpolating, because in high dimensions everything is extrapolation. Recently Dr. Randall Balestriero, Dr. Jerome Pesenti and Prof. Yann LeCun released their paper Learning in High Dimension Always Amounts to Extrapolation. This discussion has completely changed how we think about neural networks and their behaviour.
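The paper's definition of interpolation is membership of the convex hull of the training set. The hedged sketch below (our own, assuming scipy and numpy are available) checks hull membership with a small linear program and shows how quickly freshly sampled points stop counting as interpolation as the dimension grows.

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(x, X):
    """Is x a convex combination of the rows of X? (LP feasibility check)"""
    n = X.shape[0]
    # Find weights w >= 0 with sum(w) = 1 and X^T w = x.
    A_eq = np.vstack([X.T, np.ones(n)])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success

rng = np.random.default_rng(0)
for d in [2, 5, 10, 20]:
    X = rng.standard_normal((200, d))          # "training" points
    queries = rng.standard_normal((100, d))    # new points from the same distribution
    frac = np.mean([in_convex_hull(q, X) for q in queries])
    print(f"dim={d:2d}: fraction of new points inside the convex hull = {frac:.2f}")
```

Even with the training points and queries drawn from the same distribution, the hull-membership fraction collapses towards zero as the dimension rises, which is the sense in which "everything is extrapolation" in high dimensions.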
[00:00:00] Pre-intro
[00:11:58] Intro Part 1: On linearisation in NNs
[00:28:17] Intro Part 2: On interpolation in NNs
[00:47:45] Intro Part 3: On the curse
[00:48:19] LeCun
[01:40:51] Randall B
YouTube version: https://youtu.be/86ib0sfdFtw
Patreon: https://www.patreon.com/mlst
The last decade has witnessed an experimental revolution in data science and machine learning, epitomised by deep learning methods. Many high-dimensional learning tasks previously thought to be beyond reach -- such as computer vision, playing Go, or protein folding -- are in fact tractable given enough computational horsepower. Remarkably, the essence of deep learning is built from two simple algorithmic principles: first, the notion of representation or feature learning and second, learning by local gradient-descent type methods, typically implemented as backpropagation.
While learning generic functions in high dimensions is a cursed estimation problem, most tasks of interest are not uniform and have strong repeating patterns as a result of the low-dimensionality and structure of the physical world.
Geometric Deep Learning unifies a broad class of ML problems from the perspectives of symmetry and invariance. These principles not only underlie the breakthrough performance of convolutional neural networks and the recent success of graph neural networks but also provide a principled way to construct new types of problem-specific inductive biases.
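As a tiny numerical illustration of the symmetry argument (our own sketch, not from the proto-book), the code below checks that a circular 1D convolution commutes with translation: shift the input, and the feature map shifts by the same amount.

```python
import numpy as np

def conv1d_circular(signal: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """1D convolution with circular (wrap-around) padding."""
    out = np.zeros_like(signal, dtype=float)
    for i in range(len(signal)):
        for j in range(len(kernel)):
            out[i] += kernel[j] * signal[(i + j) % len(signal)]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # toy 1D "image"
w = rng.standard_normal(3)    # toy filter
shift = 5

# Translation equivariance: conv(shift(x)) == shift(conv(x)).
lhs = conv1d_circular(np.roll(x, shift), w)
rhs = np.roll(conv1d_circular(x, w), shift)
print("equivariant:", np.allclose(lhs, rhs))  # True
```

Baking this kind of symmetry into the architecture, rather than hoping the model learns it from data, is the inductive-bias story sketched above.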
This week we spoke with Professor Michael Bronstein (head of graph ML at Twitter), Dr. Petar Veličković (Senior Research Scientist at DeepMind), Dr. Taco Cohen and Prof. Joan Bruna about their new proto-book Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges.
See the table of contents for this (long) show at https://youtu.be/bIZB1hIJ4u8
Patreon: https://www.patreon.com/mlst
The ultimate goal of neuroscience is to learn how the human brain gives rise to human intelligence and what it means to be intelligent. Understanding how the brain works is considered one of humanity’s greatest challenges.
Jeff Hawkins thinks that the reality we perceive is a kind of simulation, a hallucination, a confabulation. He thinks that our brains build a model of reality based on thousands of information streams originating from the sensors in our body. Critically, Hawkins doesn't think there is just one model, but rather thousands.
Jeff has just released his new book, A Thousand Brains: A New Theory of Intelligence. It's an inspiring and well-written book, and I hope after watching this show you will be inspired to read it too.
https://numenta.com/a-thousand-brains-by-jeff-hawkins/
https://numenta.com/blog/2019/01/16/the-thousand-brains-theory-of-intelligence/
Panel:
Dr. Keith Duggar https://twitter.com/DoctorDuggar
Connor Leahy https://twitter.com/npcollapse
The field of Artificial Intelligence was founded in the mid 1950s with the aim of constructing "thinking machines" - that is to say, computer systems with human-like general intelligence. Think of humanoid robots that not only look human but act and think with intelligence equal to, and ultimately greater than, that of human beings. But in the intervening years, the field has drifted far from its ambitious old-fashioned roots.
Dr. Ben Goertzel is an artificial intelligence researcher, and the CEO and founder of SingularityNET, a project combining artificial intelligence and blockchain to democratize access to artificial intelligence. Ben seeks to fulfil the original ambitions of the field. Ben graduated with a PhD in Mathematics from Temple University in 1990. Ben's approach to AGI over many decades has been inspired by many disciplines, in particular human cognitive psychology and computer science. To date, Ben's work has been mostly theoretically driven. Ben thinks that most of the deep learning approaches to AGI today try to model the brain. They may have a loose analogy to human neuroscience, but they have not tried to derive the details of an AGI architecture from an overall conception of what a mind is. Ben thinks that what matters for creating human-level (or greater) intelligence is having the right information processing architecture, not the underlying mechanics via which the architecture is implemented.
Ben thinks that there is a certain set of key cognitive processes and interactions that AGI systems must implement explicitly, such as working and long-term memory, deliberative and reactive processing, and perception. Biological systems tend to be messy, complex and integrative; searching for a single "algorithm of general intelligence" is an inappropriate attempt to project the aesthetics of physics or theoretical computer science into a qualitatively different domain.
TOC is on the YT show description https://www.youtube.com/watch?v=sw8IE3MX1SY
Panel: Dr. Tim Scarfe, Dr. Yannic Kilcher, Dr. Keith Duggar
Artificial General Intelligence: Concept, State of the Art, and Future Prospects
https://sciendo.com/abstract/journals...
The General Theory of General Intelligence: A Pragmatic Patternist Perspective
https://arxiv.org/abs/2103.15100
Since its beginning in the 1950s, the field of artificial intelligence has vacillated between periods of optimistic predictions and massive investment and periods of disappointment, loss of confidence, and reduced funding. Even with today’s seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. Professor Melanie Mitchell thinks one reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself.
YT vid- https://www.youtube.com/watch?v=A8m1Oqz2HKc
Main show kick off [00:26:51]
Panel: Dr. Tim Scarfe, Dr. Keith Duggar, Letitia Parcalabescu (https://www.youtube.com/c/AICoffeeBreak/)
It has been over three decades since the statistical revolution took AI by storm, and over two decades since deep learning (DL) helped usher in the latest resurgence of artificial intelligence (AI). However, the disappointing progress in conversational agents, NLU, and self-driving cars has made it clear that progress has not lived up to the promise of these empirical and data-driven methods. DARPA has suggested that it is time for a third wave in AI, one that would be characterized by hybrid models -- models that combine knowledge-based approaches with data-driven machine learning techniques.
Joining us on this panel discussion are polymath and linguist Walid Saba, Co-founder of ONTOLOGIK.AI; Gadi Singer, VP & Director of Cognitive Computing Research at Intel Labs; and J. Mark Bishop, Professor of Cognitive Computing (Emeritus) at Goldsmiths, University of London, and Scientific Adviser to FACT360.
Moderated by Dr. Keith Duggar and Dr. Tim Scarfe
https://www.linkedin.com/in/gadi-singer/
https://www.linkedin.com/in/walidsaba/
https://www.linkedin.com/in/profjmarkbishop/
#machinelearning #artificialintelligence
Dr. Ishan Misra is a Research Scientist at Facebook AI Research, where he works on Computer Vision and Machine Learning. His main research interest is reducing the need for human supervision, and indeed human knowledge, in visual learning systems. He finished his PhD at the Robotics Institute at Carnegie Mellon. He has done stints at Microsoft Research, INRIA and Yale. His bachelor's is in computer science, where he achieved the highest GPA in his cohort.
Ishan is fast becoming a prolific scientist, already with more than 3000 citations under his belt and co-authoring with Yann LeCun, the godfather of deep learning. Today though we will be focusing on an exciting cluster of recent papers around unsupervised representation learning for computer vision released from FAIR. These are: DINO: Emerging Properties in Self-Supervised Vision Transformers; Barlow Twins: Self-Supervised Learning via Redundancy Reduction; and PAWS: Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples. All of these papers are hot off the press, having just been officially released in the last month or so. Many of you will remember PIRL: Self-Supervised Learning of Pretext-Invariant Representations, which Ishan was the primary author of in 2019.
References:
Shuffle and Learn - https://arxiv.org/abs/1603.08561
DepthContrast - https://arxiv.org/abs/2101.02691
DINO - https://arxiv.org/abs/2104.14294
Barlow Twins - https://arxiv.org/abs/2103.03230
SwAV - https://arxiv.org/abs/2006.09882
PIRL - https://arxiv.org/abs/1912.01991
AVID - https://arxiv.org/abs/2004.12943 (best paper candidate at CVPR'21 (just announced over the weekend) - http://cvpr2021.thecvf.com/node/290)
Alexei (Alyosha) Efros
http://people.eecs.berkeley.edu/~efros/
http://www.cs.cmu.edu/~tmalisie/projects/nips09/
Exemplar networks
https://arxiv.org/abs/1406.6909
The bitter lesson - Rich Sutton
http://www.incompleteideas.net/IncIdeas/BitterLesson.html
Machine Teaching: A New Paradigm for Building Machine Learning Systems
https://arxiv.org/abs/1707.06742
POET
https://arxiv.org/pdf/1901.01753.pdf
Professor Gary Marcus is a scientist, best-selling author, and entrepreneur. He is Founder and CEO of Robust.AI, and was Founder and CEO of Geometric Intelligence, a machine learning company acquired by Uber in 2016. Gary said in his recent "next decade" paper that without us, or other creatures like us, the world would continue to exist, but it would not be described, distilled, or understood. Human lives are filled with abstraction and causal description. This is so powerful. Francois Chollet the other week said that intelligence is literally sensitivity to abstract analogies, and that is all there is to it. It's almost as if one of the most important features of intelligence is the ability to abstract knowledge; this drives the generalisation which allows you to mine previous experience to make sense of many future novel situations. Also joining us today is Professor Luis Lamb, Secretary of Innovation for Science and Technology of the State of Rio Grande do Sul, Brazil. His research interests are Machine Learning and Reasoning, Neuro-Symbolic Computing, Logic in Computation and Artificial Intelligence, Cognitive and Neural Computation, and also AI Ethics and Social Computing. Luis released his new paper Neurosymbolic AI: The Third Wave at the end of last year. It beautifully articulated the key ingredients needed in the next generation of AI systems, integrating type 1 and type 2 approaches to AI, and it summarises all of the achievements of the last 20 years of research. We cover a lot of ground in today's show: explaining the limitations of deep learning, Rich Sutton's bitter lesson and "reward is enough", and the semantic foundation which is required for us to build robust AI.
Bob Coecke is a celebrated physicist; he's been a Physics and Quantum professor at Oxford University for the last 20 years. He is particularly interested in Structure, which is to say Logic, Order, and Category Theory. He is well known for work involving compositional distributional models of natural language meaning, and he is also fascinated with understanding how our brains work. Bob was recently appointed as the Chief Scientist at Cambridge Quantum Computing.
Bob thinks that interactions between systems in Quantum Mechanics carry naturally over to how word meanings interact in natural language. Bob argues that this interaction embodies the phenomenon of quantum teleportation.
Bob invented ZX-calculus, a graphical calculus for revealing the compositional structure inside quantum circuits - to show entanglement states and protocols in a visually succinct but logically complete way. Von Neumann himself didn't even like his own original symbolic formalism of quantum theory, despite it being widely used!
We hope you enjoy this fascinating conversation which might give you a lot of insight into natural language processing.
Tim Intro [00:00:00]
The topological brain (Post-record button skit) [00:13:22]
Show kick off [00:19:31]
Bob introduction [00:22:37]
Changing culture in universities [00:24:51]
Machine Learning is like electricity [00:31:50]
NLP -- what is Bob's Quantum conception? [00:34:50]
The missing text problem [00:52:59]
Can statistical induction be trusted? [00:59:49]
On pragmatism and hybrid systems [01:04:42]
Parlour tricks, parsing and information flows [01:07:43]
How much human input is required with Bob's method? [01:11:29]
Reality, meaning, structure and language [01:14:42]
Replacing complexity with quantum entanglement, emergent complexity [01:17:45]
Loading quantum data requires machine learning [01:19:49]
QC is happy math coincidence for NLP [01:22:30]
The Theory of English (ToE) [01:28:23]
... or can we learn the ToE? [01:29:56]
How did diagrammatic quantum calculus come about? [01:31:04]
The state of quantum computing today [01:37:49]
NLP on QC might be doable even in the NISQ era [01:40:48]
Hype and private investment are driving progress [01:48:34]
Crypto discussion (moved to post-show) [01:50:38]
Kilcher is in a startup (moved to post show) [01:53:40]
Debrief [01:55:26]
Performing reliably on unseen or shifting data distributions is a difficult challenge for modern vision systems; even slight corruptions or transformations of images are enough to slash the accuracy of state-of-the-art classifiers. When an adversary is allowed to modify an input image directly, models can be manipulated into predicting anything even when there is no perceptible change; this is known as an adversarial example. The ideal definition of an adversarial example is when humans consistently say two pictures are the same but a machine disagrees. Hadi Salman, a Ph.D student at MIT (ex-Uber and Microsoft Research), started thinking about how adversarial robustness could be leveraged beyond security.
He realised that the phenomenon of adversarial examples could actually be turned upside down to lead to more robust models instead of breaking them. Hadi actually utilized the brittleness of neural networks to design unadversarial examples or robust objects which_ are objects designed specifically to be robustly recognized by neural networks.
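To make the "turned upside down" idea concrete, here is a minimal single-step sketch in PyTorch, assuming any differentiable image classifier. It only illustrates the general principle: stepping up the loss gradient degrades the prediction (adversarial), while stepping down it nudges the input to be easier to recognise (unadversarial). Hadi's actual robust-object work optimises patches and textures over many viewpoints, which this sketch does not attempt.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Placeholder classifier (untrained here for brevity; in practice you
# would load pretrained weights for a meaningful demonstration).
model = models.resnet18().eval()

def perturb(x, y, eps=2 / 255, adversarial=True):
    """Single-step, FGSM-style perturbation of a batch of images.

    adversarial=True  -> step *up* the loss gradient (tends to fool the model)
    adversarial=False -> step *down* the loss gradient (an 'unadversarial'
                         nudge that makes x easier for the model to recognise)
    """
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad_sign = torch.autograd.grad(loss, x)[0].sign()
    step = eps * grad_sign if adversarial else -eps * grad_sign
    return (x + step).clamp(0, 1).detach()

# Illustrative usage with random tensors standing in for real images/labels.
x = torch.rand(4, 3, 224, 224)
y = torch.randint(0, 1000, (4,))
x_adv = perturb(x, y, adversarial=True)    # accuracy typically drops
x_unadv = perturb(x, y, adversarial=False) # accuracy typically improves
```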
Introduction [00:00:00]
DR KILCHER'S PHD HAT [00:11:18]
Main Introduction [00:11:38]
Hadi's Introduction [00:14:43]
More robust models == transfer better [00:46:41]
Features not bugs paper [00:49:13]
Manifolds [00:55:51]
Robustness and Transferability [00:58:00]
Do non-robust features generalize worse than robust? [00:59:52]
The unreasonable predicament of entangled features [01:01:57]
We can only find adversarial examples in the vicinity [01:09:30]
Certifiability of models for robustness [01:13:55]
Carlini is coming for you! And we are screwed [01:23:21]
Distribution shift and corruptions are a bigger problem than adversarial examples [01:25:34]
All roads lead to generalization [01:26:47]
Unadversarial examples [01:27:26]
In today's show we are joined by Francois Chollet. I have been inspired by Francois ever since I read his Deep Learning with Python book and started using the Keras library which he invented many, many years ago. Francois has a clarity of thought that I've never seen in any other human being! He has extremely interesting views on intelligence as generalisation, abstraction and an information conversion ratio. He wrote "On the Measure of Intelligence" at the end of 2019 and it had a huge impact on my thinking. He thinks that NNs can only model continuous problems, which have a smooth learnable manifold, and that many "type 2" problems which involve reasoning and/or planning are not suitable for NNs. He thinks that many problems have type 1 and type 2 enmeshed together. He thinks that the future of AI must include program synthesis to allow us to generalise broadly from a few examples, but the search could be guided by neural networks because the search space is interpolative to some extent.
https://youtu.be/J0p_thJJnoo
Tim's Whimsical notes; https://whimsical.com/chollet-show-QQ2atZUoRR3yFDsxKVzCbj
Dr. Christian Szegedy from Google Research is a deep learning heavyweight. He discovered adversarial examples, invented one of the first object detection algorithms and the Inception architecture, and co-invented BatchNorm. He thinks that if you had bet on computers and software in 1990 you would have been as right as if you bet on AI now. But he thinks that we have been programming computers the same way since the 1950s and there has been a huge stagnation ever since. Mathematics is the process of taking a fuzzy thought and formalising it. But could we automate that? Could we create a system which acts like a superhuman mathematician, but which you can talk to in natural language? This is what Christian calls autoformalisation. Christian thinks that automating many of the things we do in mathematics is the first step towards software synthesis and building human-level AGI. Mathematical ability is the litmus test for general reasoning ability. Christian has a fascinating take on transformers too.
With Yannic Lightspeed Kilcher and Dr. Mathew Salvaris
Whimsical Canvas with Tim's Notes:
https://whimsical.com/mar-26th-christian-szegedy-CpgGhnEYDBrDMFoATU6XYC
YouTube version (with detailed table of contents) https://youtu.be/ehNGGYFO6ms
The race is on, we are on a collective mission to understand and create artificial general intelligence. Dr. Tom Zahavy, a Research Scientist at DeepMind thinks that reinforcement learning is the most general learning framework that we have today, and in his opinion it could lead to artificial general intelligence. He thinks there are no tasks which could not be solved by simply maximising a reward.
Back in 2012 when Tom was an undergraduate, before the deep learning revolution he attended an online lecture on how CNNs automatically discover representations. This was an epiphany for Tom. He decided in that very moment that he was going to become an ML researcher. Tom's view is that the ability to recognise patterns and discover structure is the most important aspect of intelligence. This has been his quest ever since. He is particularly focused on using diversity preservation and metagradients to discover this structure.
In this discussion we dive deep into meta gradients in reinforcement learning.
Video version and TOC @ https://www.youtube.com/watch?v=hfaZwgk_iS0
First episode in a series we are doing on ML DevOps. Starting with the thing which nobody seems to be talking about enough, security! We chat with cyber security expert Andy Smith about threat modelling and trust boundaries for an ML DevOps system.
Intro [00:00:00]
ML DevOps - a security perspective [00:00:50]
Threat Modelling [00:03:03]
Adversarial examples? [00:11:27]
Nobody understands the whole stack [00:13:53]
On the size of the state space, the element of unpredictability [00:18:32]
Threat modelling in more detail [00:21:17]
Trust boundaries for an ML DevOps system [00:25:45]
Andy has a YouTube channel on cyber security! Check it out @
https://www.youtube.com/channel/UCywP24ly6h6NTusX88TQKTQ
https://www.linkedin.com/in/andysmith-uk/
Video version:
https://youtu.be/7Tz-3S4lypI
Christoph Molnar is one of the main people to know in the space of interpretable ML. In 2018 he released the first version of his incredible online book, Interpretable Machine Learning. Interpretability is often a deciding factor when a machine learning (ML) model is used in a product, a decision process, or in research. Interpretability methods can be used to discover knowledge, to debug or justify the model and its predictions, to control and improve the model, to reason about potential bias in models, and to increase the social acceptance of models. But interpretability methods can also be quite esoteric, add an additional layer of complexity and potential pitfalls, and require expert knowledge to understand. Is it even possible to understand complex models, or even humans for that matter, in any meaningful way?
Introduction to IML [00:00:00]
Show Kickoff [00:13:28]
What makes a good explanation? [00:15:51]
Quantification of how good an explanation is [00:19:59]
Knowledge of the pitfalls of IML [00:22:14]
Are linear models even interpretable? [00:24:26]
Complex Math models to explain Complex Math models? [00:27:04]
Saliency maps are glorified edge detectors [00:28:35]
Challenge on IML -- feature dependence [00:36:46]
Don't leap to using a complex model! Surrogate models can be too dumb [00:40:52]
On airplane pilots. Seeking to understand vs testing [00:44:09]
IML Could help us make better models or lead a better life [00:51:53]
Lack of statistical rigor and quantification of uncertainty [00:55:35]
On Causality [01:01:09]
Broadening out the discussion to the process or institutional level [01:08:53]
No focus on fairness / ethics? [01:11:44]
Is it possible to condition ML model training on IML metrics ? [01:15:27]
Where is IML going? Some of the esoterica of the IML methods [01:18:35]
You can't compress information without common knowledge, the latter becomes the bottleneck [01:23:25]
IML methods used non-interactively? Making IML an engineering discipline [01:31:10]
Tim Postscript -- on the lack of effective corporate operating models for IML, security, engineering and ethics [01:36:34]
Explanation in Artificial Intelligence: Insights from the Social Sciences (Tim Miller 2018)
https://arxiv.org/pdf/1706.07269.pdf
Seven Myths in Machine Learning Research (Chang 19)
Myth 7: Saliency maps are robust ways to interpret neural networks
https://arxiv.org/pdf/1902.06789.pdf
Sanity Checks for Saliency Maps (Adebayo 2020)
https://arxiv.org/pdf/1810.03292.pdf
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable.
https://christophm.github.io/interpretable-ml-book/
Christoph Molnar:
https://www.linkedin.com/in/christoph-molnar-63777189/
https://machine-master.blogspot.com/
https://twitter.com/ChristophMolnar
Please show your appreciation and buy Christoph's book here;
https://www.lulu.com/shop/christoph-molnar/interpretable-machine-learning/paperback/product-24449081.html?page=1&pageSize=4
Panel:
Connor Tann https://www.linkedin.com/in/connor-tann-a92906a1/
Dr. Tim Scarfe
Dr. Keith Duggar
Video version:
https://youtu.be/0LIACHcxpHU
Academics think of themselves as trailblazers, explorers — seekers of the truth.
Any fundamental discovery involves a significant degree of risk. If an idea is guaranteed to work then it moves from the realm of research to engineering. Unfortunately, this also means that most research careers will invariably be failures, at least if failure is measured via "objective" metrics like citations. Today we discuss the recent article from Mark Saroufim called Machine Learning: The Great Stagnation. We discuss the rise of gentleman scientists, fake rigor, incentives in ML, SOTA-chasing, "graduate student descent", the distribution of talent in ML and how to learn effectively.
With special guest interviewer Mat Salvaris.
Machine learning: the great stagnation [00:00:00]
Main show kick off [00:16:30]
Great stagnation article / Bad incentive systems in academia [00:18:24]
OpenAI is a media business [00:19:48]
Incentive structures in academia [00:22:13]
SOTA chasing [00:24:47]
F You Money [00:28:53]
Research grants and gentlemen scientists [00:29:13]
Following your own gradient of interest and making a contribution [00:33:27]
Marketing yourself to be successful [00:37:07]
Tech companies create the bad incentives [00:42:20]
GPT3 was sota chasing but it seemed really... "good"? Scaling laws? [00:51:09]
Dota / game AI [00:58:39]
Hard to go it alone? [01:02:08]
Reaching out to people [01:09:21]
Willingness to be wrong [01:13:14]
Distribution of talent / tech interviews [01:18:30]
What should you read online and how to learn? Sharing your stuff online and finding your niche [01:25:52]
Mark Saroufim:
https://marksaroufim.substack.com/
http://robotoverlordmanual.com/
https://twitter.com/marksaroufim
https://www.youtube.com/marksaroufim
Dr. Mathew Salvaris:
https://www.linkedin.com/in/drmathewsalvaris/
https://twitter.com/MSalvaris
Microsoft has an interesting strategy with their new “autonomous systems” technology also known as Project Bonsai. They want to create an interface to abstract away the complexity and esoterica of deep reinforcement learning. They want to fuse together expert knowledge and artificial intelligence all on one platform, so that complex problems can be decomposed into simpler ones. They want to take machine learning Ph.Ds out of the equation and make autonomous systems engineering look more like a traditional software engineering process. It is an ambitious undertaking, but interesting. Reinforcement learning is extremely difficult (as I cover in the video), and if you don’t have a team of RL Ph.Ds with tech industry experience, you shouldn’t even consider doing it yourself. This is our take on it!
There are 3 chapters in this video;
Chapter 1: Tim's intro and take on RL being hard, intro to Bonsai and machine teaching
Chapter 2: Interview with Scott Stanfield [recorded Jan 2020] 00:56:41
Chapter 3: Traditional street talk episode [recorded Dec 2020] 01:38:13
This is *not* an official communication from Microsoft, all personal opinions. There is no MS-confidential information in this video.
With:
Scott Stanfield
https://twitter.com/seesharp
Megan Bloemsma
https://twitter.com/BloemsmaMegan
Gurdeep Pall (he has not validated anything we have said in this video or been involved in the creation of it)
https://www.linkedin.com/in/gurdeep-pall-0aa639bb/
Panel:
Dr. Keith Duggar
Dr. Tim Scarfe
Yannic Kilcher
Today we are going to talk about the Data-efficient Image Transformers (DeiT) paper, which Hugo is the primary author of. One of the recipes for success for vision models since the DL revolution began has been the availability of large training sets. CNNs have been optimized for almost a decade now, including through extensive architecture search which is prone to overfitting. Motivated by the success of transformer-based models in Natural Language Processing, there has been increasing attention on applying these approaches to vision models. Hugo and his collaborators used a different training strategy and a new distillation token to get a massive increase in sample efficiency with image transformers.
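For a feel of the mechanics, here is a rough sketch of the hard-label distillation objective described in the DeiT paper, assuming a student vision transformer that produces logits from both its class token and its extra distillation token; the function and tensor names are illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def deit_hard_distillation_loss(cls_logits, dist_logits, teacher_logits, targets):
    """Hard-label distillation objective (sketch).

    cls_logits     - student predictions read off the class token
    dist_logits    - student predictions read off the distillation token
    teacher_logits - predictions from a (typically convnet) teacher, no grad needed
    targets        - ground-truth labels
    """
    teacher_labels = teacher_logits.argmax(dim=-1)            # hard teacher decisions
    loss_cls = F.cross_entropy(cls_logits, targets)           # supervised term
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)  # distillation term
    return 0.5 * loss_cls + 0.5 * loss_dist

# Illustrative usage with random tensors standing in for a forward pass.
B, C = 8, 1000
loss = deit_hard_distillation_loss(
    torch.randn(B, C), torch.randn(B, C), torch.randn(B, C),
    torch.randint(0, C, (B,)))
```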
00:00:00 Introduction
00:06:33 Data augmentation is all you need
00:09:53 Now the image patches are the convolutions though?
00:12:16 Where are those inductive biases hiding?
00:15:46 Distillation token
00:21:01 Why different resolutions on training
00:24:14 How data efficient can we get?
00:26:47 Out of domain generalisation
00:28:22 Why are transformers data efficient at all? Learning invariances
00:32:04 Is data augmentation cheating?
00:33:25 Distillation strategies - matching the intermediate teacher representation as well as output
00:35:49 Do ML models learn the same thing for a problem?
00:39:01 How is it like at Facebook AI?
00:41:17 How long is the PhD programme?
00:42:03 Other interests outside of transformers?
00:43:18 Transformers for Vision and Language
00:47:40 Could we improve transformers models? (Hybrid models)
00:49:03 Biggest challenges in AI?
00:50:52 How far can we go with data driven approach?
Professor Mark Bishop does not think that computers can be conscious or have phenomenological states of consciousness unless we are willing to accept panpsychism, which is the idea that mentality is fundamental and ubiquitous in the natural world, or, put simply, that your goldfish and everything else for that matter has a mind. Panpsychism postulates that distinctions between intelligences are largely arbitrary.
Mark's work in the 'philosophy of AI' led to an influential critique of computational approaches to Artificial Intelligence through a thorough examination of John Searle's 'Chinese Room Argument'.
Mark just published a paper called "Artificial Intelligence is Stupid and Causal Reasoning Won't Fix It". He makes it clear in this paper that, in his opinion, computers will never be able to compute everything, understand anything, or feel anything.
00:00:00 Tim Intro
00:15:04 Intro
00:18:49 Introduction to Mark's ideas
00:25:49 Some problems are not computable
00:29:57 The Dancing with Pixies fallacy
00:32:36 The observer-relative problem, and it's all in the mapping
00:43:03 Conscious Experience
00:53:30 Intelligence without representation, consciousness is something that we do
01:02:36 Consciousness helps us to act autonomously
01:05:13 The Chinese room argument
01:14:58 Simulation argument and computation doesn't have phenomenal consciousness
01:17:44 Language informs our colour perception
01:23:11 We have our own distinct ontologies
01:27:12 Kurt Gödel, Turing and Penrose and the implications of their work
Today we have Professor Pedro Domingos and we are going to talk about activism in machine learning, cancel culture, AI ethics and kernels. In Pedro's book The Master Algorithm, he segmented the AI community into 5 distinct tribes with 5 unique identities (and before you ask, no, the irony of an anti-identitarian doing so was not lost on us!). Pedro recently published an article in Quillette called Beating Back Cancel Culture: A Case Study from the Field of Artificial Intelligence. Domingos has railed against political activism in the machine learning community and cancel culture. Recently Pedro was involved in a controversy in which he asserted that the NeurIPS broader impact statements are an ideological filter mechanism.
Important Disclaimer: All views expressed are personal opinions.
00:00:00 Caveating
00:04:08 Main intro
00:07:44 Cancel culture is a cultural and intellectual weakness
00:12:26 Is cancel culture a post-modern religion?
00:24:46 Should we have gateways and gatekeepers?
00:29:30 Does everything require broader impact statements?
00:33:55 We are stifling diversity (of thought) not promoting it.
00:39:09 What is fair and how to do fair?
00:45:11 Models can introduce biases by compressing away minority data
00:48:36 Accurate but unequal soap dispensers
00:53:55 Agendas are not even self-consistent
00:56:42 Is vs Ought: all variables should be used for Is
01:00:38 Fighting back cancellation with cancellation?
01:10:01 Intent and degree matter in right vs wrong.
01:11:08 Limiting principles matter
01:15:10 Gradient descent and kernels
01:20:16 Training Journey matters more than Destination
01:24:36 Can training paths teach us about symmetry?
01:28:37 What is the most promising path to AGI?
01:31:29 Intelligence will lose its mystery
Dr. Simon Stringer obtained his Ph.D in mathematical state space control theory and has been a Senior Research Fellow at Oxford University for over 27 years. Simon is the director of the Oxford Centre for Theoretical Neuroscience and Artificial Intelligence, which is based within the Oxford University Department of Experimental Psychology. His department covers vision, spatial processing, motor function, language and consciousness -- in particular, how the primate visual system learns to make sense of complex natural scenes. Dr. Stringer's laboratory houses a team of theoreticians who are developing computer models of a range of different aspects of brain function. Simon's lab is investigating the neural and synaptic dynamics that underpin brain function. An important matter here is the feature-binding problem, which concerns how the visual system represents the hierarchical relationships between features: the visual system must represent hierarchical binding relations across the entire visual field at every spatial scale and level in the hierarchy of visual primitives.
We discuss the emergence of self-organised behaviour, complex information processing, invariant sensory representations and hierarchical feature binding which emerges when you build biologically plausible neural networks with temporal spiking dynamics.
00:00:09 Tim Intro
00:09:31 Show kickoff
00:14:37 Hierarchical Feature binding and timing of action potentials
00:30:16 Hebb to Spike-timing-dependent plasticity (STDP)
00:35:27 Encoding of shape primitives
00:38:50 Is imagination working in the same place in the brain
00:41:12 Compare to supervised CNNs
00:45:59 Speech recognition, motor system, learning mazes
00:49:28 How practical are these spiking NNs
00:50:19 Why simulate the human brain
00:52:46 How much computational power do you gain from differential timings
00:55:08 Adversarial inputs
00:59:41 Generative / causal component needed?
01:01:46 Modalities of processing i.e. language
01:03:42 Understanding
01:04:37 Human hardware
01:06:19 Roadmap of NNs?
01:10:36 Interpretability methods for these new models
01:13:03 Won't GPT just scale and do this anyway?
01:15:51 What about trace learning and transformation learning
01:18:50 Categories of invariance
01:19:47 Biological plausibility
https://www.youtube.com/watch?v=aisgNLypUKs
Adversarial examples have attracted significant attention in machine learning, but the reasons for their existence and pervasiveness remain unclear. There's good reason to believe neural networks look at very different features than we would have expected. As articulated in the 2019 "features not bugs" paper, adversarial examples can be directly attributed to the presence of non-robust features: features derived from patterns in the data distribution that are highly predictive, yet brittle and incomprehensible to humans.
Adversarial examples don't just affect deep learning models. A cottage industry has sprung up around Threat Modeling in AI and ML Systems and their dependencies. Joining us this evening are some of the leading researchers currently working on adversarial examples:
Florian Tramèr - A fifth year PhD student in Computer Science at Stanford University
https://floriantramer.com/
https://twitter.com/florian_tramer
Dr. Wieland Brendel - Machine Learning Researcher at the University of Tübingen & Co-Founder of layer7.ai
https://medium.com/@wielandbr
https://twitter.com/wielandbr
Dr. Nicholas Carlini - Research scientist at Google Brain working in that exciting space between machine learning and computer security.
https://nicholas.carlini.com/
We really hope you enjoy the conversation, remember to subscribe!
Yannic Intro [00:00:00]
Tim Intro [00:04:07]
Threat Taxonomy [00:09:00]
Main show intro [00:11:30]
What's wrong with Neural Networks? [00:14:52]
The role of memorization [00:19:51]
Anthropomorphization of models [00:22:42]
What's the harm really though / focusing on actual ML security risks [00:27:03]
Shortcut learning / OOD generalization [00:36:18]
Human generalization [00:40:11]
An existential problem in DL getting the models to learn what we want? [00:41:39]
Defenses to adversarial examples [00:47:15]
What if we had all the data and the labels? Still problems? [00:54:28]
Defenses are easily broken [01:00:24]
Self deception in academia [01:06:46]
ML Security [01:28:15]
https://www.youtube.com/watch?v=2PenK06tvE4
Lena Voita is a Ph.D. student at the University of Edinburgh and the University of Amsterdam. Previously, she was a research scientist at Yandex Research and worked closely with the Yandex Translate team. She still teaches NLP at the Yandex School of Data Analysis. She has created an exciting new NLP course on her website lena-voita.github.io which you folks need to check out! She has one of the most well presented blogs we have ever seen, where she discusses her research in an easily digestible manner. Lena has been investigating many fascinating topics in machine learning and NLP. Today we are going to talk about three of her papers and corresponding blog articles:
Source and Target Contributions to NMT Predictions -- Where she talks about the influential dichotomy between the source and the prefix of neural translation models.
https://arxiv.org/pdf/2010.10907.pdf
https://lena-voita.github.io/posts/source_target_contributions_to_nmt.html
Information-Theoretic Probing with MDL -- Where Lena proposes a technique of evaluating a model using the minimum description length or Kolmogorov complexity of labels given representations rather than something basic like accuracy
https://arxiv.org/pdf/2003.12298.pdf
https://lena-voita.github.io/posts/mdl_probes.html
Evolution of Representations in the Transformer - Lena investigates the evolution of representations of individual tokens in Transformers -- trained with different training objectives (MT, LM, MLM)
https://arxiv.org/abs/1909.01380
https://lena-voita.github.io/posts/emnlp19_evolution.html
Panel Dr. Tim Scarfe, Yannic Kilcher, Sayak Paul
00:00:00 Kenneth Stanley / Greatness can not be planned house keeping
00:21:09 Kilcher intro
00:28:54 Hello Lena
00:29:21 Tim - Lena's NMT paper
00:35:26 Tim - Minimum Description Length / Probe paper
00:40:12 Tim - Evolution of representations
00:46:40 Lena's NLP course
00:49:18 The peppermint tea situation
00:49:28 Main Show Kick Off
00:50:22 Hallucination vs exposure bias
00:53:04 Lena's focus on explaining the models, not SOTA chasing
00:56:34 Probes paper and NLP interpretability
01:02:18 Why standard probing doesn't work
01:12:12 Evolutions of representations paper
01:23:53 BERTScore and BERT Rediscovers the Classical NLP Pipeline paper
01:25:10 Is the shifting encoding context because of BERT bidirectionality
01:26:43 Objective defines which information we lose on input
01:27:59 How influential is the dataset?
01:29:42 Where is the community going wrong?
01:31:55 Thoughts on GOFAI/Understanding in NLP?
01:36:38 Lena's NLP course
01:47:40 How to foster better learning / understanding
01:52:17 Lena's toolset and languages
01:54:12 Mathematics is all you need
01:56:03 Programming languages
https://lena-voita.github.io/
https://www.linkedin.com/in/elena-voita/
https://scholar.google.com/citations?user=EcN9o7kAAAAJ&hl=ja
https://twitter.com/lena_voita
Professor Kenneth Stanley is currently a research science manager at OpenAI in San Francisco. We've been dreaming about getting Kenneth on the show since the very beginning of Machine Learning Street Talk. Some of you might recall that our first ever show was on the enhanced POET paper; of course Kenneth had his hands all over it. He's been cited over 16,000 times, and his most popular paper, with over 3K citations, was the NEAT algorithm. His interests are neuroevolution, open-endedness, NNs, artificial life, and AI. He invented the concept of novelty search, with no clearly defined objective. His key idea is that there is a tyranny of objectives prevailing in every aspect of our lives, society and indeed our algorithms. Crucially, these objectives produce convergent behaviour and thinking and distract us from discovering stepping stones which will lead to greatness. He thinks that this monotonic objective obsession, this idea that we need to continue to improve benchmarks every year, is dangerous. He wrote about this in detail in his recent book "Why Greatness Cannot Be Planned", which will be the main topic of discussion in the show. We also cover his ideas on open-endedness in machine learning.
00:00:00 Intro to Kenneth
00:01:16 Show structure disclaimer
00:04:16 Passionate discussion
00:06:26 Why greatness can't be planned and the tyranny of objectives
00:14:40 Chinese Finger Trap
00:16:28 Perverse Incentives and feedback loops
00:18:17 Deception
00:23:29 Maze example
00:24:44 How can we define curiosity or interestingness
00:26:59 Open endedness
00:33:01 ICML 2019 and Yannic, POET, first MSLST
00:36:17 evolutionary algorithms++
00:43:18 POET, the first MLST
00:45:39 A lesson to GOFAI people
00:48:46 Machine Learning -- the great stagnation
00:54:34 Actual scientific successes are usually luck, and against the odds -- Biontech
00:56:21 Picbreeder and NEAT
01:10:47 How Tim applies these ideas to his life and why he runs MLST
01:14:58 Keith Skit about UCF
01:15:13 Main show kick off
01:18:02 Why does Kenneth value serendipitous exploration so much
01:24:10 Scientific support for Kenneth's ideas in normal life
01:27:12 We should drop objectives to achieve them. An oxymoron?
01:33:13 Isn't this just resource allocation between exploration and exploitation?
01:39:06 Are objectives merely a matter of degree?
01:42:38 How do we allocate funds for treasure hunting in society
01:47:34 A keen nose for what is interesting, and voting can be dangerous
01:53:00 Committees are the antithesis of innovation
01:56:21 Does Kenneth apply these ideas to his real life?
01:59:48 Divergence vs interestingness vs novelty vs complexity
02:08:13 Picbreeder
02:12:39 Isn't everything novel in some sense?
02:16:35 Imagine if there was no selection pressure?
02:18:31 Is innovation == environment exploitation?
02:20:37 Is it possible to take shortcuts if you already knew what the innovations were?
02:21:11 Go Explore -- does the algorithm encode the stepping stones?
02:24:41 What does it mean for things to be interestingly different?
02:26:11 behavioral characterization / diversity measure to your broad interests
02:30:54 Shaping objectives
02:32:49 Why do all ambitious objectives have deception? Picbreeder analogy
02:35:59 Exploration vs Exploitation, Science vs Engineering
02:43:18 Schools of thought in ML and could search lead to AGI
02:45:49 Official ending
Connor Tann is a physicist and senior data scientist working for a multinational energy company, where he co-founded and leads a data science team. He holds a first-class degree in experimental and theoretical physics from Cambridge University, with a master's in particle astrophysics. He specializes in the application of machine learning models and Bayesian methods. Today we explore the history, practical utility, and unique capabilities of Bayesian methods. We also discuss the computational difficulties inherent in Bayesian methods, along with modern methods for approximate solutions such as Markov Chain Monte Carlo. Finally, we discuss how Bayesian optimization in the context of AutoML may one day put data scientists like Connor out of work.
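As a toy illustration of the MCMC idea that comes up in the conversation (not anything taken from the episode), the sketch below runs a random-walk Metropolis sampler on a Beta-Bernoulli coin-flip posterior, chosen because the exact answer is known and the samples can be sanity-checked.

```python
import numpy as np

# Toy Bayesian inference: posterior over a coin's bias theta given 7 heads in
# 10 flips, with a flat Beta(1, 1) prior. The exact posterior is Beta(8, 4),
# so the MCMC estimate should land near its mean of 8/12.
heads, flips = 7, 10

def log_posterior(theta):
    if not 0.0 < theta < 1.0:
        return -np.inf
    # Bernoulli log-likelihood + flat log-prior, up to an additive constant.
    return heads * np.log(theta) + (flips - heads) * np.log(1.0 - theta)

def metropolis(n_samples=50_000, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta, samples = 0.5, []
    for _ in range(n_samples):
        proposal = theta + step * rng.normal()           # random-walk proposal
        log_accept = log_posterior(proposal) - log_posterior(theta)
        if np.log(rng.uniform()) < log_accept:           # accept/reject step
            theta = proposal
        samples.append(theta)
    return np.array(samples[5_000:])                     # drop burn-in

draws = metropolis()
print(draws.mean())  # approximately 0.667, matching the exact posterior mean
```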
Panel: Dr. Keith Duggar, Alex Stenlake, Dr. Tim Scarfe
00:00:00 Duggar's philosophical ramblings on Bayesianism
00:05:10 Introduction
00:07:30 small datasets and prior scientific knowledge
00:10:37 Bayesian methods are probability theory
00:14:00 Bayesian methods demand hard computations
00:15:46 uncertainty can matter more than estimators
00:19:29 updating or combining knowledge is a key feature
00:25:39 Frequency or Reasonable Expectation as the Primary Concept
00:30:02 Gambling and coin flips
00:37:32 Rev. Thomas Bayes's pool table
00:40:37 ignorance priors are beautiful yet hard
00:43:49 connections between common distributions
00:49:13 A curious Universe, Benford's Law
00:55:17 choosing priors, a tale of two factories
01:02:19 integration, the computational Achilles heel
01:35:25 Bayesian social context in the ML community
01:10:24 frequentist methods as a first approximation
01:13:13 driven to Bayesian methods by small sample size
01:18:46 Bayesian optimization with automl, a job killer?
01:25:28 different approaches to hyper-parameter optimization
01:30:18 advice for aspiring Bayesians
01:33:59 who would Connor interview next?
Connor Tann: https://www.linkedin.com/in/connor-tann-a92906a1/
https://twitter.com/connossor
Today we had a fantastic conversation with Professor Max Welling, VP of Technology, Qualcomm Technologies Netherlands B.V.
Max is a strong believer in the power of data and computation and its relevance to artificial intelligence. There is a fundamental blank slate paradigm in machine learning; experience and data alone currently rule the roost. Max wants to build a house of domain knowledge on top of that blank slate. Max thinks there are no predictions without assumptions, no generalization without inductive bias. The bias-variance tradeoff tells us that we need to use additional human knowledge when data is insufficient.
Max Welling has pioneered many of the most sophisticated inductive priors in DL models developed in recent years, allowing us to use Deep Learning with non-Euclidean data, i.e. on graphs/topology (a field we now call "geometric deep learning"), or allowing network architectures to recognise new symmetries in the data, for example gauge or SE(3) equivariance. Max has also brought many other concepts from his physics playbook into ML, for example quantum and even Bayesian approaches.
This is not an episode to miss, it might be our best yet!
Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake
00:00:00 Show introduction
00:04:37 Protein Fold from DeepMind -- did it use SE(3) transformer?
00:09:58 How has machine learning progressed
00:19:57 Quantum Deformed Neural Networks paper
00:22:54 Probabilistic Numeric Convolutional Neural Networks paper
00:27:04 Ilia Karmanov from Qualcomm interview mini segment
00:32:04 Main Show Intro
00:35:21 How is Max known in the community?
00:36:35 How Max nurtures talent, freedom and relationship is key
00:40:30 Selecting research directions and guidance
00:43:42 Priors vs experience (bias/variance trade-off)
00:48:47 Generative models and GPT-3
00:51:57 Bias/variance trade off -- when do priors hurt us
00:54:48 Capsule networks
01:03:09 Which old ideas should we revive
01:04:36 Hardware lottery paper
01:07:50 Greatness can't be planned (Kenneth Stanley reference)
01:09:10 A new sort of peer review and originality
01:11:57 Quantum Computing
01:14:25 Quantum deformed neural networks paper
01:21:57 Probabilistic numeric convolutional neural networks
01:26:35 Matrix exponential
01:28:44 Other ideas from physics i.e. chaos, holography, renormalisation
01:34:25 Reddit
01:37:19 Open review system in ML
01:41:43 Outro
Welcome to the Christmas special community edition of MLST! We discuss some recent and interesting papers from Pedro Domingos (are NNs kernel machines?), DeepMind (can NNs out-reason symbolic machines?), Anna Rogers - When BERT Plays The Lottery, All Tickets Are Winning, and Prof. Mark Bishop (even causal methods won't deliver understanding). We also cover our favourite bits from the recent Montreal AI event run by Prof. Gary Marcus (including Rich Sutton, Danny Kahneman and Christof Koch). We respond to a reader mail on capsule networks. Then we do a deep dive into Type Theory and Lambda Calculus with community member Alex Mattick. In the final hour we discuss inductive priors and label information density with another one of our Discord community members.
Panel: Dr. Tim Scarfe, Yannic Kilcher, Alex Stenlake, Dr. Keith Duggar
Enjoy the show and don't forget to subscribe!
00:00:00 Welcome to Christmas Special!
00:00:44 SoTa meme
00:01:30 Happy Christmas!
00:03:11 Paper -- DeepMind - Outperforming neuro-symbolic models with NNs (Ding et al)
00:08:57 What does it mean to understand?
00:17:37 Paper - Prof. Mark Bishop - Artificial Intelligence is Stupid and Causal Reasoning Won't Fix It
00:25:39 Paper -- Pedro Domingos - Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
00:31:07 Paper - Bengio - Inductive Biases for Deep Learning of Higher-Level Cognition
00:32:54 Anna Rogers - When BERT Plays The Lottery, All Tickets Are Winning
00:37:16 Montreal AI event - Gary Marcus on reasoning
00:40:37 Montreal AI event -- Rich Sutton on universal theory of AI
00:49:45 Montreal AI event -- Danny Kahneman, System 1 vs 2 and Generative Models ala free energy principle
01:02:57 Montreal AI event -- Christof Koch - Neuroscience is hard
01:10:55 Markus Carr -- reader letter on capsule networks
01:13:21 Alex response to Marcus Carr
01:22:06 Type theory segment -- with Alex Mattick from Discord
01:24:45 Type theory segment -- What is Type Theory
01:28:12 Type theory segment -- Difference between functional and OOP languages
01:29:03 Type theory segment -- Lambda calculus
01:30:46 Type theory segment -- Closures
01:35:05 Type theory segment -- Term rewriting (confluency and termination)
01:42:02 Type theory segment -- eta term rewriting system - Lambda Calculus
01:54:44 Type theory segment -- Types / semantics
02:06:26 Type theory segment -- Calculus of constructions
02:09:27 Type theory segment -- Homotopy type theory
02:11:02 Type theory segment -- Deep learning link
02:17:27 Jan from Discord segment -- Chrome MRU skit
02:18:56 Jan from Discord segment -- Inductive priors (with XMaster96/Jan from Discord)
02:37:59 Jan from Discord segment -- Label information density (with XMaster96/Jan from Discord)
02:55:13 Outro
Dr. Eray Ozkural is an AGI researcher from Turkey; he is the founder of Celestial Intellect Cybernetics. Eray is extremely critical of Max Tegmark, Nick Bostrom and MIRI founder Eliezer Yudkowsky and their views on AI safety. Eray thinks that these views represent a form of neo-Luddism, that they are capturing valuable research budgets with doomsday fear-mongering, and that they effectively want to prevent AI from being developed by those they don't agree with. Eray is also sceptical of the intelligence explosion hypothesis and the argument from simulation.
Panel -- Dr. Keith Duggar, Dr. Tim Scarfe, Yannic Kilcher
00:00:00 Show teaser intro with added nuggets and commentary
00:48:39 Main Show Introduction
00:53:14 Doomsaying to Control
00:56:39 Fear the Basilisk!
01:08:00 Intelligence Explosion Ethics
01:09:45 Fear the Autonomous Drone! ... or spam
01:11:25 Infinity Point Hypothesis
01:15:26 Meat Level Intelligence
01:21:25 Defining Intelligence ... Yet Again
01:27:34 We'll make brains and then shoot them
01:31:00 The Universe likes deep learning
01:33:16 NNs are glorified hash tables
01:38:44 Radical behaviorists
01:41:29 Omega Architecture, possible AGI?
01:53:33 Simulation hypothesis
02:09:44 No one cometh unto Simulation, but by Jesus Christ
02:16:47 Agendas, Motivations, and Mind Projections
02:23:38 A computable Universe of Bulk Automata
02:30:31 Self-Organized Post-Show Coda
02:31:29 Investigating Intelligent Agency is Science
02:36:56 Goodbye and cheers!
https://www.youtube.com/watch?v=pZsHZDA9TJU
This week Dr. Tim Scarfe, Dr. Keith Duggar and Connor Leahy chat with Prof. Karl Friston. Professor Friston is a British neuroscientist at University College London and an authority on brain imaging. In 2016 he was ranked the most influential neuroscientist on Semantic Scholar. His main contribution to theoretical neurobiology is the variational Free Energy Principle, also known as active inference in the Bayesian brain. The FEP is a formal statement that the existential imperative for any system which survives in the changing world can be cast as an inference problem. The Bayesian Brain Hypothesis states that the brain is confronted with ambiguous sensory evidence, which it interprets by making inferences about the hidden states which caused the sensory data. So is the brain an inference engine? The key concept separating Friston's idea from traditional stochastic reinforcement learning methods and even Bayesian reinforcement learning is moving away from goal-directed optimisation.
Remember to subscribe! Enjoy the show!
00:00:00 Show teaser intro
00:16:24 Main formalism for FEP
00:28:29 Path Integral
00:30:52 How did we feel talking to friston?
00:34:06 Skit - on cultures
00:36:02 Friston joins
00:36:33 Main show introduction
00:40:51 Is prediction all it takes for intelligence?
00:48:21 balancing accuracy with flexibility
00:57:36 belief-free vs belief-based; beliefs are crucial
01:04:53 Fuzzy Markov Blankets and Wandering Sets
01:12:37 The Free Energy Principle conforms to itself
01:14:50 useful false beliefs
01:19:14 complexity minimization is the heart of free energy
01:23:25 An Alpha to tip the scales? Absolutely not! Absolutely yes!
01:28:47 FEP applied to brain anatomy
01:36:28 Are there multiple non-FEP forms in the brain?
01:43:11 a positive connection to backpropagation
01:47:12 The FEP does not explain the origin of FEP systems
01:49:32 Post-show banter
https://www.fil.ion.ucl.ac.uk/~karl/
#machinelearning
This week Dr. Tim Scarfe, Sayak Paul and Yannic Kilcher speak with Dr. Simon Kornblith from Google Brain (Ph.D. from MIT). Simon is trying to understand how neural nets do what they do. Simon was the second author on the seminal Google AI SimCLR paper. We also cover "Do Wide and Deep Networks Learn the Same Things?", "What's in a Loss Function for Image Classification?", and "Big Self-Supervised Models are Strong Semi-Supervised Learners". Simon used to be a neuroscientist and also gives us the story of his unique journey into ML.
00:00:00 Show Teaser / or "short version"
00:18:34 Show intro
00:22:11 Relationship between neuroscience and machine learning
00:29:28 Similarity analysis and evolution of representations in Neural Networks
00:39:55 Expressability of NNs
00:42:33 What's in a loss function for image classification
00:46:52 Loss function implications for transfer learning
00:50:44 SimCLR paper
01:00:19 Contrast SimCLR to BYOL
01:01:43 Data augmentation
01:06:35 Universality of image representations
01:09:25 Universality of augmentations
01:23:04 GPT-3
01:25:09 GANs for data augmentation??
01:26:50 Julia language
@skornblith
https://www.linkedin.com/in/simon-kornblith-54b2033a/
https://arxiv.org/abs/2010.15327
Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
https://arxiv.org/abs/2010.16402
What's in a Loss Function for Image Classification?
https://arxiv.org/abs/2002.05709
A Simple Framework for Contrastive Learning of Visual Representations
https://arxiv.org/abs/2006.10029
Big Self-Supervised Models are Strong Semi-Supervised Learners
In this special edition, Dr. Tim Scarfe, Yannic Kilcher and Keith Duggar speak with Gary Marcus and Connor Leahy about GPT-3. We have all had a significant amount of time to experiment with GPT-3, and we show you demos of it in use and discuss the considerations.
Note that this podcast version is significantly truncated; watch the YouTube version for the TOC and experiments with GPT-3: https://www.youtube.com/watch?v=iccd86vOz3w
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher discuss multi-arm bandits and pure exploration with Dr. Wouter M. Koolen, Senior Researcher, Machine Learning group, Centrum Wiskunde & Informatica.
Wouter specialises in machine learning theory, game theory, information theory, statistics and optimisation. Wouter is currently interested in pure exploration in multi-armed bandit models, game tree search, and accelerated learning in sequential decision problems. His research has been cited 1000 times, and he has been published 14 times in NeurIPS, the number 1 ML conference, as well as in lots of other exciting publications.
Today we are going to talk about two of the most studied settings in control, decision theory, and learning in unknown environments, which are the multi-armed bandit (MAB) and reinforcement learning (RL) approaches:
- when can an agent stop learning and start exploiting using the knowledge it obtained
- which strategy leads to minimal learning time
00:00:00 What are multi-arm bandits/show trailer
00:12:55 Show introduction
00:15:50 Bandits
00:18:58 Taxonomy of decision framework approaches
00:25:46 Exploration vs Exploitation
00:31:43 the sharp divide between modes
00:34:12 bandit measures of success
00:36:44 connections to reinforcement learning
00:44:00 when to apply pure exploration in games
00:45:54 bandit lower bounds, a pure exploration renaissance
00:50:21 pure exploration compiler dreams
00:51:56 what would the PX-compiler DSL look like
00:57:13 the long arms of the bandit
01:00:21 causal models behind the curtain of arms
01:02:43 adversarial bandits, arms trying to beat you
01:05:12 bandits as an optimization problem
01:11:39 asymptotic optimality vs practical performance
01:15:38 pitfalls hiding under asymptotic cover
01:18:50 adding features to bandits
01:27:24 moderate confidence regimes
01:30:33 algorithms choice is highly sensitive to bounds
01:46:09 Post script: Keith interesting piece on n quantum
http://wouterkoolen.info
https://www.cwi.nl/research-groups/ma...
#machinelearning
This week Dr. Tim Scarfe, Dr. Keith Duggar, Yannic Kilcher and Connor Leahy cover a broad range of topics, ranging from academia, GPT-3 and whether prompt engineering could be the next in-demand skill, markets and economics including trading and whether you can predict the stock market, AI alignment, utilitarian philosophy, randomness and intelligence and even whether the universe is infinite!
00:00:00 Show Introduction
00:12:49 Academia and doing a Ph.D
00:15:49 From academia to wall street
00:17:08 Quants -- smoke and mirrors? Tail Risk
00:19:46 Previous results don't indicate future success in markets
00:23:23 Making money from social media signals?
00:24:41 Predicting the stock market
00:27:20 Things which are and are not predictable
00:31:40 Tim postscript comment on predicting markets
00:32:37 Connor take on markets
00:35:16 As markets become more efficient...
00:36:38 Snake oil in ML
00:39:20 GPT-3, we have changed our minds
00:52:34 Prompt engineering a new form of software development?
01:06:07 GPT-3 and prompt engineering
01:12:33 Emergent intelligence with increasingly weird abstractions
01:27:29 Wireheading and the economy
01:28:54 Free markets, dragon story and price vs value
01:33:59 Utilitarian philosophy and what does good look like?
01:41:39 Randomness and intelligence
01:44:55 Different schools of thought in ML
01:46:09 Is the universe infinite?
Thanks a lot for Connor Leahy for being a guest on today's show. https://twitter.com/NPCollapse -- you can join his EleutherAI community discord here: https://discord.com/invite/vtRgjbM
#machinelearning
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic Kilcher speak with veteran NLU expert Dr. Walid Saba.
Walid is an old-school AI expert. He is a polymath: a neuroscientist, psychologist, linguist, philosopher, statistician, and logician. He thinks the missing information problem and the lack of a typed ontology are the key issues with NLU, not sample efficiency or generalisation. He is a big critic of the deep learning movement and of BERTology. We also cover GPT-3 in some detail in today's session, covering Luciano Floridi's recent article "GPT‑3: Its Nature, Scope, Limits, and Consequences" and a commentary on the incredible power of GPT-3 to perform tasks with just a few examples, including the Yann LeCun commentary on Facebook and Hacker News.
Time stamps on the YouTube version
0:00:00 Walid intro
00:05:03 Knowledge acquisition bottleneck
00:06:11 Language is ambiguous
00:07:41 Language is not learned
00:08:32 Language is a formal language
00:08:55 Learning from data doesn’t work
00:14:01 Intelligence
00:15:07 Lack of domain knowledge these days
00:16:37 Yannic Kilcher thuglife comment
00:17:57 Deep learning assault
00:20:07 The way we evaluate language models is flawed
00:20:47 Humans do type checking
00:23:02 Ontologic
00:25:48 Comments On GPT3
00:30:54 Yann LeCun and Reddit
00:33:57 Minds and machines - Luciano
00:35:55 Main show introduction
00:39:02 Walid introduces himself
00:40:20 science advances one funeral at a time
00:44:58 Deep learning obsession syndrome and inception
00:46:14 BERTology / empirical methods are not NLU
00:49:55 Pattern recognition vs domain reasoning, is the knowledge in the data
00:56:04 Natural language understanding is about decoding and not compression, it's not learnable.
01:01:46 Intelligence is about not needing infinite amounts of time
01:04:23 We need an explicit ontological structure to understand anything
01:06:40 Ontological concepts
01:09:38 Word embeddings
01:12:20 There is power in structure
01:15:16 Language models are not trained on pronoun disambiguation and resolving scopes
01:17:33 The information is not in the data
01:19:03 Can we generate these rules on the fly? Rules or data?
01:20:39 The missing data problem is key
01:21:19 Problem with empirical methods and LeCun reference
01:22:45 Comparison with meatspace (brains)
01:28:16 The knowledge graph game, is knowledge constructed or discovered
01:29:41 How small can this ontology of the world be?
01:33:08 Walid's taxonomy of understanding
01:38:49 The trend seems to be that fewer rules is better, not the other way around?
01:40:30 Testing the latest NLP models with entailment
01:42:25 Problems with the way we evaluate NLP
01:44:10 Winograd Schema challenge
01:45:56 All you need to know now is how to build neural networks, lack of rigour in ML research
01:50:47 Is everything learnable
01:53:02 How should we elevate language systems?
01:54:04 10 big problems in language (missing information)
01:55:59 Multiple inheritance is wrong
01:58:19 Language is ambiguous
02:01:14 How big would our world ontology need to be?
02:05:49 How to learn more about NLU
02:09:10 AlphaGo
Walid's blog: https://medium.com/@ontologik
LinkedIn: https://www.linkedin.com/in/walidsaba/
This week Dr. Tim Scarfe, Alex Stenlake and Yannic Kilcher speak with AGI and AI alignment specialist Connor Leahy, a machine learning engineer from Aleph Alpha and founder of EleutherAI.
Connor believes that AI alignment is philosophy with a deadline and that we are on the precipice; the stakes are astronomical. AI is important, and it will go wrong by default. Connor thinks that the singularity or intelligence explosion is near. Connor says that AGI is like climate change but worse: even harder problems, an even shorter deadline and even worse consequences for the future. These problems are hard, and nobody knows what to do about them.
00:00:00 Introduction to AI alignment and AGI fire alarm
00:15:16 Main Show Intro
00:18:38 Different schools of thought on AI safety
00:24:03 What is intelligence?
00:25:48 AI Alignment
00:27:39 Humans don't have a coherent utility function
00:28:13 Newcomb's paradox and advanced decision problems
00:34:01 Incentives and behavioural economics
00:37:19 Prisoner's dilemma
00:40:24 Ayn Rand and game theory in politics and business
00:44:04 Instrumental convergence and orthogonality thesis
00:46:14 Utility functions and the Stop button problem
00:55:24 AI corrigibility - self alignment
00:56:16 Decision theory and stability / wireheading / robust delegation
00:59:30 Stop button problem
01:00:40 Making the world a better place
01:03:43 Is intelligence a search problem?
01:04:39 Mesa optimisation / humans are misaligned AI
01:06:04 Inner vs outer alignment / faulty reward functions
01:07:31 Large corporations are intelligent and have no stop function
01:10:21 Dutch booking / what is rationality / decision theory
01:16:32 Understanding very powerful AIs
01:18:03 Kolmogorov complexity
01:19:52 GPT-3 - is it intelligent, are humans even intelligent?
01:28:40 Scaling hypothesis
01:29:30 Connor thought DL was dead in 2017
01:37:54 Why is GPT-3 as intelligent as a human
01:44:43 Jeff Hawkins on intelligence as compression and the great lookup table
01:50:28 AI ethics related to AI alignment?
01:53:26 Interpretability
01:56:27 Regulation
01:57:54 Intelligence explosion
Discord: https://discord.com/invite/vtRgjbM
EleutherAI: https://www.eleuther.ai
Twitter: https://twitter.com/npcollapse
LinkedIn: https://www.linkedin.com/in/connor-j-leahy/
Join Dr. Tim Scarfe, Sayak Paul, Yannic Kilcher, and Alex Stenlake as they have a conversation with Mr. Chai Time Data Science himself: Sanyam Bhutani!
00:00:00 Introduction
00:03:42 Show kick off
00:06:34 How did Sanyam get started into ML
00:07:46 Being a content creator
00:09:01 Can you be self taught without a formal education in ML?
00:22:54 Kaggle
00:33:41 H20 product / job
00:40:58 Interpretability / bias / engineering skills
00:43:22 Get that first job in DS
00:46:29 AWS ML Ops architecture / ml engineering
01:14:19 Patterns
01:18:09 Testability
01:20:54 Adversarial examples
Sanyam's blog -- https://sanyambhutani.com/tag/chaitimedatascience/
Chai Time Data Science -- https://www.youtube.com/c/ChaiTimeDataScience
Dr. Tim Scarfe, Yannic Kilcher and Sayak Paul chat with Sara Hooker from the Google Brain team! We discuss her recent hardware lottery paper, pruning / sparsity, bias mitigation and intepretability.
The hardware lottery -- what causes inertia or friction in the marketplace of ideas? Is there a meritocracy of ideas, or do the previous decisions we have made enslave us? Sara Hooker calls this a lottery because she feels that machine learning progress is entirely beholden to the hardware and software landscape. Ideas succeed if they are compatible with the hardware and software at the time, and also with the existing inventions. The machine learning community is exceptional because the pace of innovation is fast and we operate largely in the open; this is largely because we don't build anything physical, which is expensive and slow, and where the cost of being scooped is high. We get stuck in basins of attraction based on our technology decisions and it's expensive to jump outside of these basins. So is this story unique to hardware and AI algorithms, or is it really just the story of all innovation? Every great innovation must wait for the right stepping stone to be in place before it can really happen. We are excited to bring you Sara Hooker to give her take.
YouTube version (including TOC): https://youtu.be/sQFxbQ7ade0
Show notes; https://drive.google.com/file/d/1S_rHnhaoVX4Nzx_8e3ESQq4uSswASNo7/view?usp=sharing
Sara Hooker page; https://www.sarahooker.me
This week Dr. Tim Scarfe, Yannic Kilcher, and Keith Duggar have a conversation with Dr. Rebecca Roache in the last of our 3-part series on the Social Dilemma Netflix film. Rebecca is a senior lecturer in philosophy at Royal Holloway, University of London, and has written extensively about the future of friendship.
People claim that friendships are not what they used to be. People are always staring at their phones, even when in public. Social media has turned us into narcissists who are always managing our own PR rather than being present with each other. Anxiety about the negative effects of technology is as old as the written word. Is technology bad for friendships? Can you have friends through screens? Does social media cause polarization? And is that a bad thing? Does it promote quantity over quality? Rebecca thinks that social media and echo chambers are less ominous to friendship on closer inspection.
00:00:32 Teaser clip from Rebecca and her new manuscript on friendship
00:02:52 Introduction
00:04:56 Memorisation vs reasoning / is technology enhancing friendships
00:09:29 World of Warcraft / gaming communities / echo chambers / polarisation
00:12:34 Horizontal vs Vertical social attributes
00:17:18 Exclusion of others opinions
00:20:36 The power to silence others / truth verification
00:23:58 Misinformation
00:27:28 Norms / memes / political terms and co-opting / bullying
00:31:57 Redefinition of political terms i.e. racism
00:36:13 Virtue signalling
00:38:57 How many friends can you have / spread thin / Dunbar's 150
00:42:54 Is it morally objectionable to believe or contemplate objectionable ideas, punishment
00:50:52 Is speaking the same thing as acting
00:52:24 Punishment - deterrence vs retribution / historical
00:53:59 Yannic: contemplating is a form of speaking
00:57:32 silencing/blocking is intellectual laziness - what ideas are we allowed to talk about
01:04:53 Corporate AI ethics frameworks
01:09:14 Autonomous Vehicles
01:10:51 the eternal Facebook world / online vs offline friendships
01:14:05 How do we get the best out of our online friendships
This week on Machine Learning Street Talk, Dr. Tim Scarfe, Dr. Keith Duggar, Alex Stenlake and Yannic Kilcher have a conversation with the founder and principal researcher at the Montreal AI Ethics institute -- Abhishek Gupta. We cover several topics from the Social Dilemma film and AI Ethics in general.
00:00:00 Introduction
00:03:57 Overcome our weaknesses
00:14:30 threat landscape blind spots
00:18:35 differential reality vs universal shaping
00:24:21 shared reality incentives and tools
00:32:01 transparency and knowledge to avoid pathology
00:40:09 federated informed autonomy
00:49:48 diversity is a metric, inclusion is a strategy
00:59:58 locally aligned pockets can stabilize global diversity
01:10:58 making inclusion easier with tools
01:23:35 enabling community feedback
01:26:16 open source the algorithms
01:33:02 the N+1 cost of inclusion
01:38:08 broader impact statement
https://atg-abhishek.github.io
https://www.linkedin.com/in/abhishekguptamcgill/
In this first part of our three part series on the Social Dilemma Netflix film, Dr. Tim Scarfe, Yannic "Lightspeed" Kilcher and Zak Jost gang up with Cybersecurity expert Andy Smith. We give you our take on the film. We are super excited to get your feedback on this one! Hope you enjoy.
00:00:00 Introduction
00:06:11 Moral hypocrisy
00:12:38 Road to hell is paved with good intentions, attention economy
00:15:04 They know everything about you
00:18:02 Addiction
00:21:22 Differential realities
00:26:12 Self determination and Monetisation
00:29:08 AI: Overwhelm human strengths undermine human vulnerabilities
00:31:51 Conspiracy theory / fake news
00:34:23 Overton window / polarisation
00:39:12 Short attention span / convergent behaviour
00:41:26 Is social media good for you
00:45:17 Your attention time is linear, the things you can pay attention to are a volume, anonymity
00:51:32 Andy question on security: social engineering
00:56:32 Is it a security risk having your information in social media
00:58:02 Retrospective judgement
01:03:06 Free speech and censorship
01:06:06 Technology accelerator
In today's episode, Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe chat about the education chapter in Kenneth Stanley's "Greatness cannot be planned" book, and we relate it to our Algoshambes conversation a few weeks ago. We debate whether objectives in education are a good thing and whether they cause perverse incentives and stifle creativity and innovation. Next up we dissect capsule networks from the top down! We finish off talking about fast algorithms and quantum computing.
00:00:00 Introduction
00:01:13 Greatness cannot be planned / education
00:12:03 Perverse incentives
00:19:25 Treasure hunting
00:30:28 Capsule Networks
00:46:08 Capsules As Compositional Networks
00:52:45 Capsule Routing
00:57:10 Loss and Warps
01:09:55 Fast Algorithms and Quantum Computing
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher have a conversation with Microsoft Senior Software Engineer Sachin Kundu. We speak about programming languages, including which our favourites are, and functional programming vs OOP. Next we speak about software engineering and the intersection of software engineering and machine learning. We also talk about applications of ML and finally what makes an exceptional software engineer and tech lead. Sachin is an expert in this field so we hope you enjoy the conversation!
Spoiler alert, how many of you have read the Mythical Man-Month by Frederick P. Brooks?!
00:00:00 Introduction
00:06:37 Programming Languages
00:53:41 Applications of ML
01:55:59 What makes an exceptional SE and tech lead
01:22:08 Outro
This week Dr. Keith Duggar, Alex Stenlake and Dr. Tim Scarfe discuss the theory of computation, intelligence, Bayesian model selection, the intelligence explosion and the phenomenon of "interactive articles".
00:00:00 Intro
00:01:27 Kernels and context-free grammars
00:06:04 Theory of computation
00:18:41 Intelligence
00:22:03 Bayesian model selection
00:44:05 AI-IQ Measure / Intelligence explosion
00:52:09 Interactive articles
01:12:32 Outro
Today Yannic Lightspeed Kilcher and I spoke with Alex Stenlake about Kernel Methods. What is a kernel? Do you remember those weird kernel things which everyone obsessed about before deep learning? What about the representer theorem and reproducing kernel Hilbert spaces? SVMs and kernel ridge regression? Remember them?! Hope you enjoy the conversation!
00:00:00 Tim Intro
00:01:35 Yannic's clever insight from this discussion
00:03:25 Street talk and Alex intro
00:05:06 How kernels are taught
00:09:20 Computational tractability
00:10:32 Maths
00:11:50 What is a kernel?
00:19:39 Kernel latent expansion
00:23:57 Overfitting
00:24:50 Hilbert spaces
00:30:20 Compare to DL
00:31:18 Back to Hilbert spaces
00:45:19 Computational tractability 2
00:52:23 Curse of dimensionality
00:55:01 RBF: infinite Taylor series
00:57:20 Margin/SVM
01:00:07 KRR/dual
01:03:26 Complexity compute kernels vs deep learning
01:05:03 Good for small problems? (vs deep learning)
01:07:50 What's special about the RBF kernel
01:11:06 Another DL comparison
01:14:01 Representer theorem
01:20:05 Relation to back prop
01:25:10 Connection with NLP/transformers
01:27:31 Where else kernels good
01:34:34 Deep learning vs dual kernel methods
01:33:29 Thoughts on AI
01:34:35 Outro
This week Dr. Tim Scarfe and Dr. Keith Duggar discuss explainability, reasoning, priors and GPT-3. We check out Christoph Molnar's book on interpretability, talk about priors vs experience in NNs, whether NNs are reasoning, and also cover articles by Gary Marcus and Walid Saba critiquing deep learning. We finish with a brief discussion of Chollet's ARC challenge and intelligence paper.
00:00:00 Intro
00:01:17 Explainability and Christoph Molnar's book on Interpretability
00:26:45 Explainability - Feature visualisation
00:33:28 Architecture / CPPNs
00:36:10 Invariance and data parsimony, priors and experience, manifolds
00:42:04 What NNs learn / logical view of modern AI (Walid Saba article)
00:47:10 Core knowledge
00:55:33 Priors vs experience
00:59:44 Mathematical reasoning
01:01:56 Gary Marcus on GPT-3
01:09:14 Can NNs reason at all?
01:18:05 Chollet intelligence paper/ARC challenge
This week Dr. Tim Scarfe, Yannic Lightspeed Kilcher, Sayak Paul and Ayush Thakur interview Mathilde Caron from Facebook Research (FAIR).
We discuss the paper Mathilde wrote with her collaborators, "SwAV: Unsupervised Learning of Visual Features by Contrasting Cluster Assignments" @ https://arxiv.org/pdf/2006.09882.pdf
This paper presents the latest unsupervised contrastive visual representation learning algorithm, featuring a new data augmentation strategy and a new online clustering strategy.
Note; Other authors; Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin
Sayak Paul - @RisingSayak / https://www.linkedin.com/in/sayak-paul/
Ayush Thakur - @ayushthakur0 / https://www.linkedin.com/in/ayush-thakur-731914149/
The article they wrote;
https://app.wandb.ai/authors/swav-tf/reports/Unsupervised-Visual-Representation-Learning-with-SwAV--VmlldzoyMjg3Mzg
00:00:00 Yannic probability challenge (CAN YOU SOLVE IT?)
00:01:29 Intro topic (Tim)
00:08:18 Yannic take
00:09:33 Intro show and guests
00:11:29 SWaV elevator pitch
00:17:31 Clustering approach in general
00:21:17 Sayak and Ayush's article on SWaV
00:23:49 Optimal transport problem / Sinkhorn-Knopp algorithm
00:31:43 Is clustering a natural approach for this?
00:44:19 Image augmentations
00:46:20 Priors vs experience (data)
00:48:32 Life at FAIR
00:52:33 Progress of image augmentation
00:56:10 When things do not go to plan with research
01:01:04 Question on architecture
01:01:43 SWaV Results
01:06:26 Reproducing Mathilde's code
01:14:51 Do we need the whole dataset to set clustering loss
01:16:40 Self-supervised learning and transfer learning
01:23:25 Link to attention mechanism
01:24:41 Sayak final thought why unsupervised better
01:25:56 Outro
Abstract;
"Unsupervised image representations have significantly reduced the gap with supervised pretraining, notably with the recent achievements of contrastive learning methods. These contrastive methods typically work online and rely on a large number of explicit pairwise feature comparisons, which is computationally challenging. In this paper, we propose an online algorithm, SwAV, that takes advantage of contrastive methods without requiring to compute pairwise comparisons. Specifically, our method simultaneously clusters the data while enforcing consistency between cluster assignments produced for different augmentations (or “views”) of the same image, instead of comparing features directly as in contrastive learning. Simply put, we use a “swapped” prediction mechanism where we predict the cluster assignment of a view from the representation of another view. Our method can be trained with large and small batches and can scale to unlimited amounts of data. Compared to previous contrastive methods, our method is more memory efficient since it does not require a large memory bank or a special momentum network. In addition, we also propose a new data augmentation strategy, multi-crop, that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much. We validate our findings by achieving 75.3% top-1 accuracy on ImageNet with ResNet-50, as well as surpassing supervised pretraining on all the considered transfer tasks."
This week Dr. Tim Scarfe, Dr. Keith Duggar and Yannic "Lightspeed" Kilcher respond to the "Algoshambles" exam fiasco in the UK where the government were forced to step in to standardise the grades which were grossly inflated by the schools. The schools and teachers are all paid on metrics related to the grades received by students, what could possibly go wrong?! The result is that we end up with grades which have lost all their value and students are coached for the exams and don't actually learn the subject. We also cover the second Francois Chollet interview on the Lex Fridman podcast. We cover GPT-3, Neuralink, and discussion of intelligence.
00:00:00 Algoshambles
00:45:40 Lex Fridman/Chollet: Intro
00:55:21 Lex Fridman/Chollet: Neuralink
01:06:28 Lex Fridman/Chollet: GPT-3
01:23:43 Lex Fridman/Chollet: Intelligence discussion
This week we spoke with Sayak Paul, who is extremely active in the machine learning community. We discussed the AI landscape in India, unsupervised representation learning, data augmentation and contrastive learning, explainability, abstract scene representations and finally pruning and the recent superposition paper. I really enjoyed this conversation and I hope you folks do too!
00:00:00 Intro to Sayak
00:17:50 AI landscape in India
00:24:20 Unsupervised representation learning
00:26:11 Data augmentation / contrastive learning
00:59:20 Explainability
01:12:10 Abstract scene representations
01:14:50 Pruning and the superposition paper
We speak with Robert Lange!
Robert is a PhD student at the Technical University Berlin. His research combines Deep Multi-Agent Reinforcement Learning and Cognitive Science to study the learning dynamics of large collectives. He has a brilliant blog where he distils and explains cutting edge ML research. We spoke about his story, economics, multi-agent RL, intelligence and AGI, and his recent article summarising the state of the art in neural network pruning.
Robert's article on pruning in NNs https://roberttlange.github.io/posts/2020/06/lottery-ticket-hypothesis/
00:00:00 Intro
00:04:17 Show start and intro to Robert
00:11:39 Economics background
00:27:20 Intrinsic motivation
00:33:22 Intelligence/consciousness
00:48:16 Lottery ticket/pruning article discussion
01:43:21 Robert's advice for younger self and state of deep learning
Robert's LinkedIn: https://www.linkedin.com/in/robert-tjarko-lange-19539a12a/
@RobertTLange
#machinelearning #deeplearning
We welcome Zak Jost from the WelcomeAIOverlords channel. Zak is an ML research scientist at Amazon. He has a great blog at http://blog.zakjost.com and also a Discord channel at https://discord.gg/xh2chKX
WelcomeAIOverlords: https://www.youtube.com/channel/UCxw9_WYmLqlj5PyXu2AWU_g
00:00:00 INTRO START
00:01:07 MAIN SHOW START
00:01:59 ZAK'S STORY
00:05:06 YOUTUBE DISCUSSION
00:24:12 UNDERSTANDING PAPERS
00:29:53 CONTRASTIVE LEARNING INTRO
00:33:00 BRING YOUR OWN LATENT PAPER
01:03:13 GRAPHS IN ML AND KNOWLEDGE GRAPHS
01:21:36 GRAPH USE CASES - FRAUD
01:30:15 KNOWLEDGE GRAPHS
01:34:22 GRAPHS IN ML
01:38:53 AUTOMATED ML
01:57:32 OUTRO
In this episode of Machine Learning Street Talk Dr. Tim Scarfe, Yannic Kilcher and Connor Shorten spoke with Marie-Anne Lachaux, Baptiste Roziere and Dr. Guillaume Lample from Facebook Research (FAIR) in Paris. They recently released the paper "Unsupervised Translation of Programming Languages", an exciting new approach to learned translation of programming languages (a learned transcoder) using an unsupervised encoder trained on individual monolingual corpora, i.e. no parallel language data needed. The trick they used was that there is significant token overlap between languages when using word-piece embeddings. It was incredible to talk with this talented group of researchers and I hope you enjoy the conversation too.
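To make the token-overlap trick concrete, here is a toy Python sketch: shared keywords, identifiers and punctuation show up in both languages' corpora, which is what anchors the shared embedding space. The whitespace/punctuation tokenizer below is a stand-in we made up for illustration; the real model uses learned BPE word-piece units over huge GitHub corpora.

```python
import re

# Naive tokenizer standing in for a shared subword vocabulary; the real
# word-piece units are learned, this split is only for illustration.
def toy_tokens(code: str) -> set:
    return set(re.findall(r"[A-Za-z_]+|\d+|[^\sA-Za-z_\d]", code))

cpp = "int add(int a, int b) { return a + b; }"
py  = "def add(a, b):\n    return a + b"

shared = toy_tokens(cpp) & toy_tokens(py)
print(shared)  # e.g. {'add', 'a', 'b', 'return', '+', '(', ')', ','}
```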
Yannic's video on this got watched over 120K times! Check it out too https://www.youtube.com/watch?v=xTzFJIknh7E
Paper https://arxiv.org/abs/2006.03511;
Marie-Anne Lachaux, Baptiste Roziere, Lowik Chanussot, Guillaume Lample
Abstract;
"A transcompiler, also known as source-to-source translator, is a system that converts source code from a high-level programming language (such as C++ or Python) to another. Transcompilers are primarily used for interoperability, and to port codebases written in an obsolete or deprecated language (e.g. COBOL, Python 2) to a modern one. They typically rely on handcrafted rewrite rules, applied to the source code abstract syntax tree. Unfortunately, the resulting translations often lack readability, fail to respect the target language conventions, and require manual modifications in order to work properly. The overall translation process is timeconsuming and requires expertise in both the source and target languages, making code-translation projects expensive. Although neural models significantly outperform their rule-based counterparts in the context of natural language translation, their applications to transcompilation have been limited due to the scarcity of parallel data in this domain. In this paper, we propose to leverage recent approaches in unsupervised machine translation to train a fully unsupervised neural transcompiler. We train our model on source code from open source GitHub projects, and show that it can translate functions between C++, Java, and Python with high accuracy. Our method relies exclusively on monolingual source code, requires no expertise in the source or target languages, and can easily be generalized to other programming languages. We also build and release a test set composed of 852 parallel functions, along with unit tests to check the correctness of translations. We show that our model outperforms rule-based commercial baselines by a significant margin."
We cover Francois Chollet's recent paper.
Abstract; To make deliberate progress towards more intelligent and more human-like artificial systems, we need to be following an appropriate feedback signal: we need to be able to define and evaluate intelligence in a way that enables comparisons between two systems, as well as comparisons with humans. Over the past hundred years, there has been an abundance of attempts to define and measure intelligence, across both the fields of psychology and AI. We summarize and critically assess these definitions and evaluation approaches, while making apparent the two historical conceptions of intelligence that have implicitly guided them. We note that in practice, the contemporary AI community still gravitates towards benchmarking intelligence by comparing the skill exhibited by AIs and humans at specific tasks such as board games and video games. We argue that solely measuring skill at any given task falls short of measuring intelligence, because skill is heavily modulated by prior knowledge and experience: unlimited priors or unlimited training data allow experimenters to "buy" arbitrary levels of skills for a system, in a way that masks the system's own generalization power. We then articulate a new formal definition of intelligence based on Algorithmic Information Theory, describing intelligence as skill-acquisition efficiency and highlighting the concepts of scope, generalization difficulty, priors, and experience. Using this definition, we propose a set of guidelines for what a general AI benchmark should look like. Finally, we present a benchmark closely following these guidelines, the Abstraction and Reasoning Corpus (ARC), built upon an explicit set of priors designed to be as close as possible to innate human priors. We argue that ARC can be used to measure a human-like form of general fluid intelligence and that it enables fair general intelligence comparisons between AI systems and humans.
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten discuss their takeaways from OpenAI's GPT-3 language model. With the help of Microsoft's ZeRO-2 / DeepSpeed optimiser, OpenAI trained a 175 BILLION parameter autoregressive language model. The paper demonstrates how self-supervised language modelling at this scale can perform many downstream tasks without fine-tuning.
00:00:00 Intro
00:00:54 ZeRO1+2 (model + Data parallelism) (Connor)
00:03:17 Recent history of NLP (Tim)
00:06:04 Yannic "Light-speed" Kilcher's brief overview of GPT-3
00:14:25 Reviewing Yannic's YT comments on his GPT-3 video (Tim)
00:20:26 Main show intro
00:23:03 Is GPT-3 reasoning?
00:28:15 Architecture discussion and autoregressive (GPT*) vs denoising autoencoder (BERT)
00:36:18 Utility of GPT-3 in industry
00:43:03 Can GPT-3 do math? (reasoning/system 1/system 2)
00:51:03 Generalisation
00:56:48 Esoterics of language models
00:58:46 Architectural trade-offs
01:07:37 Memorization machines and interpretability
01:17:16 Nearest neighbour probes / watermarks
01:20:03 YouTube comments on GPT-3 video
01:21:50 GPT-3 news article generation issue
01:27:36 Sampling data for language models / bias / fairness / politics
01:51:12 Outro
These paradigms of task adaptation are divided into zero-, one-, and few-shot learning. Zero-shot learning is the extreme case where we expect a language model to perform a task such as sentiment classification or extractive question answering without any additional supervision. One- and few-shot learning provide some examples to the model. However, GPT-3's definition of this diverges a bit from the conventional literature: GPT-3 provides one- and few-shot examples in the form of "in-context learning". Instead of fine-tuning the model on a few examples, the model has to use its input to infer the downstream task. For example, the GPT-3 transformer has an input sequence of 2048 tokens, so demonstrations of a task such as Yelp sentiment reviews would have to fit in this input sequence along with the new review.
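To make the in-context idea concrete, here is a rough sketch of how a few-shot prompt can be laid out: the demonstrations and the new query share the model's fixed context window, and no weights are updated. The formatting and the commented-out classify() call are illustrative assumptions on our part, not OpenAI's API.

```python
# Few-shot "in-context learning": the task is conveyed entirely through the
# prompt, which must fit (together with the new example) inside the model's
# fixed context window. No gradient updates are involved.
demonstrations = [
    ("The food was cold and the staff ignored us.", "negative"),
    ("Absolutely loved the pasta, will come back!", "positive"),
]
new_review = "Service was slow but the dessert made up for it."

prompt = "\n".join(f"Review: {text}\nSentiment: {label}"
                   for text, label in demonstrations)
prompt += f"\nReview: {new_review}\nSentiment:"

print(prompt)
# A language model would now be asked to continue the prompt with a label;
# classify(prompt) below is a hypothetical stand-in for that call.
# label = classify(prompt)
```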
Thanks for watching! Please Subscribe!
Paper Links:
GPT-3: https://arxiv.org/abs/2005.14165
ZeRO: https://arxiv.org/abs/1910.02054
ZeRO (Blog Post): https://www.microsoft.com/en-us/research/blog/zero-deepspeed-new-system-optimizations-enable-training-models-with-over-100-billion-parameters/
ZeRO-2 (Blog Post): https://www.microsoft.com/en-us/research/blog/zero-2-deepspeed-shattering-barriers-of-deep-learning-speed-scale/?OCID=msr_blog_deepspeed2_build_tw
#machinelearning #naturallanguageprocessing #deeplearning #gpt3
This week we had a super insightful conversation with Jordan Edwards, Principal Program Manager for the AzureML team! Jordan is at the coalface of turning machine learning software engineering into a reality for some of Microsoft's largest customers.
ML DevOps is all about increasing the velocity of, and orchestrating the non-interactive phase of, software deployments for ML. We cover ML DevOps and Microsoft Azure ML. We discuss model governance, testing, interpretability and tooling. We cover the age-old discussion of the dichotomy between science and engineering and how you can bridge the gap with ML DevOps. We cover Jordan's maturity model for ML DevOps.
We also cover off some of the exciting ML announcements from the recent Microsoft Build conference, i.e. FairLearn, InterpretML, SEAL, WhiteNoise, OpenAI code generation and OpenAI GPT-3.
00:00:04 Introduction to ML DevOps and Microsoft Build ML Announcements
00:10:29 Main show kick-off
00:11:06 Jordan's story
00:14:36 Typical ML DevOps workflow
00:17:38 Tim's articulation of ML DevOps
00:19:31 Interpretability / Fairness
00:24:31 Testing / Robustness
00:28:10 Using GANs to generate testing data
00:30:26 Gratuitous DL?
00:33:46 Challenges of making an ML DevOps framework / IaaS
00:38:48 Cultural battles in ML DevOps
00:43:04 Maturity Model for ML DevOps
00:49:19 ML: High interest credit card of technical debt paper
00:50:19 ML Engineering at Microsoft
01:01:20 ML Flow
01:03:05 Company-wide governance
01:08:15 What's coming next
01:12:10 Jordan's hilarious piece of advice for his younger self
Super happy with how this turned out, this is not one to miss folks!
#deeplearning #machinelearning #devops #mldevops
*Note this is an episode from Tim's Machine Learning Dojo YouTube channel.
Join Eric Craeymeersch for a wonderful discussion all about ML engineering, computer vision, Siamese networks, contrastive loss, one-shot learning and metric learning.
00:00:00 Introduction
00:11:47 ML Engineering Discussion
00:35:59 Intro to the main topic
00:42:13 Siamese Networks
00:48:36 Mining strategies
00:51:15 Contrastive Loss
00:57:44 Trip loss paper
01:09:35 Quad loss paper
01:25:49 Eric's Quadloss Medium Article
02:17:32 Metric learning reality check
02:21:06 Engineering discussion II
02:26:22 Outro
In our second paper review call, Tess Ferrandez covered off the FaceNet paper from Google, which was a one-shot Siamese network with the so-called triplet loss. It was an interesting change of direction for NN architecture, i.e. using a contrastive loss instead of having a fixed number of output classes. Contrastive architectures have been taking over the ML landscape recently, e.g. SimCLR, MoCo, BERT.
Eric wrote an article about this at the time: https://medium.com/@crimy/one-shot-learning-siamese-networks-and-triplet-loss-with-keras-2885ed022352
He then discovered there was a new approach to one shot learning in vision using a quadruplet loss and metric learning. Eric wrote a new article and several experiments on this @ https://medium.com/@crimy/beyond-triplet-loss-one-shot-learning-experiments-with-quadruplet-loss-16671ed51290?source=friends_link&sk=bf41673664ad8a52e322380f2a456e8b
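For anyone who wants the two losses side by side, here is a small numpy sketch of the FaceNet-style triplet margin loss and the extra term the quadruplet loss adds; the margins and embeddings are toy values we chose for illustration, not the papers' settings.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """FaceNet-style margin loss on squared embedding distances (toy margin)."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)

def quadruplet_loss(anchor, positive, negative1, negative2,
                    margin1=0.2, margin2=0.1):
    """Quadruplet loss sketch (Chen et al. '17): adds a second term that also
    pushes the positive pair closer than a pair of unrelated negatives."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative1) ** 2)
    d_neg_pair = np.sum((negative1 - negative2) ** 2)
    return (max(0.0, d_pos - d_neg + margin1)
            + max(0.0, d_pos - d_neg_pair + margin2))

# toy 8-dimensional embeddings
rng = np.random.default_rng(1)
a, p, n1, n2 = rng.normal(size=(4, 8))
print(triplet_loss(a, p, n1), quadruplet_loss(a, p, n1, n2))
```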
Paper details:
Beyond triplet loss: a deep quadruplet network for person re-identification
https://arxiv.org/abs/1704.01719 (Chen et al. '17)
"Person re-identification (ReID) is an important task in wide area video surveillance which focuses on identifying people across different cameras. Recently, deep learning networks with a triplet loss become a common framework for person ReID. However, the triplet loss pays main attentions on obtaining correct orders on the training set. It still suffers from a weaker generalization capability from the training set to the testing set, thus resulting in inferior performance. In this paper, we design a quadruplet loss, which can lead to the model output with a larger inter-class variation and a smaller intra-class variation compared to the triplet loss. As a result, our model has a better generalization ability and can achieve a higher performance on the testing set. In particular, a quadruplet deep network using a margin-based online hard negative mining is proposed based on the quadruplet loss for the person ReID. In extensive experiments, the proposed network outperforms most of the state-of-the-art algorithms on representative datasets which clearly demonstrates the effectiveness of our proposed method."
Original facenet paper;
https://arxiv.org/abs/1503.03832
#deeplearning #machinelearning
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten interviewed Harri Valpola, CEO and Founder of Curious AI. We continued our discussion of System 1 and System 2 thinking in Deep Learning, as well as miscellaneous topics around Model-based Reinforcement Learning. Dr. Valpola describes some of the challenges of modelling industrial control processes such as water sewage filters and paper mills with the use of model-based RL. Dr. Valpola and his collaborators recently published “Regularizing Trajectory Optimization with Denoising Autoencoders”, which addresses the problem of planning algorithms exploiting inaccuracies in their world models!
00:00:00 Intro to Harri and Curious AI System1/System 2
00:04:50 Background on model-based RL challenges from Tim
00:06:26 Other interesting research papers on model-based RL from Connor
00:08:36 Intro to Curious AI recent NeurIPS paper on model-based RL and denoising autoencoders from Yannic
00:21:00 Main show kick off, system 1/2
00:31:50 Where does the simulator come from?
00:33:59 Evolutionary priors
00:37:17 Consciousness
00:40:37 How does one build a company like Curious AI?
00:46:42 Deep Q Networks
00:49:04 Planning and Model based RL
00:53:04 Learning good representations
00:55:55 Typical problem Curious AI might solve in industry
01:00:56 Exploration
01:08:00 Their paper - Regularizing Trajectory Optimization with Denoising Autoencoders
01:13:47 What is Epistemic uncertainty
01:16:44 How would Curious develop these models
01:18:00 Explainability and simulations
01:22:33 How system 2 works in humans
01:26:11 Planning
01:27:04 Advice for starting an AI company
01:31:31 Real world implementation of planning models
01:33:49 Publishing research and openness
We really hope you enjoy this episode, please subscribe!
Regularizing Trajectory Optimization with Denoising Autoencoders: https://papers.nips.cc/paper/8552-regularizing-trajectory-optimization-with-denoising-autoencoders.pdf
Pulp, Paper & Packaging: A Future Transformed through Deep Learning: https://thecuriousaicompany.com/pulp-paper-packaging-a-future-transformed-through-deep-learning/
Curious AI: https://thecuriousaicompany.com/
Harri Valpola Publications: https://scholar.google.com/citations?user=1uT7-84AAAAJ&hl=en&oi=ao
Some interesting papers around Model-Based RL:
GameGAN: https://cdn.arstechnica.net/wp-content/uploads/2020/05/Nvidia_GameGAN_Research.pdf
Plan2Explore: https://ramanans1.github.io/plan2explore/
World Models: https://worldmodels.github.io/
MuZero: https://arxiv.org/pdf/1911.08265.pdf
PlaNet: A Deep Planning Network for RL: https://ai.googleblog.com/2019/02/introducing-planet-deep-planning.html
Dreamer: Scalable RL using World Models: https://ai.googleblog.com/2020/03/introducing-dreamer-scalable.html
Model Based RL for Atari: https://arxiv.org/pdf/1903.00374.pdf
In this episode of Machine Learning Street Talk, Tim Scarfe, Connor Shorten and Yannic Kilcher react to Yoshua Bengio’s ICLR 2020 Keynote “Deep Learning Priors Associated with Conscious Processing”. Bengio takes on many future directions for research in Deep Learning such as the role of attention in consciousness, sparse factor graphs and causality, and the study of systematic generalization. Bengio also presents big ideas in Intelligence that border on the line of philosophy and practical machine learning. This includes ideas such as consciousness in machines and System 1 and System 2 thinking, as described in Daniel Kahneman’s book “Thinking Fast and Slow”. Similar to Yann LeCun’s half of the 2020 ICLR keynote, this talk takes on many challenging ideas and hopefully this video helps you get a better understanding of some of them! Thanks for watching!
Please Subscribe for more videos!
Paper Links:
Link to Talk: https://iclr.cc/virtual_2020/speaker_7.html
The Consciousness Prior: https://arxiv.org/abs/1709.08568
Thinking Fast and Slow: https://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555
Systematic Generalization: https://arxiv.org/abs/1811.12889
CLOSURE: Assessing Systematic Generalization of CLEVR Models: https://arxiv.org/abs/1912.05783
Neural Module Networks: https://arxiv.org/abs/1511.02799
Experience Grounds Language: https://arxiv.org/pdf/2004.10151.pdf
Benchmarking Graph Neural Networks: https://arxiv.org/pdf/2003.00982.pdf
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547
Please check out our individual channels as well!
Machine Learning Dojo with Tim Scarfe: https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA
Yannic Kilcher: https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew
Henry AI Labs: https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw
00:00:00 Tim and Yannic's takes
00:01:37 Intro to Bengio
00:03:13 System 2, language and Chomsky
00:05:58 Christof Koch on consciousness
00:07:25 Francois Chollet on intelligence and consciousness
00:09:29 Meditation and Sam Harris on consciousness
00:11:35 Connor Intro
00:13:20 Show Main Intro
00:17:55 Priors associated with Conscious Processing
00:26:25 System 1 / System 2
00:42:47 Implicit and Verbalized Knowledge [DONT MISS THIS!]
01:08:24 Inductive Priors for DL 2.0
01:27:20 Systematic Generalization
01:37:53 Contrast with the Symbolic AI Program
01:54:55 Attention
02:00:25 From Attention to Consciousness
02:05:31 Thoughts, Consciousness, Language
02:06:55 Sparse Factor graph
02:10:52 Sparse Change in Abstract Latent Space
02:15:10 Discovering Cause and Effect
02:20:00 Factorize the joint distribution
02:22:30 RIMS: Modular Computation
02:24:30 Conclusion
#machinelearning #deeplearning
This week Connor Shorten, Yannic Kilcher and Tim Scarfe reacted to Yann LeCun's keynote speech at this year's ICLR conference which just passed. ICLR is the number two ML conference and was completely open this year, with all the sessions publicly accessible via the internet. Yann spent most of his talk speaking about self-supervised learning, Energy-based models (EBMs) and manifold learning. Don't worry if you hadn't heard of EBMs before, neither had we!
Thanks for watching! Please Subscribe!
Paper Links:
ICLR 2020 Keynote Talk: https://iclr.cc/virtual_2020/speaker_7.html
A Tutorial on Energy-Based Learning: http://yann.lecun.com/exdb/publis/pdf/lecun-06.pdf
Concept Learning with Energy-Based Models (Yannic's Explanation): https://www.youtube.com/watch?v=Cs_j-oNwGgg
Concept Learning with Energy-Based Models (Paper): https://arxiv.org/pdf/1811.02486.pdf
Concept Learning with Energy-Based Models (OpenAI Blog Post): https://openai.com/blog/learning-concepts-with-energy-functions/
#deeplearning #machinelearning #iclr #iclr2020 #yannlecun
In this episode of Machine Learning Street Talk, we chat with Jonathan Frankle, author of The Lottery Ticket Hypothesis. Frankle has continued researching Sparse Neural Networks, Pruning, and Lottery Tickets leading to some really exciting follow-on papers! This chat discusses some of these papers such as Linear Mode Connectivity, Comparing Rewinding and Fine-tuning in Neural Network Pruning, and more (full list of papers linked below). We also chat about how Jonathan got into Deep Learning research, his Information Diet, and work on developing Technology Policy for Artificial Intelligence!
This was a really fun chat, I hope you enjoy listening to it and learn something from it!
Thanks for watching and please subscribe!
Huge thanks to everyone on r/MachineLearning who asked questions!
Paper Links discussed in the chat:
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks: https://arxiv.org/abs/1803.03635
Linear Mode Connectivity and the Lottery Ticket Hypothesis: https://arxiv.org/abs/1912.05671
Dissecting Pruned Neural Networks: https://arxiv.org/abs/1907.00262
Training BatchNorm and Only BatchNorm: On the Expressive Power of Random Features in CNNs: https://arxiv.org/abs/2003.00152
What is the State of Neural Network Pruning? https://arxiv.org/abs/2003.03033
The Early Phase of Neural Network Training: https://arxiv.org/abs/2002.10365
Comparing Rewinding and Fine-tuning in Neural Network Pruning: https://arxiv.org/abs/2003.02389
(Also Mentioned)
Block-Sparse GPU Kernels: https://openai.com/blog/block-sparse-gpu-kernels/
Balanced Sparsity for Efficient DNN Inference on GPU: https://arxiv.org/pdf/1811.00206.pdf
Playing the Lottery with Rewards and Multiple Languages: Lottery Tickets in RL and NLP: https://arxiv.org/pdf/1906.02768.pdf
r/MachineLearning question list: https://www.reddit.com/r/MachineLearning/comments/g9jqe0/d_lottery_ticket_hypothesis_ask_the_author_a/
#machinelearning #deeplearning
In this episode of Machine Learning Street Talk, Tim Scarfe, Yannic Kilcher and Connor Shorten chat about Large-scale Transfer Learning in Natural Language Processing. The Text-to-Text Transfer Transformer (T5) model from Google AI does an exhaustive survey of what’s important for Transfer Learning in NLP and what’s not. In this conversation, we go through the key takeaways of the paper, text-to-text input/output format, architecture choice, dataset size and composition, fine-tuning strategy, and how to best use more computation.
Beginning with these topics, we diverge into exciting ideas such as embodied cognition, meta-learning, and the measure of intelligence. We are still beginning our podcast journey and really appreciate any feedback from our listeners. Is the chat too technical? Do you prefer group discussions, interviewing experts, or chats between the three of us? Thanks for watching and if you haven’t already, Please Subscribe!
Paper Links discussed in the chat:
Text-to-Text Transfer Transformer: https://arxiv.org/abs/1910.10683
Experience Grounds Language (relevant to divergent discussion about embodied cognition): https://arxiv.org/pdf/2004.10151.pdf
On the Measure of Intelligence: https://arxiv.org/abs/1911.01547
Train Large, Then Compress: https://arxiv.org/pdf/2002.11794.pdf
Scaling Laws for Neural Language Models: https://arxiv.org/pdf/2001.08361.pdf
The Illustrated Transformer: http://jalammar.github.io/illustrated...
ELECTRA: https://arxiv.org/pdf/2003.10555.pdf
Transformer-XL: https://arxiv.org/pdf/1901.02860.pdf
Reformer: The Efficient Transformer: https://openreview.net/pdf?id=rkgNKkHtvB
The Evolved Transformer: https://arxiv.org/pdf/1901.11117.pdf
DistilBERT: https://arxiv.org/pdf/1910.01108.pdf
How to generate text (HIGHLY RECOMMEND): https://huggingface.co/blog/how-to-ge...
Tokenizers: https://blog.floydhub.com/tokenization-nlp/
According to Yann LeCun, the next big thing in machine learning is unsupervised learning. Self-supervision has changed the entire game in the last few years in deep learning, first transforming the language world with word2vec and BERT -- but now it's turning computer vision upside down.
This week Yannic, Connor and I spoke with one of the authors, Aravind Srinivas, who recently co-led the hot-off-the-press CURL: Contrastive Unsupervised Representations for Reinforcement Learning alongside Michael (Misha) Laskin. CURL has had an incredible reception in the ML community in the last month or so. Remember the DeepMind paper which solved the Atari games using the raw pixels? Aravind's approach uses contrastive unsupervised learning to featurise the pixels before applying RL. CURL is the first image-based algorithm to nearly match the sample-efficiency and performance of methods that use state-based features! This is a huge step forward in being able to apply RL in the real world.
We explore RL and self-supervision for computer vision in detail and find out about how Aravind got into machine learning.
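As a rough picture of the contrastive objective CURL builds on, here is a hedged numpy sketch of an InfoNCE-style loss over query/key embeddings from two augmented views of the same observations. The bilinear weight matrix, temperature and toy batch are our illustrative assumptions; the real implementation works on batches of encoded pixel crops inside the RL loop.

```python
import numpy as np

def info_nce_loss(queries, keys, W, temperature=0.1):
    """InfoNCE-style contrastive loss: each query should score highest against
    its own key (same observation, different augmentation). Toy numpy sketch;
    CURL uses a learned bilinear similarity of the form q^T W k."""
    logits = queries @ W @ keys.T / temperature      # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))              # positives sit on the diagonal

# toy batch: 4 observations embedded in 16 dims by two augmented "views"
rng = np.random.default_rng(0)
q = rng.normal(size=(4, 16))
k = q + 0.01 * rng.normal(size=(4, 16))              # keys close to their queries
print(info_nce_loss(q, k, W=np.eye(16)))
```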
Original YouTube Video: https://youtu.be/1MprzvYNpY8
Paper:
CURL: Contrastive Unsupervised Representations for Reinforcement Learning
Aravind Srinivas, Michael Laskin, Pieter Abbeel
https://arxiv.org/pdf/2004.04136.pdf
Yannic's analysis video: https://www.youtube.com/watch?v=hg2Q_O5b9w4
#machinelearning #reinforcementlearning #curl #timscarfe #yannickilcher #connorshorten
Music credit; https://soundcloud.com/errxrmusic/in-my-mind
Three YouTubers; Tim Scarfe - Machine Learning Dojo (https://www.youtube.com/channel/UCXvHuBMbgJw67i5vrMBBobA), Connor Shorten - Henry AI Labs (https://www.youtube.com/channel/UCHB9VepY6kYvZjj0Bgxnpbw) and Yannic Kilcher (https://www.youtube.com/channel/UCZHmQk67mSJgfCCTn7xBfew). We made a new YouTube channel called Machine Learning Street Talk. Every week we will talk about the latest and greatest in AI. Subscribe now!
Special guests this week; Dr. Mathew Salvaris (https://www.linkedin.com/in/drmathewsalvaris/), Eric Craeymeersch (https://www.linkedin.com/in/ericcraeymeersch/), Dr. Keith Duggar (https://www.linkedin.com/in/dr-keith-duggar/), Dmitri Soshnikov (https://www.linkedin.com/in/shwars/)
We discuss the new concept of an open-ended, or "AI-generating", algorithm. Open-endedness is a class of algorithms which generate problems and solutions to increasingly complex and diverse tasks. These algorithms create their own curriculum of learning. Complex tasks become tractable because they are now the final stepping stone in a lineage of progressions. In many respects, it's better to trust the machine to develop the learning curriculum, because the best curriculum might be counter-intuitive. These algorithms can generate a radiating tree of evolving challenges and solutions, just like natural evolution. Evolution has produced endless diversity and complexity, and even produced human intelligence as a side-effect! Could AI-generating algorithms be the next big thing in machine learning?
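As a caricature of the paired open-ended loop described above (emphatically not the actual POET algorithm, whose environments and agents are far richer), here is a toy Python sketch: "environments" are just difficulty thresholds, "agents" are scalar skills improved by hill-climbing, and solved environments spawn harder mutated children seeded with their solver.

```python
import random

random.seed(0)

def optimize(skill, difficulty, steps=50):
    """Toy hill-climbing: nudge the agent's skill toward its paired environment."""
    for _ in range(steps):
        candidate = skill + random.gauss(0, 0.1)
        if abs(candidate - difficulty) < abs(skill - difficulty):
            skill = candidate
    return skill

# population of (environment difficulty, paired agent skill)
population = [(1.0, 0.0)]
for generation in range(5):
    new_population = []
    for difficulty, skill in population:
        skill = optimize(skill, difficulty)
        new_population.append((difficulty, skill))
        if abs(skill - difficulty) < 0.05:                    # environment "solved"
            child = difficulty + abs(random.gauss(0.5, 0.2))  # harder mutated child
            new_population.append((child, skill))             # seed it with the solver
    population = new_population
    print(f"gen {generation}: {[(round(d, 2), round(s, 2)) for d, s in population]}")
```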
Wang, Rui, et al. "Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions." arXiv preprint arXiv:2003.08536 (2020). https://arxiv.org/abs/2003.08536
Wang, Rui, et al. "Paired open-ended trailblazer (poet): Endlessly generating increasingly complex and diverse learning environments and their solutions." arXiv preprint arXiv:1901.01753 (2019). https://arxiv.org/abs/1901.01753
Watch Yannic’s video on POET: https://www.youtube.com/watch?v=8wkgDnNxiVs
and on the extended POET: https://youtu.be/gbG1X8Xq-T8
Watch Connor’s video https://www.youtube.com/watch?v=jxIkPxkN10U
UberAI labs video: https://www.youtube.com/watch?v=RX0sKDRq400
#reinforcementlearning #machinelearning #uber #deeplearning #rl #timscarfe #connorshorten #yannickilcher