Podd: Deep Papers

LibreEval: The Largest Open Source Benchmark for RAG Hallucination Detection

18 april 2025 | 27 min

AI Benchmark Deep Dive: Gemini 2.5 and Humanity's Last Exam

4 april 2025 | 26 min

Model Context Protocol (MCP)

25 mars 2025 | 15 min

AI Roundup: DeepSeek’s Big Moves, Claude 3.7, and the Latest Breakthroughs

1 mars 2025 | 30 min

How DeepSeek is Pushing the Boundaries of AI Development

21 februari 2025 | 30 min

Multiagent Finetuning: A Conversation with Researcher Yilun Du

4 februari 2025 | 30 min

Training Large Language Models to Reason in Continuous Latent Space

14 januari 2025 | 25 min

LLMs as Judges: A Comprehensive Survey on LLM-Based Evaluation Methods

23 december 2024 | 29 min

Merge, Ensemble, and Cooperate! A Survey on Collaborative LLM Strategies

10 december 2024 | 29 min

Agent-as-a-Judge: Evaluate Agents with Agents

23 november 2024 | 25 min

Introduction to OpenAI's Realtime API

12 november 2024 | 30 min

Swarm: OpenAI's Experimental Approach to Multi-Agent Systems

29 oktober 2024 | 47 min

KV Cache Explained

24 oktober 2024 | 4 min

The Shrek Sampler: How Entropy-Based Sampling is Revolutionizing LLMs

16 oktober 2024 | 4 min

Google's NotebookLM and the Future of AI-Generated Audio

15 oktober 2024 | 43 min

Exploring OpenAI's o1-preview and o1-mini

27 september 2024 | 42 min

Breaking Down Reflection Tuning: Enhancing LLM Performance with Self-Learning

19 september 2024 | 27 min

Composable Interventions for Language Models

11 september 2024 | 43 min

Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges

16 augusti 2024 | 39 min

Breaking Down Meta's Llama 3 Herd of Models

6 augusti 2024 | 45 min

DSPy Assertions: Computational Constraints for Self-Refining Language Model Pipelines

23 juli 2024 | 34 min

RAFT: Adapting Language Model to Domain Specific RAG

28 juni 2024 | 44 min

LLM Interpretability and Sparse Autoencoders: Research from OpenAI and Anthropic

14 juni 2024 | 44 min

Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models' Alignment

30 maj 2024 | 48 min

Breaking Down EvalGen: Who Validates the Validators?

13 maj 2024 | 45 min

Keys To Understanding ReAct: Synergizing Reasoning and Acting in Language Models

26 april 2024 | 45 min

Demystifying Chronos: Learning the Language of Time Series

4 april 2024 | 45 min

Anthropic Claude 3

25 mars 2024 | 43 min

Reinforcement Learning in the Era of LLMs

15 mars 2024 | 45 min

Sora: OpenAI’s Text-to-Video Generation Model

1 mars 2024 | 45 min

RAG vs Fine-Tuning

8 februari 2024 | 40 min

HyDE: Precise Zero-Shot Dense Retrieval without Relevance Labels

2 februari 2024 | 36 min

Phi-2 Model

2 februari 2024 | 44 min

A Deep Dive Into Generative's Newest Models: Gemini vs Mistral (Mixtral-8x7B)–Part I

27 december 2023 | 48 min

How to Prompt LLMs for Text-to-SQL: A Study in Zero-shot, Single-domain, and Cross-domain Settings

18 december 2023 | 45 min

The Geometry of Truth: Emergent Linear Structure in LLM Representation of True/False Datasets

30 november 2023 | 41 min

Towards Monosemanticity: Decomposing Language Models With Dictionary Learning

20 november 2023 | 45 min

RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models

18 oktober 2023 | 44 min

Explaining Grokking Through Circuit Efficiency

17 oktober 2023 | 36 min

Large Content And Behavior Models To Understand, Simulate, And Optimize Content And Behavior

29 september 2023 | 42 min

Skeleton of Thought: LLMs Can Do Parallel Decoding

30 augusti 2023 | 44 min

Llama 2: Open Foundation and Fine-Tuned Chat Models

31 juli 2023 | 30 min

Lost in the Middle: How Language Models Use Long Contexts

26 juli 2023 | 42 min

Orca: Progressive Learning from Complex Explanation Traces of GPT-4

21 juli 2023 | 42 min

Toolformer: Training LLMs To Use Tools

20 mars 2023 | 34 min

Hungry Hungry Hippos - H3

13 februari 2023 | 42 min

ChatGPT and InstructGPT: Aligning Language Models to Human Intention

18 januari 2023 | 48 min

Deep Papers

Deep Papers is a podcast series featuring deep dives on today’s most important AI papers and research.

Om podden

Avsnitt