Podd: Agentic Horizons

AI Storytelling with DOME

19 februari 2025 | 15 min

Intelligence Explosion Microeconomics

18 februari 2025 | 18 min

Metacognitive Monitoring: A Human Ability Beyond AI

17 februari 2025 | 7 min

Building Living Software Systems with Generative & Agentic AI

16 februari 2025 | 12 min

Theory of Mind in LLMs

15 februari 2025 | 14 min

Designing AI Personalities

14 februari 2025 | 17 min

FISHNET: Financial Intelligence from Sub-querying, Harmonizing, Neural-Conditioning, Expert Swarms, and Task Planning

13 februari 2025 | 13 min

LLMs Know More Than They Show

12 februari 2025 | 16 min

PDL: A Declarative Prompt Programming Language

11 februari 2025 | 16 min

AI Self-Evolution Using Long Term Memory

10 februari 2025 | 23 min

Responsibility in a Multi-Value Strategic Setting

9 februari 2025 | 16 min

API-Based Web Agents

8 februari 2025 | 15 min

GUS-Net: Social Bias Classification with Generalizations, Unfairness, and Stereotypes

7 februari 2025 | 10 min

Google DeedMind's Talker-Reasoner Architecture

6 februari 2025 | 10 min

A Framework for Representing Knowledge

5 februari 2025 | 17 min

RAG-ConfusionQA: A Benchmark for Evaluating LLMs on Confusing Questions

4 februari 2025 | 10 min

Do LLMs Estimate Uncertainty Well?

3 februari 2025 | 7 min

Stars, Stripes, and Silicon: Unravelling ChatGPT’s Bias

2 februari 2025 | 9 min

Debug Smarter, Not Harder: AI Agents for Error Resolution in Computational Notebooks

1 februari 2025 | 8 min

Interpretable End-to-end Neurosymbolic Reinforcement Learning Agents

31 januari 2025 | 8 min

Situations, Actions, and Causal Laws

30 januari 2025 | 9 min

Programs with Common Sense

29 januari 2025 | 9 min

A Simulation System Towards Solving Societal-Scale Manipulation

28 januari 2025 | 7 min

Good Parenting is All You Need

27 januari 2025 | 14 min

On Computable Numbers, with an Application to the Entscheidungsproblem

26 januari 2025 | 12 min

A Path Towards Autonomous Machine Intelligence

25 januari 2025 | 6 min

The Dartmouth Summer Research Project on Artificial Intelligence

24 januari 2025 | 12 min

Stanford University's One Hundred Year Study on Artificial Intelligence

23 januari 2025 | 13 min

Computing Machinery and Intelligence

22 januari 2025 | 15 min

Steps Toward Artificial Intelligence

21 januari 2025 | 14 min

Building Machines That Learn and Think Like People

20 januari 2025 | 17 min

Alloy Design with Graph Neural Network-Powered LLM-Driven Multi-Agent Systems

19 januari 2025 | 9 min

SchizophreniaInfoBot and the Critical Analysis Filter

18 januari 2025 | 9 min

Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

17 januari 2025 | 15 min

SynapticRAG: Temporal Dynamic Memory

16 januari 2025 | 9 min

AgentRefine: Enhancing Agent Generalization Through Refinement Tuning

15 januari 2025 | 18 min

Why Agents Are Stupid & What We Can Do About It

14 januari 2025 | 22 min

Towards Efficient AI Policymaking in Economic Simulations

13 januari 2025 | 9 min

Unlocking Abstract Reasoning: How AI Solves Complex Puzzles with Offline Reinforcement Learning

12 januari 2025 | 12 min

CORY: Cooperative Agents for Smarter AI Fine-Tuning

11 januari 2025 | 8 min

SecurityBot: Mentoring LLM with RL Agents to Master Cybersecurity Games

10 januari 2025 | 8 min

AI Consciousness and Global Workspace Theory

9 januari 2025 | 8 min

MAGIS: Multi-Agent Framework for GitHub Issue ReSolution

8 januari 2025 | 30 min

Hierarchical Cooperation Graph Learning

7 januari 2025 | 6 min

Prioritized Heterogeneous League Reinforcement Learning

6 januari 2025 | 10 min

Knowledge Boundary and Persona Dynamic Shape A Better Social Media Agent

5 januari 2025 | 11 min

ITCMA: Computational Consciousness

4 januari 2025 | 13 min

VIRSCI: A Multi-Agent System for Collaborative Scientific Discovery

3 januari 2025 | 8 min

Collaborative Capabilities of Language Models in Blocks World

2 januari 2025 | 9 min

Agent-as-a-Judge: Evaluate Agents with Agents

1 januari 2025 | 8 min

Mentigo: An Intelligent Agent for Mentoring Students in Creative Problem Solving

31 december 2024 | 8 min

Symbolic and Connectionist AI in Autonomous Agents

30 december 2024 | 9 min

AgentStudio: A Toolkit for Building General Virtual Agents

29 december 2024 | 11 min

FairMindSim: Alignment of Behavior, Emotion, and Belief Amid Ethical Dilemmas

28 december 2024 | 12 min

Machines of Loving Grace

27 december 2024 | 13 min

GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in LLMs

26 december 2024 | 12 min

MegaAgent: Autonomous Cooperation in Large-Scale LLM Agent Systems

25 december 2024 | 12 min

GEM-RAG: Mimicking Human Memory Processes

24 december 2024 | 6 min

Alignment Faking in Large Language Models

23 december 2024 | 14 min

DialSim: A New Approach to Evaluating Conversational AI

22 december 2024 | 12 min

LogicGame: Benchmarking Rule-Based Reasoning Abilities of LLMs

21 december 2024 | 7 min

AIOS: An Intelligent Agent Operating System

20 december 2024 | 9 min

Automating Insights: The Future of Data Storytelling with LLMs

19 december 2024 | 12 min

Socially-Minded Intelligence

18 december 2024 | 13 min

WebPilot: Mastering Complex Web Tasks

17 december 2024 | 9 min

Graph of Thoughts

16 december 2024 | 9 min

AgentGen: Automating Environment and Task Generation for Smarter AI Agents

15 december 2024 | 12 min

Agent-Based Modeling to Predict the Impact of Generative AI

14 december 2024 | 15 min

Reflective Monte Carlo Tree Search (R-MCTS)

13 december 2024 | 8 min

MLE-Bench: Evaluating AI Agents in Real-World Machine Learning Challenges

12 december 2024 | 10 min

Episodic Future Thinking

11 december 2024 | 15 min

EgoSocialArena: Measuring Theory of Mind and Socialization

10 december 2024 | 9 min

Conversate: Job Interview Preparation through Simulations and Feedback

9 december 2024 | 7 min

Efficient Literature Review Filtration

8 december 2024 | 7 min

AI-Press: Multi-Agent News Generation and Feedback Simulation

7 december 2024 | 11 min

Agent S: Using Computers Like Humans

6 december 2024 | 10 min

HyperAgent: Generalist Software Engineering Agents

5 december 2024 | 9 min

The Rise and Potential of LLM Based Agents: A Survey

4 december 2024 | 11 min

Situational Awareness: The Decade Ahead

3 december 2024 | 16 min

Retrieval Augmented Generation (RAG) and Beyond

2 december 2024 | 10 min

Improving Factuality and Reasoning through Multiagent Debate

1 december 2024 | 9 min

Multiagent Requirements Elicitation and Analysis

30 november 2024 | 5 min

Generative Agents: Interactive Simulacra of Human Behavior

29 november 2024 | 9 min

The Art of Storytelling: Dynamic Multimodal Narratives

28 november 2024 | 7 min

Tree of Thoughts

27 november 2024 | 11 min

PairCoder

26 november 2024 | 12 min

AI Morality

25 november 2024 | 8 min

Plurals: Simulated Social Ensembles

24 november 2024 | 10 min

LLM Persuasion Games

23 november 2024 | 10 min

Cooperative Resilience in Multi-Agent Systems

22 november 2024 | 11 min

Human-Like Memory Systems

21 november 2024 | 10 min

Ex3: Automatic Novel Writing

20 november 2024 | 7 min

Mental Models in Adaptive Dialog Agents

19 november 2024 | 9 min

Evolutionary Game Theory Analysis of Human-AI Populations

18 november 2024 | 11 min

Democracy Research with Generative Agents

17 november 2024 | 5 min

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

16 november 2024 | 9 min

Spontaneous Cooperation of Competing Agents

15 november 2024 | 17 min

Agent-E: Autonomous Web Navigation

14 november 2024 | 8 min

Strategist: Learning Strategy with Bi-Level Tree Search

13 november 2024 | 8 min

This episode focuses on STRATEGIST, a new method that uses Large Language Models (LLMs) to learn strategic skills in multi-agent games1. The core idea is to have LLMs acquire new skills through a self-improvement process, rather than relying on traditional methods like supervised learning or reinforcement learning.

• STRATEGIST aims to address the challenges of learning in adversarial environments where the optimal policy is constantly changing due to opponents' adaptive strategies.

• The method works by combining high-level strategy learning with low-level action planning. At the high level, the system constructs a "strategy tree" through an evolutionary process, refining previously learned strategies.

• This tree structure allows STRATEGIST to search and evaluate different strategies efficiently, eventually arriving at a good policy without needing parameter updates or fine-tuning.How STRATEGIST Learns:

• The learning process relies on simulated self-play to gather feedback. This involves using Monte Carlo tree search (MCTS) and LLM-based reflection to evaluate the effectiveness of different strategies.

• STRATEGIST employs a modular search method that further enhances sample efficiency. This involves two steps:

• Reflection and Idea Generation: The LLM reflects on the self-play feedback and generates ideas for improving the current strategy. These ideas are added to an "idea queue" for later evaluation.

• Strategy Improvement: The LLM selects a strategy from the strategy tree and an improvement idea from the queue, then uses this input to generate an improved version of the strategy. The improved strategy is then evaluated through more self-play simulations.

• This modular approach allows the system to isolate the effects of specific changes and determine which improvements are truly beneficial.

• The idea queue also serves as a memory of successful improvements, which can be transferred to other strategies within the same game.Key Findings:

• The experiments show that STRATEGIST outperforms several baseline LLM improvement methods, as well as traditional reinforcement learning approaches. This suggests that guided LLM improvement, informed by self-play feedback, can be highly effective for learning strategic skills.

• STRATEGIST is also more efficient in acquiring high-quality feedback compared to using an LLM-critic or relying on feedback from interactions with a fixed opponent policy. This highlights the advantage of learning to simulate opponent behavior through self-play.Limitations:

• The authors acknowledge that individual runs of STRATEGIST can have high variance due to the inherent noise of multi-agent adversarial environments and LLM generations. However, they suggest that running more game simulations can mitigate this issue.

• The researchers also note that STRATEGIST hasn't been tested in non-adversarial environments like question answering. However, given its success in complex adversarial settings, similar performance is expected in simpler scenarios.

Conclusion: STRATEGIST represents a promising new approach to LLM skill learning that combines self-improvement with modular search and simulated self-play feedback. The method demonstrates strong performance in challenging multi-agent games, outperforming traditional reinforcement learning and other LLM improvement baselines. The authors believe STRATEGIST's success stems from its ability to (1) effectively test and isolate the impact of specific improvements and (2) explore the strategy space more efficiently to avoid local optima.

https://arxiv.org/pdf/2408.10635

The AI Scientist: Automated Discovery

12 november 2024 | 11 min

AutoGen: A Multi-Agent Framework

11 november 2024 | 9 min

This episode discusses AutoGen, an open-source framework designed for building applications using large language models (LLMs). Unlike single-agent systems, AutoGen employs multiple agents that communicate and cooperate to solve complex tasks, offering enhanced capabilities and flexibility. The episode highlights the following key aspects:

• Conversable Agents: AutoGen's core strength lies in its customizable and conversable agents. These agents can be powered by LLMs, tools, or even human input, enabling diverse functionalities and adaptable behavior patterns. They communicate through message passing and maintain individual contexts based on past conversations.

• Conversation Programming: This innovative programming paradigm simplifies complex workflows by representing them as multi-agent conversations45. Developers define agents with specific roles and program their interaction behaviors using a combination of natural language and code.

• Unified Interfaces and Auto-Reply: AutoGen streamlines agent interaction with unified conversation interfaces. The auto-reply mechanism triggers automatic responses based on received messages, unless specified otherwise, further simplifying development.

• Control Flow Management: AutoGen offers flexible control flow using both natural language and code. LLM-backed agents can be guided with natural language prompts, while programmatic control allows developers to specify conditions, human input modes, and tool execution logic.Diverse Applications: The episode showcases AutoGen's versatility across various domains, including:

• Math Problem Solving: AutoGen builds systems for autonomous problem-solving, human-in-the-loop scenarios, and even collaborations involving multiple human users.

• Retrieval-Augmented Tasks: AutoGen facilitates retrieval-augmented code generation and question answering by integrating external data sources through a vector database. Notably, it introduces an "interactive retrieval" feature that iteratively refines context for improved accuracy.

• Decision Making in Text Environments: AutoGen tackles interactive decision-making tasks in simulated environments like ALFWorld, showcasing its capability in handling complex sequential actions.

• Multi-Agent Coding: AutoGen enhances coding applications by introducing safeguards, ensuring code safety, and reducing development effort.

• Dynamic Group Chat: AutoGen supports dynamic multi-agent conversations where participants collaborate without a predefined order, enabling more flexible and context-aware interactions.

• Conversational Chess: AutoGen builds interactive games with natural language interfaces, showcasing its potential for entertainment and creative applications. Overall, this podcast episode positions AutoGen as a powerful tool for building diverse and efficient LLM applications. It highlights AutoGen's ability to streamline development, improve performance, and enable novel applications by leveraging the power of multi-agent conversation and flexible programming paradigms.

https://arxiv.org/pdf/2308.08155

Project Archetypes for Cognitive Computing Projects

10 november 2024 | 16 min

This episode explores the challenges and evolving paradigms in AI application development, drawing from a research paper on project archetypes for AI development1. The episode examines how existing project management frameworks fall short in addressing the unique uncertainties of AI projects, leading to the emergence of a new archetype – the cognitive computing project.

Traditional Archetypes vs. the Reality of AI Development

The episode highlights four traditional project archetypes often applied to AI development, each with its own set of assumptions and limitations.

Agile Software Development: While appealing for its iterative and client-focused approach, agile methodologies struggle with the unpredictable nature of AI development, where outcomes heavily depend on data quality and model training.

Integration, Customization, Implementation: Viewing AI development as simply adapting an existing platform underestimates the complexities of data-driven AI, which requires extensive data processing and model training.

Design Thinking Project:

Though design thinking's focus on problem identification and creative solutions is valuable, AI projects often face constraints due to data availability and technical feasibility, limiting the open-ended exploration typically associated with design thinking.

Big Data Analytics:

While emphasizing data analysis is crucial, the goal of AI projects extends beyond generating insights; they aim to build functional applications, requiring skills beyond data science, such as business understanding and user interface development.

The Rise of the Cognitive Computing Project

The episode introduces the cognitive computing project as a new archetype better suited for AI development.

Key characteristics include:

• Focus on collaborative exploration: Acknowledging the iterative and unpredictable nature of AI, the project emphasizes joint efforts between the client and vendor to understand data potentials and align them with the platform's capabilities.

• Data-centric approach: Recognizing the critical role of data, the project prioritizes data understanding, preparation, and iterative model training.

• The need for a Data Consultant: Bridging the gap between business needs and data science expertise, this role ensures alignment between data insights and business goals.

Challenges and Opportunities for the Future

The episode discusses the limitations of the cognitive computing archetype, such as the need for better guidance on transitioning from exploration to exploitation, addressing knowledge gaps between business users and data scientists, and defining effective collaboration strategies. The episode concludes by emphasizing the importance of:

• Further research on AI development methodologies: This includes understanding the balance between exploration and exploitation, developing effective collaboration techniques, and defining the data consultant role more comprehensively.

• Training and education: Equipping business professionals with a basic understanding of AI and data science, while also educating data scientists on practical application challenges, will be crucial for successful AI development. This episode offers valuable insights for anyone involved in AI development, highlighting the need for new approaches and collaborative strategies to navigate the complexities of this rapidly evolving field.

https://arxiv.org/pdf/2408.04317

ArguMentor: The Value of Counter-Perspectives

9 november 2024 | 13 min

Thought of Search

8 november 2024 | 10 min

This episode examines a recent research paper that explores how Large Language Models (LLMs) can be used for planning in problem-solving scenarios, with a focus on balancing computational efficiency with the accuracy of the generated plans.

• The traditional approach to planning involves searching through a problem's state space using algorithms like Breadth-First Search (BFS) or Depth-First Search (DFS).

• Recent trends in planning with LLMs often involve calling the LLM at each step of the search process, which can be computationally expensive and environmentally detrimental.

• These LLM-based methods are typically neither sound nor complete. This means they may generate invalid solutions or fail to find a solution even if one exists.

• Furthermore, simply abandoning soundness and completeness for LLM-based planning methods does not necessarily improve their efficiency.

• The research paper proposes a new approach that utilizes LLMs to generate the code for crucial search components, like the successor function and the goal test.

• This approach is demonstrated on four classic search problems: the 24 Game, mini crosswords, BlocksWorld, and PrOntoQA (a logical reasoning dataset).

• In these experiments, the researchers used the GPT-4 model in chat mode to generate Python code for the search components.

• The generated code was then incorporated into standard BFS or DFS algorithms to solve the problems.

• This method achieved 100% accuracy on all four datasets while requiring significantly fewer calls to the LLM compared to other methods.

• The researchers argue that this approach offers a more responsible use of computational resources and promotes the development of sound and complete LLM-based planning methods that prioritize efficiency.

The episode also features a discussion of the limitations of current LLM-based planning methods and explores future directions for research in this area. The researchers suggest investigating the use of LLMs for generating code for:

• Search guidance techniques

• Search pruning techniques

• Methods to relax the need for human feedback when creating implementations of search components.

Overall, this podcast episode provides listeners with a deeper understanding of the challenges and opportunities associated with using LLMs for planning and highlights a novel approach that balances the need for accuracy and efficiency in AI-powered problem-solving.

https://arxiv.org/pdf/2404.11833

LLM-Based Agents for Software Engineering: A Survey

7 november 2024 | 11 min

This episode explores the fascinating world of LLM-based agents and their growing impact on software engineering. Forget standalone LLMs, these intelligent agents are supercharged with abilities to interact with external tools and resources, making them powerful allies for developers.

We'll break down the core components of these agents - planning, memory, perception, and action - and see how they work together to tackle real-world software engineering challenges. From automating code generation and bug detection to streamlining the entire development process, we'll uncover how LLM-based agents are revolutionizing the way software is built and maintained.

We'll also examine the exciting possibilities and challenges of human-agent collaboration, exploring how developers can work hand-in-hand with these AI-powered assistants. Tune in to learn about the cutting edge of AI in software engineering and get a glimpse into the future of software development!

Key Discussion Points:

• Types of LLM-based agents for different SE tasks: requirements engineering, code generation, code review, testing, debugging, end-to-end software development and maintenance

• The survey methodology behind the research: DBLP database search, keyword selection, snowballing approach, and paper statistics

• The architecture of LLM-based agents: planning strategies (single-turn vs. multi-turn, plan representation), memory (short-term vs. long-term, ownership, format, operations), perception (textual vs. visual input), action (tool usage and API invocation)

• Multi-agent systems and their roles in simulating real-world software teams: managers, requirement analysts, designers, developers, quality assurance experts, etc.

• Collaboration mechanisms within multi-agent systems: ordered vs. unordered modes, communication protocols (natural language vs. structured)

• Benchmarks and metrics for evaluating LLM-based agents for end-to-end software development: including existing code generation benchmarks and newly created benchmarks that simulate real-world projects

• Human-agent collaboration in various software development phases: planning, requirements, development, and evaluation

• Future research opportunities and open challenges in the field

https://arxiv.org/pdf/2409.02977

Reasoning via Planning (RAP)

6 november 2024 | 9 min

Agentic Horizons

Agentic Horizons is an AI-hosted podcast exploring the cutting edge of artificial intelligence.

Om podden

Avsnitt