LLM Agents

Playing with Strangers: A New Benchmark for Ad-Hoc Human-AI Teamwork

Human-AI collaboration is easy to romanticize in theory but hard to operationalize in practice. While reinforcement learning agents have dazzled us in games like Go and StarCraft, they often stumble when asked to cooperate with humans under real-world constraints: imperfect information, ambiguous signals, and no chance to train together beforehand. That’s the realm of ad-hoc teamwork—and the latest paper from Oxford’s FLAIR lab introduces a critical step forward. The Ad-Hoc Human-AI Coordination Challenge (AH2AC2) tackles this problem by leveraging Hanabi, a cooperative card game infamous among AI researchers for its subtle, communication-constrained dynamics. Unlike chess, Hanabi demands theory of mind—inferring what your teammate knows and intends based on sparse, indirect cues. It’s a Turing Test of collaboration. ...

The Joy of Many Minds: How JoyAgents-R1 Unleashes the Power of Multi-LLM Reinforcement Learning

When it comes to language model agents, more minds may not always mean merrier results. Multi-agent reinforcement learning (MARL) promises a flexible path for decomposing and solving complex tasks, but coordinating multiple large language models (LLMs) remains riddled with instability, inefficiency, and memory fragmentation. Enter JoyAgents-R1, a novel framework that proposes an elegant, scalable solution for jointly evolving heterogeneous LLM agents using Group Relative Policy Optimization (GRPO). Developed by researchers at JD.com, JoyAgents-R1 combines memory evolution, policy optimization, and clever sampling strategies to form a resilient multi-agent architecture capable of matching the performance of larger SOTA models with far fewer parameters. ...

Innovation, Agentified: How TRIZ Got Its AI Makeover

In the symphony of innovation, TRIZ has long served as the structured score guiding engineers toward inventive breakthroughs. But what happens when you give the orchestra to a team of AI agents? Enter TRIZ Agents, a bold exploration of how large language model (LLM) agents—armed with tools, prompts, and persona-based roles—can orchestrate a complete innovation cycle using the TRIZ methodology. Cracking the Code of Creativity TRIZ (Theory of Inventive Problem Solving), derived from the study of thousands of patents, offers a time-tested approach to resolving contradictions in engineering design. It formalizes the innovation process through tools like the 40 Inventive Principles and the Contradiction Matrix. However, its structured elegance demands deep domain expertise—something often scarce outside elite R&D centers. ...

The Memory Advantage: When AI Agents Learn from the Past

What if your AI agent could remember the last time it made a mistake—and plan better this time? From Reaction to Reflection: Why Memory Matters Most language model agents today operate like goldfish—brilliant at reasoning in the moment, but forgetful. Whether navigating virtual environments, answering complex questions, or composing multi-step strategies, they often repeat past mistakes simply because they lack a memory of past episodes. That’s where the paper “Agentic Episodic Control” by Zhihan Xiong et al. introduces a critical upgrade to today’s LLM agents: a modular episodic memory system inspired by human cognition. Instead of treating each prompt as a blank slate, this framework allows agents to recall, adapt, and refine prior reasoning paths—without retraining the underlying model. ...

Mind the Context: How ContextAgent Listens, Sees, and Acts Before You Ask

Introduction: From Reaction to Proaction Imagine an assistant that doesn’t wait for your command. It notices you’re standing by a bus stop late at night and proactively checks the next bus arrival. If it’s too far off, it suggests calling a ride instead. Welcome to the world of ContextAgent — a proactive, context-aware Large Language Model (LLM) agent designed to act before you’re forced to ask. While most LLM agents still require explicit prompts and work in tightly scoped environments like desktops, ContextAgent leverages open-world sensory inputs (from devices like smart glasses, earphones, and smartphones) to understand user context and offer unobtrusive help. ...

When Smart AI Gets It Wrong: Diagnosing the Knowing-Doing Gap in Language Model Agents

“You expect AI to be dumber than humans. But when it’s smarter and still fails, that’s when it hurts.” Earlier this month, Cursor AI’s chatbot “Sam” fabricated a nonexistent refund policy, confidently explaining to users why it was entitled to keep their subscription money—even when those users were eligible for a refund1. The backlash was immediate. Users lost trust. Some cancelled their subscriptions entirely. ...