<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>LLM Agents on Cognaptus</title>
    <link>https://cognaptus.com/tags/llm-agents/</link>
    <description>Recent content in LLM Agents on Cognaptus</description>
    <generator>Hugo -- 0.145.0</generator>
    <language>en-us</language>
    <lastBuildDate>Mon, 08 Jun 2026 00:00:00 +0000</lastBuildDate>
    <atom:link href="https://cognaptus.com/tags/llm-agents/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>Search, Critique, Repeat: Critic-R Turns RAG Complaints into Retriever Training</title>
      <link>https://cognaptus.com/blog/2026-06-08-search-critique-repeat-criticr-turns-rag-complaints-into-retriever-training/</link>
      <pubDate>Mon, 08 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-06-08-search-critique-repeat-criticr-turns-rag-complaints-into-retriever-training/</guid>
      <description>A mechanism-first reading of Critic-R, a framework that uses agent introspection to repair retrieval at inference time and train better retrievers without gold passage labels.</description>
    </item>
    <item>
      <title>Scaffold and Ladder: Why AI Agents Need Meta-Reasoning, Not Longer Monologues</title>
      <link>https://cognaptus.com/blog/2026-06-01-scaffold-and-ladder-why-ai-agents-need-metareasoning-not-longer-monologues/</link>
      <pubDate>Mon, 01 Jun 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-06-01-scaffold-and-ladder-why-ai-agents-need-metareasoning-not-longer-monologues/</guid>
      <description>A mechanism-first reading of Deep Reasoning and Dolores, showing why agent reliability may depend less on longer thinking and more on executable task-specific decomposition.</description>
    </item>
    <item>
      <title>Think Longer, Act Smarter: Why Coding Agents Need Behavior-Preserving Reasoning</title>
      <link>https://cognaptus.com/blog/2026-05-31-think-longer-act-smarter-why-coding-agents-need-behaviorpreserving-reasoning/</link>
      <pubDate>Sun, 31 May 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-05-31-think-longer-act-smarter-why-coding-agents-need-behaviorpreserving-reasoning/</guid>
      <description>A mechanism-first reading of M2A, a training-free method for injecting mathematical reasoning into coding agents without breaking their think-act-observe loop.</description>
    </item>
    <item>
      <title>Don’t Average the Needle: Spectral Retrieval and the RAG Evidence Problem</title>
      <link>https://cognaptus.com/blog/2026-05-30-dont-average-the-needle-spectral-retrieval-and-the-rag-evidence-problem/</link>
      <pubDate>Sat, 30 May 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-05-30-dont-average-the-needle-spectral-retrieval-and-the-rag-evidence-problem/</guid>
      <description>A mechanism-first reading of Spectral Retrieval: why dense retrieval can bury localized evidence, how multi-scale sinc convolution tries to recover it, and where the business value actually begins.</description>
    </item>
    <item>
      <title>Experience Is Not Memory: Why Learning Agents Need a Better Feedback Loop</title>
      <link>https://cognaptus.com/blog/2026-05-29-experience-is-not-memory-why-learning-agents-need-a-better-feedback-loop/</link>
      <pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-05-29-experience-is-not-memory-why-learning-agents-need-a-better-feedback-loop/</guid>
      <description>A mechanism-first reading of In-context Training, a new framework for testing whether language agents can turn one-off experience into reusable operational improvement.</description>
    </item>
    <item>
      <title>Think Longer, Act Worse? What M2A Teaches About Reasoning Agents</title>
      <link>https://cognaptus.com/blog/2026-05-29-think-longer-act-worse-what-m2a-teaches-about-reasoning-agents/</link>
      <pubDate>Fri, 29 May 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-05-29-think-longer-act-worse-what-m2a-teaches-about-reasoning-agents/</guid>
      <description>A mechanism-first reading of M2A, showing why better reasoning agents need protected action loops, not just longer thought traces.</description>
    </item>
    <item>
      <title>Silent Errors, Loud Consequences: ASMR-Bench and the Coming Era of AI Auditors</title>
      <link>https://cognaptus.com/blog/2026-04-22-silent-errors-loud-consequences-asmrbench-and-the-coming-era-of-ai-auditors/</link>
      <pubDate>Wed, 22 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-22-silent-errors-loud-consequences-asmrbench-and-the-coming-era-of-ai-auditors/</guid>
      <description>A research-sabotage benchmark shows why AI auditability is not a code-review feature, but an operating model for trustworthy AI work.</description>
    </item>
    <item>
      <title>Reviewer, Reviewed: When AI Starts Grading the Graders</title>
      <link>https://cognaptus.com/blog/2026-04-16-reviewer-reviewed-when-ai-starts-grading-the-graders/</link>
      <pubDate>Thu, 16 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-16-reviewer-reviewed-when-ai-starts-grading-the-graders/</guid>
      <description>A field deployment of AI-generated peer review at AAAI-26 shows where AI can outperform human reviewers, where it still fails, and what businesses should learn about governed second-opinion systems.</description>
    </item>
    <item>
      <title>Evolve or Die Trying: When LLMs Stop Writing Code and Start Designing Algorithms</title>
      <link>https://cognaptus.com/blog/2026-04-15-evolve-or-die-trying-when-llms-stop-writing-code-and-start-designing-algorithms/</link>
      <pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-15-evolve-or-die-trying-when-llms-stop-writing-code-and-start-designing-algorithms/</guid>
      <description>BEAM shows that useful LLM algorithm design is less about clever prompting and more about structured search, reusable memory, and evaluation that actually resembles solver construction.</description>
    </item>
    <item>
      <title>The Memory Isn’t Broken — It’s Flat: Why LLMs Need to ‘Draw’ to Remember</title>
      <link>https://cognaptus.com/blog/2026-04-15-the-memory-isnt-broken-its-flat-why-llms-need-to-draw-to-remember/</link>
      <pubDate>Wed, 15 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-15-the-memory-isnt-broken-its-flat-why-llms-need-to-draw-to-remember/</guid>
      <description>A mechanism-first reading of dual-trace memory encoding and why enterprise AI agents may need richer contextual traces, not just larger memory stores.</description>
    </item>
    <item>
      <title>CivBench: When AI Stops Guessing and Starts Planning</title>
      <link>https://cognaptus.com/blog/2026-04-11-civbench-when-ai-stops-guessing-and-starts-planning/</link>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-11-civbench-when-ai-stops-guessing-and-starts-planning/</guid>
      <description>CivBench shows why serious agent evaluation needs progress signals, not just final scoreboards.</description>
    </item>
    <item>
      <title>Feeling the Model: When LLMs Don’t Just Predict — They ‘Feel’</title>
      <link>https://cognaptus.com/blog/2026-04-11-feeling-the-model-when-llms-dont-just-predict-they-feel/</link>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-11-feeling-the-model-when-llms-dont-just-predict-they-feel/</guid>
      <description>Anthropic’s emotion-vector study shows why enterprise AI risk is not only about bad prompts or bad outputs, but about hidden internal states that can steer agents toward shortcuts, sycophancy, and coercive behavior.</description>
    </item>
    <item>
      <title>Mind the Cut: Where Your AI Strategy Quietly Breaks</title>
      <link>https://cognaptus.com/blog/2026-04-11-mind-the-cut-where-your-ai-strategy-quietly-breaks/</link>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-11-mind-the-cut-where-your-ai-strategy-quietly-breaks/</guid>
      <description>A business-oriented reading of the Cartesian cut: why the boundary between model and runtime determines whether AI agents remain governable, brittle, or truly autonomous.</description>
    </item>
    <item>
      <title>The Orchestrator Problem: When AI Meets Exascale Reality</title>
      <link>https://cognaptus.com/blog/2026-04-11-the-orchestrator-problem-when-ai-meets-exascale-reality/</link>
      <pubDate>Sat, 11 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-11-the-orchestrator-problem-when-ai-meets-exascale-reality/</guid>
      <description>A mechanism-first reading of how LLM agents become useful for scientific computing only when they stop pretending to be schedulers.</description>
    </item>
    <item>
      <title>The Persuasion Engine: When AI Starts Selling (More Than Just Answers)</title>
      <link>https://cognaptus.com/blog/2026-04-10-the-persuasion-engine-when-ai-starts-selling-more-than-just-answers/</link>
      <pubDate>Fri, 10 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-10-the-persuasion-engine-when-ai-starts-selling-more-than-just-answers/</guid>
      <description>A mechanism-first reading of how sponsored incentives can distort AI assistants before they ever need to lie.</description>
    </item>
    <item>
      <title>From Chains to Trees: Why LLM Agents Need Structural Memory</title>
      <link>https://cognaptus.com/blog/2026-04-09-from-chains-to-trees-why-llm-agents-need-structural-memory/</link>
      <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-09-from-chains-to-trees-why-llm-agents-need-structural-memory/</guid>
      <description>A mechanism-first reading of T-STAR, showing why multi-turn LLM agents learn better when failed and successful rollouts are compared as shared trees rather than isolated chains.</description>
    </item>
    <item>
      <title>The Map Is Not the Territory—But Your LLM Thinks It Is</title>
      <link>https://cognaptus.com/blog/2026-04-09-the-map-is-not-the-territorybut-your-llm-thinks-it-is/</link>
      <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-09-the-map-is-not-the-territorybut-your-llm-thinks-it-is/</guid>
      <description>EVGeoQA shows why tool-using LLM agents still struggle with real-world spatial planning: they can reason locally, but often fail to explore enough.</description>
    </item>
    <item>
      <title>The Minimal LLM Thesis: When Agents Think for Themselves</title>
      <link>https://cognaptus.com/blog/2026-04-09-the-minimal-llm-thesis-when-agents-think-for-themselves/</link>
      <pubDate>Thu, 09 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-09-the-minimal-llm-thesis-when-agents-think-for-themselves/</guid>
      <description>A decomposition study shows why agent performance may come from measurable harness structure before it comes from larger or more frequent LLM calls.</description>
    </item>
    <item>
      <title>Benchmarking the Benchmarks: Why ACE-Bench Might Be the Missing Layer in Agent Evaluation</title>
      <link>https://cognaptus.com/blog/2026-04-08-benchmarking-the-benchmarks-why-acebench-might-be-the-missing-layer-in-agent-evaluation/</link>
      <pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-08-benchmarking-the-benchmarks-why-acebench-might-be-the-missing-layer-in-agent-evaluation/</guid>
      <description>A mechanism-first reading of AgentCE-Bench, showing why controllable agent evaluation may be more useful than another realism-heavy leaderboard.</description>
    </item>
    <item>
      <title>From Spreadsheets to Swarms: How Agentic AI Rewrites the Retail Supply Chain</title>
      <link>https://cognaptus.com/blog/2026-04-08-from-spreadsheets-to-swarms-how-agentic-ai-rewrites-the-retail-supply-chain/</link>
      <pubDate>Wed, 08 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-08-from-spreadsheets-to-swarms-how-agentic-ai-rewrites-the-retail-supply-chain/</guid>
      <description>A mechanism-first reading of Flowr, an agentic AI framework that turns supermarket replenishment from manual coordination into supervised workflow automation.</description>
    </item>
    <item>
      <title>Walking the Graph: When LLMs Stop Guessing and Start Navigating</title>
      <link>https://cognaptus.com/blog/2026-04-05-walking-the-graph-when-llms-stop-guessing-and-start-navigating/</link>
      <pubDate>Sun, 05 Apr 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-04-05-walking-the-graph-when-llms-stop-guessing-and-start-navigating/</guid>
      <description>GraphWalk shows why enterprise knowledge-graph reasoning needs auditable navigation tools, not just larger prompts or cleaner retrieval.</description>
    </item>
    <item>
      <title>The Model That Forgot Itself: Why LLMs Drift Without Knowing</title>
      <link>https://cognaptus.com/blog/2026-03-29-the-model-that-forgot-itself-why-llms-drift-without-knowing/</link>
      <pubDate>Sun, 29 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-29-the-model-that-forgot-itself-why-llms-drift-without-knowing/</guid>
      <description>A mechanism-first reading of why LLMs can appear consistent while silently changing their hidden goals across a conversation.</description>
    </item>
    <item>
      <title>Belief Is a Graph: Why LLM Agents Need Structured Minds</title>
      <link>https://cognaptus.com/blog/2026-03-23-belief-is-a-graph-why-llm-agents-need-structured-minds/</link>
      <pubDate>Mon, 23 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-23-belief-is-a-graph-why-llm-agents-need-structured-minds/</guid>
      <description>A mechanism-first reading of dynamic belief graphs, and why enterprise LLM agents need structured, auditable mental states rather than longer prompts.</description>
    </item>
    <item>
      <title>The Illusion of Anonymity: When AI Connects the Dots You Thought Were Safe</title>
      <link>https://cognaptus.com/blog/2026-03-21-the-illusion-of-anonymity-when-ai-connects-the-dots-you-thought-were-safe/</link>
      <pubDate>Sat, 21 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-21-the-illusion-of-anonymity-when-ai-connects-the-dots-you-thought-were-safe/</guid>
      <description>A mechanism-first reading of how LLM agents turn weak, anonymized cues into real identity hypotheses—and why enterprise privacy governance must move beyond PII masking.</description>
    </item>
    <item>
      <title>The Hidden Playbook of LLMs: How AI Quietly Thinks Like a Hacker</title>
      <link>https://cognaptus.com/blog/2026-03-20-the-hidden-playbook-of-llms-how-ai-quietly-thinks-like-a-hacker/</link>
      <pubDate>Fri, 20 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-20-the-hidden-playbook-of-llms-how-ai-quietly-thinks-like-a-hacker/</guid>
      <description>A mechanism-first reading of how LLM agents implicitly control long-horizon binary vulnerability analysis through pruning, lock-in, backtracking, and prioritization.</description>
    </item>
    <item>
      <title>When Alignment Meets Reality: Why LLMs Can’t Agree With Themselves</title>
      <link>https://cognaptus.com/blog/2026-03-17-when-alignment-meets-reality-why-llms-cant-agree-with-themselves/</link>
      <pubDate>Tue, 17 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-17-when-alignment-meets-reality-why-llms-cant-agree-with-themselves/</guid>
      <description>A mechanism-first reading of why LLM alignment conflicts emerge, how priority hacking exploits them, and what enterprise AI systems should do at runtime.</description>
    </item>
    <item>
      <title>Ants in the Machine: What Swarm Intelligence Teaches Us About Routing LLM Agents</title>
      <link>https://cognaptus.com/blog/2026-03-16-ants-in-the-machine-what-swarm-intelligence-teaches-us-about-routing-llm-agents/</link>
      <pubDate>Mon, 16 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-16-ants-in-the-machine-what-swarm-intelligence-teaches-us-about-routing-llm-agents/</guid>
      <description>A mechanism-first reading of AMRO-S, a semantic and ant-colony-inspired routing framework for making multi-agent LLM systems cheaper, faster, and easier to inspect.</description>
    </item>
    <item>
      <title>From Hallucination to Verification: Why AI Needs a Pharmacist’s Mindset</title>
      <link>https://cognaptus.com/blog/2026-03-13-from-hallucination-to-verification-why-ai-needs-a-pharmacists-mindset/</link>
      <pubDate>Fri, 13 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-13-from-hallucination-to-verification-why-ai-needs-a-pharmacists-mindset/</guid>
      <description>A prescription-auditing paper shows why safe AI needs hybrid knowledge stores, deterministic checks, and evidence-grounded reasoning—not just bigger models.</description>
    </item>
    <item>
      <title>Prompt Politics: How Tiny Policies Can Steer Entire AI Societies</title>
      <link>https://cognaptus.com/blog/2026-03-11-prompt-politics-how-tiny-policies-can-steer-entire-ai-societies/</link>
      <pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-11-prompt-politics-how-tiny-policies-can-steer-entire-ai-societies/</guid>
      <description>A mechanism-first reading of how policy-parameterized prompts can steer LLM multi-agent dialogue without model training—and what that means for business agent systems.</description>
    </item>
    <item>
      <title>When Plans Talk Back: Conversational AI Meets Classical Planning</title>
      <link>https://cognaptus.com/blog/2026-03-03-when-plans-talk-back-conversational-ai-meets-classical-planning/</link>
      <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-03-03-when-plans-talk-back-conversational-ai-meets-classical-planning/</guid>
      <description>A mechanism-first reading of how LLM agents can make formal planning systems easier to question, revise, and trust without pretending to replace the planner.</description>
    </item>
    <item>
      <title>When Agents Ask for Help: Teaching LLMs the Art of Expert Collaboration</title>
      <link>https://cognaptus.com/blog/2026-02-28-when-agents-ask-for-help-teaching-llms-the-art-of-expert-collaboration/</link>
      <pubDate>Sat, 28 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-28-when-agents-ask-for-help-teaching-llms-the-art-of-expert-collaboration/</guid>
      <description>A mechanism-first reading of AHCE, a framework that teaches LLM agents when to escalate to human experts and how to turn messy advice into executable action.</description>
    </item>
    <item>
      <title>Gamma Rays and Toolboxes: Why Superintelligence May Be a Systems Engineering Problem</title>
      <link>https://cognaptus.com/blog/2026-02-25-gamma-rays-and-toolboxes-why-superintelligence-may-be-a-systems-engineering-problem/</link>
      <pubDate>Wed, 25 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-25-gamma-rays-and-toolboxes-why-superintelligence-may-be-a-systems-engineering-problem/</guid>
      <description>A new benchmark suggests that long-horizon AI reasoning may depend less on raw model scale than on whether models can reliably combine state, evidence, validation, and tools.</description>
    </item>
    <item>
      <title>Agents in Lab Coats: When LLMs Try to Become Data Scientists</title>
      <link>https://cognaptus.com/blog/2026-02-22-agents-in-lab-coats-when-llms-try-to-become-data-scientists/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-22-agents-in-lab-coats-when-llms-try-to-become-data-scientists/</guid>
      <description>A comparison-based guide to when single-agent, two-agent, multi-agent, and dynamic LLM data-science systems actually make business sense.</description>
    </item>
    <item>
      <title>Don’t Prompt Harder — Engineer Smarter: Inside CEDAR’s Agentic Data Scientist</title>
      <link>https://cognaptus.com/blog/2026-02-22-dont-prompt-harder-engineer-smarter-inside-cedars-agentic-data-scientist/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-22-dont-prompt-harder-engineer-smarter-inside-cedars-agentic-data-scientist/</guid>
      <description>CEDAR shows why useful AI data science systems depend less on magical prompting and more on structured context, local execution, agent routing, and inspectable workflows.</description>
    </item>
    <item>
      <title>From SQL Copilot to Autonomous Data Scientist: The L0–L5 Reality Check</title>
      <link>https://cognaptus.com/blog/2026-02-22-from-sql-copilot-to-autonomous-data-scientist-the-l0l5-reality-check/</link>
      <pubDate>Sun, 22 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-22-from-sql-copilot-to-autonomous-data-scientist-the-l0l5-reality-check/</guid>
      <description>A practical autonomy map for separating ordinary data copilots from supervised workflow agents, proactive data operators, and still-speculative autonomous data scientists.</description>
    </item>
    <item>
      <title>Death by a Thousand Prompts: Why Long-Horizon Attacks Break AI Agents</title>
      <link>https://cognaptus.com/blog/2026-02-21-death-by-a-thousand-prompts-why-longhorizon-attacks-break-ai-agents/</link>
      <pubDate>Sat, 21 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-21-death-by-a-thousand-prompts-why-longhorizon-attacks-break-ai-agents/</guid>
      <description>AgentLAB shows why enterprise AI security must move from single-prompt filtering to trajectory-level control over tools, memory, and multi-step behavior.</description>
    </item>
    <item>
      <title>From PDE to Pipeline: When LLMs Become Numerical Architects</title>
      <link>https://cognaptus.com/blog/2026-02-20-from-pde-to-pipeline-when-llms-become-numerical-architects/</link>
      <pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-20-from-pde-to-pipeline-when-llms-become-numerical-architects/</guid>
      <description>A mechanism-first reading of AutoNumerics, showing why automated PDE solving is less about code generation and more about controlled solver planning, debugging, and verification.</description>
    </item>
    <item>
      <title>Consistency Is Not a Coincidence: When LLM Agents Disagree With Themselves</title>
      <link>https://cognaptus.com/blog/2026-02-14-consistency-is-not-a-coincidence-when-llm-agents-disagree-with-themselves/</link>
      <pubDate>Sat, 14 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-14-consistency-is-not-a-coincidence-when-llm-agents-disagree-with-themselves/</guid>
      <description>A paper on behavioral consistency shows why repeated agent trajectories can become an early warning signal for enterprise AI reliability.</description>
    </item>
    <item>
      <title>Mind the Gap: When Clinical LLMs Learn from Their Own Mistakes</title>
      <link>https://cognaptus.com/blog/2026-02-11-mind-the-gap-when-clinical-llms-learn-from-their-own-mistakes/</link>
      <pubDate>Wed, 11 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-11-mind-the-gap-when-clinical-llms-learn-from-their-own-mistakes/</guid>
      <description>A close reading of Differential Reasoning Learning, a clinical-agent framework that turns reasoning failures into reusable, auditable correction patches.</description>
    </item>
    <item>
      <title>From Features to Actions: Why Agentic AI Needs a New Explainability Playbook</title>
      <link>https://cognaptus.com/blog/2026-02-09-from-features-to-actions-why-agentic-ai-needs-a-new-explainability-playbook/</link>
      <pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-09-from-features-to-actions-why-agentic-ai-needs-a-new-explainability-playbook/</guid>
      <description>A practical reading of why feature attribution explains static predictions, but trajectory-level diagnostics are needed to understand failures in agentic AI systems.</description>
    </item>
    <item>
      <title>When Aligned Models Compete: Nash Equilibria as the New Alignment Layer</title>
      <link>https://cognaptus.com/blog/2026-02-09-when-aligned-models-compete-nash-equilibria-as-the-new-alignment-layer/</link>
      <pubDate>Mon, 09 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-09-when-aligned-models-compete-nash-equilibria-as-the-new-alignment-layer/</guid>
      <description>A mechanism-first reading of LLM active alignment: why individually aligned agents can still produce exclusionary system equilibria when they compete for attention.</description>
    </item>
    <item>
      <title>Learning to Inject: When Prompt Injection Becomes an Optimization Problem</title>
      <link>https://cognaptus.com/blog/2026-02-08-learning-to-inject-when-prompt-injection-becomes-an-optimization-problem/</link>
      <pubDate>Sun, 08 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-08-learning-to-inject-when-prompt-injection-becomes-an-optimization-problem/</guid>
      <description>AutoInject shows why prompt injection should be tested as an adaptive optimization problem, not merely as a list of hand-written attack templates.</description>
    </item>
    <item>
      <title>Stop the All-Hands Meeting: When AI Agents Learn Who Actually Needs to Talk</title>
      <link>https://cognaptus.com/blog/2026-02-06-stop-the-allhands-meeting-when-ai-agents-learn-who-actually-needs-to-talk/</link>
      <pubDate>Fri, 06 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-06-stop-the-allhands-meeting-when-ai-agents-learn-who-actually-needs-to-talk/</guid>
      <description>DyTopo shows why multi-agent AI systems should route information by need, not by habit.</description>
    </item>
    <item>
      <title>More Isn’t Smarter: Why Agent Diversity Beats Agent Count</title>
      <link>https://cognaptus.com/blog/2026-02-04-more-isnt-smarter-why-agent-diversity-beats-agent-count/</link>
      <pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-04-more-isnt-smarter-why-agent-diversity-beats-agent-count/</guid>
      <description>A mechanism-first reading of why multi-agent LLM systems saturate when agents repeat each other, and why useful diversity beats raw agent count.</description>
    </item>
    <item>
      <title>When Agents Stop Talking to the Wrong People</title>
      <link>https://cognaptus.com/blog/2026-02-04-when-agents-stop-talking-to-the-wrong-people/</link>
      <pubDate>Wed, 04 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-04-when-agents-stop-talking-to-the-wrong-people/</guid>
      <description>TodyComm shows why multi-agent AI systems need learned communication governance, not just more agents talking more often.</description>
    </item>
    <item>
      <title>Coaching the Swarm: Why Multi‑Agent RL Finally Scales</title>
      <link>https://cognaptus.com/blog/2026-02-03-coaching-the-swarm-why-multiagent-rl-finally-scales/</link>
      <pubDate>Tue, 03 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-03-coaching-the-swarm-why-multiagent-rl-finally-scales/</guid>
      <description>A mechanism-first reading of MAPPA, a process-reward method for turning multi-agent LLM workflows from prompted collaboration into trainable systems.</description>
    </item>
    <item>
      <title>Agentic Systems Need Architecture, Not Vibes</title>
      <link>https://cognaptus.com/blog/2026-02-02-agentic-systems-need-architecture-not-vibes/</link>
      <pubDate>Mon, 02 Feb 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-02-02-agentic-systems-need-architecture-not-vibes/</guid>
      <description>A mechanism-first reading of why reliable AI agents need subsystem architecture, reusable design patterns, and clear diagnosis rather than another enthusiastic list of agent tricks.</description>
    </item>
    <item>
      <title>When LLMs Get a Laptop: Why Sandboxes Might Be the Real AGI Benchmark</title>
      <link>https://cognaptus.com/blog/2026-01-24-when-llms-get-a-laptop-why-sandboxes-might-be-the-real-agi-benchmark/</link>
      <pubDate>Sat, 24 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-24-when-llms-get-a-laptop-why-sandboxes-might-be-the-real-agi-benchmark/</guid>
      <description>A mechanism-first reading of LLM-in-Sandbox, showing why giving models a minimal computer environment may matter more than adding another clever prompt.</description>
    </item>
    <item>
      <title>Affective Inertia: Teaching LLM Agents to Remember Who They Are</title>
      <link>https://cognaptus.com/blog/2026-01-23-affective-inertia-teaching-llm-agents-to-remember-who-they-are/</link>
      <pubDate>Fri, 23 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-23-affective-inertia-teaching-llm-agents-to-remember-who-they-are/</guid>
      <description>A mechanism-first reading of how explicit state dynamics can make LLM agents more temporally coherent, and why too much stability becomes its own failure mode.</description>
    </item>
    <item>
      <title>Rebuttal Agents, Not Rebuttal Text: Why ‘Verify‑Then‑Write’ Is the Only Scalable Future</title>
      <link>https://cognaptus.com/blog/2026-01-21-rebuttal-agents-not-rebuttal-text-why-verifythenwrite-is-the-only-scalable-future/</link>
      <pubDate>Wed, 21 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-21-rebuttal-agents-not-rebuttal-text-why-verifythenwrite-is-the-only-scalable-future/</guid>
      <description>How RebuttalAgent turns author responses from fluent text generation into auditable concern tracking, evidence construction, and strategic planning.</description>
    </item>
    <item>
      <title>Recommendations With Receipts: When LLMs Have to Prove They Behaved</title>
      <link>https://cognaptus.com/blog/2026-01-17-recommendations-with-receipts-when-llms-have-to-prove-they-behaved/</link>
      <pubDate>Sat, 17 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-17-recommendations-with-receipts-when-llms-have-to-prove-they-behaved/</guid>
      <description>A mechanism-first look at PCN-Rec, a proof-carrying architecture that turns LLM recommenders from trusted decision-makers into auditable proposers.</description>
    </item>
    <item>
      <title>Bubble Trouble: Why Top‑K Retrieval Keeps Letting LLMs Down</title>
      <link>https://cognaptus.com/blog/2026-01-16-bubble-trouble-why-topk-retrieval-keeps-letting-llms-down/</link>
      <pubDate>Fri, 16 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-16-bubble-trouble-why-topk-retrieval-keeps-letting-llms-down/</guid>
      <description>A practical reading of Context Bubble construction: why enterprise RAG needs constrained, auditable context assembly rather than larger top-k piles.</description>
    </item>
    <item>
      <title>Knowing Is Not Doing: When LLM Agents Pass the Task but Fail the World</title>
      <link>https://cognaptus.com/blog/2026-01-15-knowing-is-not-doing-when-llm-agents-pass-the-task-but-fail-the-world/</link>
      <pubDate>Thu, 15 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-15-knowing-is-not-doing-when-llm-agents-pass-the-task-but-fail-the-world/</guid>
      <description>Task2Quiz shows why agent evaluation needs to separate task completion from grounded environment understanding.</description>
    </item>
    <item>
      <title>Scaling the Sandbox: When LLM Agents Need Better Worlds</title>
      <link>https://cognaptus.com/blog/2026-01-14-scaling-the-sandbox-when-llm-agents-need-better-worlds/</link>
      <pubDate>Wed, 14 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-14-scaling-the-sandbox-when-llm-agents-need-better-worlds/</guid>
      <description>EnvScaler shows why useful LLM agents may need scalable executable worlds—not just more prompts, more tools, or larger models.</description>
    </item>
    <item>
      <title>STACKPLANNER: When Agents Learn to Forget</title>
      <link>https://cognaptus.com/blog/2026-01-12-stackplanner-when-agents-learn-to-forget/</link>
      <pubDate>Mon, 12 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-12-stackplanner-when-agents-learn-to-forget/</guid>
      <description>A mechanism-first reading of STACKPLANNER, showing why long-horizon agent systems may need memory control more than bigger context windows.</description>
    </item>
    <item>
      <title>Agents That Ship, Not Just Think: When LLM Self-Improvement Meets Release Engineering</title>
      <link>https://cognaptus.com/blog/2026-01-11-agents-that-ship-not-just-think-when-llm-selfimprovement-meets-release-engineering/</link>
      <pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-11-agents-that-ship-not-just-think-when-llm-selfimprovement-meets-release-engineering/</guid>
      <description>AgentDevel shows why improving LLM agents may require release gates, traces, and regression control more than another round of self-reflection.</description>
    </item>
    <item>
      <title>ResMAS: When Multi‑Agent Systems Stop Falling Apart</title>
      <link>https://cognaptus.com/blog/2026-01-11-resmas-when-multiagent-systems-stop-falling-apart/</link>
      <pubDate>Sun, 11 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-11-resmas-when-multiagent-systems-stop-falling-apart/</guid>
      <description>A mechanism-first reading of ResMAS, showing why resilient LLM agent systems depend on communication topology and topology-aware prompts, not just more agents.</description>
    </item>
    <item>
      <title>From Tokens to Topology: Teaching LLMs to Think in Simulink</title>
      <link>https://cognaptus.com/blog/2026-01-09-from-tokens-to-topology-teaching-llms-to-think-in-simulink/</link>
      <pubDate>Fri, 09 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-09-from-tokens-to-topology-teaching-llms-to-think-in-simulink/</guid>
      <description>A mechanism-first reading of SimuAgent, a Simulink modeling assistant that shows why representation, validation, curriculum, and reflection matter more than merely attaching a larger model to an engineering tool.</description>
    </item>
    <item>
      <title>Trading Without Cheating: Teaching LLMs to Reason When Markets Lie</title>
      <link>https://cognaptus.com/blog/2026-01-08-trading-without-cheating-teaching-llms-to-reason-when-markets-lie/</link>
      <pubDate>Thu, 08 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-08-trading-without-cheating-teaching-llms-to-reason-when-markets-lie/</guid>
      <description>A mechanism-first reading of Trade-R1, a framework for training financial LLM agents when market returns are objective but dangerously noisy.</description>
    </item>
    <item>
      <title>Pulling the Thread: Why LLM Reasoning Often Unravels</title>
      <link>https://cognaptus.com/blog/2026-01-06-pulling-the-thread-why-llm-reasoning-often-unravels/</link>
      <pubDate>Tue, 06 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-06-pulling-the-thread-why-llm-reasoning-often-unravels/</guid>
      <description>Project Ariadne shows how counterfactual interventions can audit whether an LLM’s reasoning trace actually causes its answer, or merely decorates it.</description>
    </item>
    <item>
      <title>Talking to Yourself, but Make It Useful: Intrinsic Self‑Critique in LLM Planning</title>
      <link>https://cognaptus.com/blog/2026-01-03-talking-to-yourself-but-make-it-useful-intrinsic-selfcritique-in-llm-planning/</link>
      <pubDate>Sat, 03 Jan 2026 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2026-01-03-talking-to-yourself-but-make-it-useful-intrinsic-selfcritique-in-llm-planning/</guid>
      <description>A procedural self-critique loop can make LLM planners markedly more reliable—but only when reflection is converted into explicit rule checking, state tracking, and conservative approval.</description>
    </item>
    <item>
      <title>Silent Scholars, No More: When Uncertainty Becomes an Agent’s Survival Instinct</title>
      <link>https://cognaptus.com/blog/2025-12-28-silent-scholars-no-more-when-uncertainty-becomes-an-agents-survival-instinct/</link>
      <pubDate>Sun, 28 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-28-silent-scholars-no-more-when-uncertainty-becomes-an-agents-survival-instinct/</guid>
      <description>A mechanism-first reading of why future LLM agents may need uncertainty-driven feedback loops, not just larger memories or better retrieval.</description>
    </item>
    <item>
      <title>When Reflection Needs a Committee: Why LLMs Think Better in Groups</title>
      <link>https://cognaptus.com/blog/2025-12-28-when-reflection-needs-a-committee-why-llms-think-better-in-groups/</link>
      <pubDate>Sun, 28 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-28-when-reflection-needs-a-committee-why-llms-think-better-in-groups/</guid>
      <description>A mechanism-first reading of Multi-Agent Reflexion and what it teaches businesses about separating execution, critique, judgment, and memory in LLM agents.</description>
    </item>
    <item>
      <title>When Agents Agree Too Much: Emergent Bias in Multi‑Agent AI Systems</title>
      <link>https://cognaptus.com/blog/2025-12-21-when-agents-agree-too-much-emergent-bias-in-multiagent-ai-systems/</link>
      <pubDate>Sun, 21 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-21-when-agents-agree-too-much-emergent-bias-in-multiagent-ai-systems/</guid>
      <description>A financial AI fairness study shows why testing individual LLM agents is not enough when their collaboration can create new system-level bias.</description>
    </item>
    <item>
      <title>Don’t Tell the Robot What You Know</title>
      <link>https://cognaptus.com/blog/2025-12-20-dont-tell-the-robot-what-you-know/</link>
      <pubDate>Sat, 20 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-20-dont-tell-the-robot-what-you-know/</guid>
      <description>A new embodied-agent study shows why collaborative AI fails when the informed agent gives more instructions instead of helping the limited agent verify what it can actually perceive.</description>
    </item>
    <item>
      <title>Model First, Think Later: Why LLMs Fail Before They Reason</title>
      <link>https://cognaptus.com/blog/2025-12-17-model-first-think-later-why-llms-fail-before-they-reason/</link>
      <pubDate>Wed, 17 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-17-model-first-think-later-why-llms-fail-before-they-reason/</guid>
      <description>A practical reading of Model-First Reasoning: why agent failures often begin with unstable problem representation, not weak reasoning.</description>
    </item>
    <item>
      <title>When Rewards Learn Back: Evolution, but With Gradients</title>
      <link>https://cognaptus.com/blog/2025-12-16-when-rewards-learn-back-evolution-but-with-gradients/</link>
      <pubDate>Tue, 16 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-16-when-rewards-learn-back-evolution-but-with-gradients/</guid>
      <description>A mechanism-first reading of DERL: how reward design becomes a learnable outer-loop problem, and why that matters for enterprise agents.</description>
    </item>
    <item>
      <title>When Agents Loop: Geometry, Drift, and the Hidden Physics of LLM Behavior</title>
      <link>https://cognaptus.com/blog/2025-12-14-when-agents-loop-geometry-drift-and-the-hidden-physics-of-llm-behavior/</link>
      <pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-14-when-agents-loop-geometry-drift-and-the-hidden-physics-of-llm-behavior/</guid>
      <description>A practical reading of how recursive LLM agents converge, drift, or wander depending less on the model than on the loop we force it to run.</description>
    </item>
    <item>
      <title>When Tokens Become Actions: A Policy Gradient Built for Transformers</title>
      <link>https://cognaptus.com/blog/2025-12-14-when-tokens-become-actions-a-policy-gradient-built-for-transformers/</link>
      <pubDate>Sun, 14 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-14-when-tokens-become-actions-a-policy-gradient-built-for-transformers/</guid>
      <description>A mechanism-first reading of GPG, a Transformer-aware policy-gradient framework that turns output segments into trainable macro-actions for LLM agents.</description>
    </item>
    <item>
      <title>Teach Me Once: How One‑Shot LLM Guidance Reshapes Hierarchical Planning</title>
      <link>https://cognaptus.com/blog/2025-12-11-teach-me-once-how-oneshot-llm-guidance-reshapes-hierarchical-planning/</link>
      <pubDate>Thu, 11 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-11-teach-me-once-how-oneshot-llm-guidance-reshapes-hierarchical-planning/</guid>
      <description>A mechanism-first reading of SCOPE, a paper showing how LLM guidance can be moved from runtime planning into one-time subgoal initialization for cheaper hierarchical agents.</description>
    </item>
    <item>
      <title>Error 404: Peer Review Not Found — How LLMs Are Quietly Rewriting Scientific Quality Control</title>
      <link>https://cognaptus.com/blog/2025-12-08-error-404-peer-review-not-found-how-llms-are-quietly-rewriting-scientific-quality-control/</link>
      <pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-08-error-404-peer-review-not-found-how-llms-are-quietly-rewriting-scientific-quality-control/</guid>
      <description>A close reading of how a GPT-5-based correctness checker turns scientific paper auditing from artisanal peer-review labor into a scalable quality-control workflow.</description>
    </item>
    <item>
      <title>Stacking the Odds: Why Blocksworld Still Breaks Your Fancy LLM Agent</title>
      <link>https://cognaptus.com/blog/2025-12-04-stacking-the-odds-why-blocksworld-still-breaks-your-fancy-llm-agent/</link>
      <pubDate>Thu, 04 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-04-stacking-the-odds-why-blocksworld-still-breaks-your-fancy-llm-agent/</guid>
      <description>A practical reading of an MCP-integrated Blocksworld benchmark showing why planning, verification, execution, and replanning must be tested together before LLM agents touch real operations.</description>
    </item>
    <item>
      <title>Short Paths, Sharp Minds: Why Knowledge Graph Distance Feels Like Cognitive Gravity</title>
      <link>https://cognaptus.com/blog/2025-12-02-short-paths-sharp-minds-why-knowledge-graph-distance-feels-like-cognitive-gravity/</link>
      <pubDate>Tue, 02 Dec 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-12-02-short-paths-sharp-minds-why-knowledge-graph-distance-feels-like-cognitive-gravity/</guid>
      <description>A mechanism-first reading of how graph distance can act as a surprise signal for knowledge-graph reasoning, and why the idea is useful before it is proven.</description>
    </item>
    <item>
      <title>Parallel Worlds of Moderation: How LLM Simulations Are Stress-Testing Online Civility</title>
      <link>https://cognaptus.com/blog/2025-11-12-parallel-worlds-of-moderation-how-llm-simulations-are-stresstesting-online-civility/</link>
      <pubDate>Wed, 12 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-11-12-parallel-worlds-of-moderation-how-llm-simulations-are-stresstesting-online-civility/</guid>
      <description>Exploring how COSMOS uses counterfactual simulations powered by large language models to evaluate online moderation policies before deploying them on real users.</description>
    </item>
    <item>
      <title>Parallel Worlds of Moderation: Simulating Online Civility with LLMs</title>
      <link>https://cognaptus.com/blog/2025-11-11-parallel-worlds-of-moderation-simulating-online-civility-with-llms/</link>
      <pubDate>Tue, 11 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-11-11-parallel-worlds-of-moderation-simulating-online-civility-with-llms/</guid>
      <description>How LLM-powered simulations can test content moderation strategies without risking real social fallout.</description>
    </item>
    <item>
      <title>Divide, Cache, and Conquer: How Mixture-of-Agents is Rewriting Hardware Design</title>
      <link>https://cognaptus.com/blog/2025-11-05-divide-cache-and-conquer-how-mixtureofagents-is-rewriting-hardware-design/</link>
      <pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-11-05-divide-cache-and-conquer-how-mixtureofagents-is-rewriting-hardware-design/</guid>
      <description>VERIMOA shows how multi-agent reasoning and quality caching can make LLMs outperform even fine-tuned models in chip design.</description>
    </item>
    <item>
      <title>Recursive Minds: How ReCAP Turns LLMs into Self-Correcting Planners</title>
      <link>https://cognaptus.com/blog/2025-11-02-recursive-minds-how-recap-turns-llms-into-selfcorrecting-planners/</link>
      <pubDate>Sun, 02 Nov 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-11-02-recursive-minds-how-recap-turns-llms-into-selfcorrecting-planners/</guid>
      <description>Stanford and MIT researchers introduce ReCAP, a recursive framework that allows language models to plan ahead, revise intelligently, and stay context-aware over long tasks.</description>
    </item>
    <item>
      <title>Agents in a Sandbox: Securing the Next Layer of AI Autonomy</title>
      <link>https://cognaptus.com/blog/2025-10-31-agents-in-a-sandbox-securing-the-next-layer-of-ai-autonomy/</link>
      <pubDate>Fri, 31 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-31-agents-in-a-sandbox-securing-the-next-layer-of-ai-autonomy/</guid>
      <description>AgentBound proposes the first security framework for Model Context Protocol servers—establishing access control, isolation, and least privilege for AI agents.</description>
    </item>
    <item>
      <title>Deep Thinking, Dynamic Acting: How DeepAgent Redefines General Reasoning</title>
      <link>https://cognaptus.com/blog/2025-10-31-deep-thinking-dynamic-acting-how-deepagent-redefines-general-reasoning/</link>
      <pubDate>Fri, 31 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-31-deep-thinking-dynamic-acting-how-deepagent-redefines-general-reasoning/</guid>
      <description>DeepAgent bridges the gap between large reasoning models and autonomous agents with memory folding, dynamic tool discovery, and end-to-end reinforcement learning.</description>
    </item>
    <item>
      <title>Beyond Utility: When LLM Agents Start Dreaming Their Own Tasks</title>
      <link>https://cognaptus.com/blog/2025-10-23-beyond-utility-when-llm-agents-start-dreaming-their-own-tasks/</link>
      <pubDate>Thu, 23 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-23-beyond-utility-when-llm-agents-start-dreaming-their-own-tasks/</guid>
      <description>Exploring how &#39;open-ended&#39; LLM agents shift from executing instructions to inventing goals, revealing the fragile boundary between automation and autonomy.</description>
    </item>
    <item>
      <title>Pods over Prompts: Shachi’s Playbook for Serious Agent-Based Simulation</title>
      <link>https://cognaptus.com/blog/2025-10-03-pods-over-prompts-shachis-playbook-for-serious-agentbased-simulation/</link>
      <pubDate>Fri, 03 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-03-pods-over-prompts-shachis-playbook-for-serious-agentbased-simulation/</guid>
      <description>Sakana AI’s Shachi turns LLM agents into modular, testable components—unlocking reproducible ABM, cross-task generalization, and even real‑world policy shock modeling. Here’s why this matters for operators and investors.</description>
    </item>
    <item>
      <title>Paths &gt; Outcomes: Measuring Agent Quality Beyond the Final State</title>
      <link>https://cognaptus.com/blog/2025-10-02-paths-outcomes-measuring-agent-quality-beyond-the-final-state/</link>
      <pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-02-paths-outcomes-measuring-agent-quality-beyond-the-final-state/</guid>
      <description>CORE reframes LLM‑agent evaluation around the entire sequence of tool calls—catching skipped preconditions, unsafe detours, and wasteful loops that final‑state metrics miss.</description>
    </item>
    <item>
      <title>When Agents Get Bored: Three Baselines Your Autonomy Stack Already Has</title>
      <link>https://cognaptus.com/blog/2025-10-02-when-agents-get-bored-three-baselines-your-autonomy-stack-already-has/</link>
      <pubDate>Thu, 02 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-10-02-when-agents-get-bored-three-baselines-your-autonomy-stack-already-has/</guid>
      <description>A business-first read on new evidence that LLM agents, left without tasks, fall into three stable modes—and what that means for reliability, UX, and governance.</description>
    </item>
    <item>
      <title>Search Party in a Notebook: JUPITER Turns Data Analysis into a Tree Game</title>
      <link>https://cognaptus.com/blog/2025-09-17-search-party-in-a-notebook-jupiter-turns-data-analysis-into-a-tree-game/</link>
      <pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-17-search-party-in-a-notebook-jupiter-turns-data-analysis-into-a-tree-game/</guid>
      <description>JUPITER marries a real-world notebook dataset (NbQA) with value-guided search to push small open models past heavyweight agents on multi‑step data analysis. Here’s why it matters for AI-in-the-loop analytics.</description>
    </item>
    <item>
      <title>Small Gains, Long Games: Why Tiny Accuracy Bumps Explode into Big Execution Wins</title>
      <link>https://cognaptus.com/blog/2025-09-17-small-gains-long-games-why-tiny-accuracy-bumps-explode-into-big-execution-wins/</link>
      <pubDate>Wed, 17 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-17-small-gains-long-games-why-tiny-accuracy-bumps-explode-into-big-execution-wins/</guid>
      <description>A new evaluation shows that small, diminishing gains in per‑step accuracy can compound into massive increases in the task length LLMs can execute—if we separate planning from execution.</description>
    </item>
    <item>
      <title>Guardrails Before Gas: Secure Plan‑Then‑Execute Agents for Real Work</title>
      <link>https://cognaptus.com/blog/2025-09-14-guardrails-before-gas-secure-planthenexecute-agents-for-real-work/</link>
      <pubDate>Sun, 14 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-14-guardrails-before-gas-secure-planthenexecute-agents-for-real-work/</guid>
      <description>Why Plan‑then‑Execute (P‑t‑E) is the right default for production LLM agents—and how to harden it with least privilege, sandboxing, and human validation.</description>
    </item>
    <item>
      <title>Agreeable to a Fault: Why LLM ‘People’ Can’t Hold Their Ground</title>
      <link>https://cognaptus.com/blog/2025-09-08-agreeable-to-a-fault-why-llm-people-cant-hold-their-ground/</link>
      <pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-08-agreeable-to-a-fault-why-llm-people-cant-hold-their-ground/</guid>
      <description>New evidence shows LLM agents suppress disagreement and drift from their stated beliefs—undercutting their use as substitutes for real people in social simulation and product research.</description>
    </item>
    <item>
      <title>Plan, Act, Replan: When LLM Agents Run the Aisles</title>
      <link>https://cognaptus.com/blog/2025-09-08-plan-act-replan-when-llm-agents-run-the-aisles/</link>
      <pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-08-plan-act-replan-when-llm-agents-run-the-aisles/</guid>
      <description>JD.com’s real deployment shows how an LLM-agent planner turns supply chain SOPs into an iterative, evidence-based loop—cutting analysis time ~40% and lifting in‑stock and accuracy metrics.</description>
    </item>
    <item>
      <title>Rules of Engagement: How Meta‑Policy Reflexion Turns Agent Memory into Guardrails</title>
      <link>https://cognaptus.com/blog/2025-09-08-rules-of-engagement-how-metapolicy-reflexion-turns-agent-memory-into-guardrails/</link>
      <pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-08-rules-of-engagement-how-metapolicy-reflexion-turns-agent-memory-into-guardrails/</guid>
      <description>A practical look at Meta‑Policy Reflexion (MPR)—a predicate‑style memory plus hard admissibility checks that make LLM agents safer, cheaper, and more transferable without fine‑tuning.</description>
    </item>
    <item>
      <title>Control Plane, Not Pain: How Agentic OS Turns Linux Scheduling into a Semantic Service</title>
      <link>https://cognaptus.com/blog/2025-09-04-control-plane-not-pain-how-agentic-os-turns-linux-scheduling-into-a-semantic-service/</link>
      <pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-04-control-plane-not-pain-how-agentic-os-turns-linux-scheduling-into-a-semantic-service/</guid>
      <description>SchedCP splits ‘what to optimize’ from ‘how to act,’ letting LLM agents synthesize safe, workload‑aware Linux schedulers with real gains and lower costs.</description>
    </item>
    <item>
      <title>From Prompts to Policies: The Agentic RL Playbook</title>
      <link>https://cognaptus.com/blog/2025-09-04-from-prompts-to-policies-the-agentic-rl-playbook/</link>
      <pubDate>Thu, 04 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-04-from-prompts-to-policies-the-agentic-rl-playbook/</guid>
      <description>A deep read on a new survey that reframes LLMs as adaptive, tool-using agents trained with reinforcement signals across long horizons—and what that means for builders.</description>
    </item>
    <item>
      <title>Mask, Don’t Muse: When Simple Memory Beats Fancy Summaries</title>
      <link>https://cognaptus.com/blog/2025-09-01-mask-dont-muse-when-simple-memory-beats-fancy-summaries/</link>
      <pubDate>Mon, 01 Sep 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-09-01-mask-dont-muse-when-simple-memory-beats-fancy-summaries/</guid>
      <description>New results on SWE-bench show a humble ‘observation mask’ can match—or beat—LLM summarization while halving agent costs.</description>
    </item>
    <item>
      <title>Mirror, Signal, Maneuver: How &#39;Self&#39; Labels Nudge LLM Cooperation</title>
      <link>https://cognaptus.com/blog/2025-08-27-mirror-signal-maneuver-how-self-labels-nudge-llm-cooperation/</link>
      <pubDate>Wed, 27 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-27-mirror-signal-maneuver-how-self-labels-nudge-llm-cooperation/</guid>
      <description>A new study shows that simply telling an LLM it’s playing against itself changes how much it contributes in an iterated public‑goods game—sometimes boosting cooperation, sometimes eroding it. We translate the results into design rules for multi‑agent AI in business settings.</description>
    </item>
    <item>
      <title>Mirror, Signal, Trade: How Self‑Reflective Agent Teams Outperform in Backtests</title>
      <link>https://cognaptus.com/blog/2025-08-26-mirror-signal-trade-how-selfreflective-agent-teams-outperform-in-backtests/</link>
      <pubDate>Tue, 26 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-26-mirror-signal-trade-how-selfreflective-agent-teams-outperform-in-backtests/</guid>
      <description>We unpack TradingGroup—a multi‑agent, self‑reflective trading framework with a built‑in data factory—and translate its ideas into a pragmatic blueprint for Cognaptus’s own market agents.</description>
    </item>
    <item>
      <title>MoA vs. Moat: Agentic LLMs for Drug Competitor Mapping Cut Diligence Time 20×</title>
      <link>https://cognaptus.com/blog/2025-08-25-moa-vs-moat-agentic-llms-for-drug-competitor-mapping-cut-diligence-time-20/</link>
      <pubDate>Mon, 25 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-25-moa-vs-moat-agentic-llms-for-drug-competitor-mapping-cut-diligence-time-20/</guid>
      <description>A new agentic workflow turns messy VC memos and the open web into a reliable map of drug competitors—outperforming Deep Research and shrinking analysis from days to hours.</description>
    </item>
    <item>
      <title>Enemy at the Gates, Friends at the Table: Why Competition Makes LLM Agents More Cooperative</title>
      <link>https://cognaptus.com/blog/2025-08-24-enemy-at-the-gates-friends-at-the-table-why-competition-makes-llm-agents-more-cooperative/</link>
      <pubDate>Sun, 24 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-24-enemy-at-the-gates-friends-at-the-table-why-competition-makes-llm-agents-more-cooperative/</guid>
      <description>A new study shows that mixing inter‑group rivalry with repeated interactions lifts both overall and one‑shot cooperation in LLM agent tournaments—offering a counterintuitive blueprint for designing trustworthy, high‑performance agent teams.</description>
    </item>
    <item>
      <title>Prefix, Not Pretext: A One‑Line Fix for Agent Misalignment</title>
      <link>https://cognaptus.com/blog/2025-08-20-prefix-not-pretext-a-oneline-fix-for-agent-misalignment/</link>
      <pubDate>Wed, 20 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-20-prefix-not-pretext-a-oneline-fix-for-agent-misalignment/</guid>
      <description>Fine-tuning turns helpful LLMs into risky agents more often than we admit. A simple, optimized prefix (PING) sharply raises refusal rates with almost no cost to task success.</description>
    </item>
    <item>
      <title>Crystal Ball, Meet Cron Job: What FutureX Reveals About ‘Live’ Forecasting Agents</title>
      <link>https://cognaptus.com/blog/2025-08-19-crystal-ball-meet-cron-job-what-futurex-reveals-about-live-forecasting-agents/</link>
      <pubDate>Tue, 19 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-19-crystal-ball-meet-cron-job-what-futurex-reveals-about-live-forecasting-agents/</guid>
      <description>FutureX stress-tests 25 agentic LLMs on ~500 fresh events per week from 195 curated sites. Here’s why a live, tiered benchmark changes how we evaluate forecasting AI—and what it means for product teams.</description>
    </item>
    <item>
      <title>Bias in the Warehouse: What AIM-Bench Reveals About Agentic LLMs</title>
      <link>https://cognaptus.com/blog/2025-08-18-bias-in-the-warehouse-what-aimbench-reveals-about-agentic-llms/</link>
      <pubDate>Mon, 18 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-18-bias-in-the-warehouse-what-aimbench-reveals-about-agentic-llms/</guid>
      <description>A deep dive into AIM-Bench—how agentic LLMs make (and mis-make) inventory decisions under uncertainty, and what to do about it.</description>
    </item>
    <item>
      <title>Consent, Coaxing, and Countermoves: Simulating Privacy Attacks on LLM Agents</title>
      <link>https://cognaptus.com/blog/2025-08-18-consent-coaxing-and-countermoves-simulating-privacy-attacks-on-llm-agents/</link>
      <pubDate>Mon, 18 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-18-consent-coaxing-and-countermoves-simulating-privacy-attacks-on-llm-agents/</guid>
      <description>A search-based simulation framework uncovers how agent-to-agent conversations escalate from polite asks to forged-consent impersonations—and what state-machine defenses actually hold up.</description>
    </item>
    <item>
      <title>Three’s Company: When LLMs Argue Their Way to Alpha</title>
      <link>https://cognaptus.com/blog/2025-08-18-threes-company-when-llms-argue-their-way-to-alpha/</link>
      <pubDate>Mon, 18 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-18-threes-company-when-llms-argue-their-way-to-alpha/</guid>
      <description>BlackRock’s ‘AlphaAgents’ shows how a three‑agent LLM team—fundamental, sentiment, and valuation—can debate their way to better stock picks, and what it means for real portfolios.</description>
    </item>
    <item>
      <title>Confounder Hunters: How LLM Agents are Rewriting the Rules of Causal Inference</title>
      <link>https://cognaptus.com/blog/2025-08-12-confounder-hunters-how-llm-agents-are-rewriting-the-rules-of-causal-inference/</link>
      <pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-12-confounder-hunters-how-llm-agents-are-rewriting-the-rules-of-causal-inference/</guid>
      <description>A novel framework uses LLM-based agents to automate confounder discovery and subgroup analysis, narrowing uncertainty in treatment effect estimates while preserving interpretability.</description>
    </item>
    <item>
      <title>Meta-Game Theory: What a Pokémon League Taught Us About LLM Strategy</title>
      <link>https://cognaptus.com/blog/2025-08-09-metagame-theory-what-a-pokmon-league-taught-us-about-llm-strategy/</link>
      <pubDate>Sat, 09 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-09-metagame-theory-what-a-pokmon-league-taught-us-about-llm-strategy/</guid>
      <description>An eight-model Pokémon tournament reveals how foundation models form strategies, explain decisions, and win under uncertainty—and what that means for enterprise AI.</description>
    </item>
    <item>
      <title>Forecast First, Ask Later: How DCATS Makes Time Series Smarter with LLMs</title>
      <link>https://cognaptus.com/blog/2025-08-07-forecast-first-ask-later-how-dcats-makes-time-series-smarter-with-llms/</link>
      <pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-07-forecast-first-ask-later-how-dcats-makes-time-series-smarter-with-llms/</guid>
      <description>A look into how DCATS, a data-centric LLM agent, redefines AutoML for time series forecasting by optimizing data—not just models.</description>
    </item>
    <item>
      <title>The Forest Within: How Galaxy Reinvents LLM Agents with Self-Evolving Cognition</title>
      <link>https://cognaptus.com/blog/2025-08-07-the-forest-within-how-galaxy-reinvents-llm-agents-with-selfevolving-cognition/</link>
      <pubDate>Thu, 07 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-07-the-forest-within-how-galaxy-reinvents-llm-agents-with-selfevolving-cognition/</guid>
      <description>Galaxy blends cognitive architecture with system design to create proactive, privacy-aware, and self-evolving AI agents.</description>
    </item>
    <item>
      <title>Forkcast: How Pro2Guard Predicts and Prevents LLM Agent Failures</title>
      <link>https://cognaptus.com/blog/2025-08-04-forkcast-how-pro2guard-predicts-and-prevents-llm-agent-failures/</link>
      <pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-04-forkcast-how-pro2guard-predicts-and-prevents-llm-agent-failures/</guid>
      <description>Pro2Guard introduces proactive runtime safety enforcement for LLM agents using probabilistic model checking. It predicts risks before they materialize—unlike reactive systems—and balances safety with task success.</description>
    </item>
    <item>
      <title>From Autocomplete to Autonomy: How LLM Code Agents are Rewriting the SDLC</title>
      <link>https://cognaptus.com/blog/2025-08-04-from-autocomplete-to-autonomy-how-llm-code-agents-are-rewriting-the-sdlc/</link>
      <pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-04-from-autocomplete-to-autonomy-how-llm-code-agents-are-rewriting-the-sdlc/</guid>
      <description>Code generation agents are no longer just smart autocompletes. They now orchestrate, reflect, and collaborate across the entire software development lifecycle. Here&#39;s how.</description>
    </item>
    <item>
      <title>The Lion Roars in Crypto: How Multi-Agent LLMs Are Taming Market Chaos</title>
      <link>https://cognaptus.com/blog/2025-08-03-the-lion-roars-in-crypto-how-multiagent-llms-are-taming-market-chaos/</link>
      <pubDate>Sun, 03 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-03-the-lion-roars-in-crypto-how-multiagent-llms-are-taming-market-chaos/</guid>
      <description>MountainLion&#39;s agent-based architecture blends interpretability, multi-modality, and real-time adaptability for smarter cryptocurrency trading.</description>
    </item>
    <item>
      <title>Mind&#39;s Eye for Machines: How SimuRA Teaches AI to Think Before Acting</title>
      <link>https://cognaptus.com/blog/2025-08-02-minds-eye-for-machines-how-simura-teaches-ai-to-think-before-acting/</link>
      <pubDate>Sat, 02 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-02-minds-eye-for-machines-how-simura-teaches-ai-to-think-before-acting/</guid>
      <description>SimuRA proposes a simulative reasoning framework that enables LLM agents to internally imagine futures before acting, bridging prediction and planning in pursuit of general intelligence.</description>
    </item>
    <item>
      <title>Layers of Thought: How Hierarchical Memory Supercharges LLM Agent Reasoning</title>
      <link>https://cognaptus.com/blog/2025-08-01-layers-of-thought-how-hierarchical-memory-supercharges-llm-agent-reasoning/</link>
      <pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-01-layers-of-thought-how-hierarchical-memory-supercharges-llm-agent-reasoning/</guid>
      <description>Why flat memory systems limit long-term LLM agents—and how a structured four-layer memory architecture dramatically improves accuracy, efficiency, and realism.</description>
    </item>
    <item>
      <title>SIMURA Says: Don’t Guess, Simulate</title>
      <link>https://cognaptus.com/blog/2025-08-01-simura-says-dont-guess-simulate/</link>
      <pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-08-01-simura-says-dont-guess-simulate/</guid>
      <description>SIMURA replaces guesswork with thought experiments, using LLMs as world models to simulate the future before acting. It may be the most serious step yet toward generalist agents.</description>
    </item>
    <item>
      <title>The User Is Present: Why Smart Agents Still Don&#39;t Get You</title>
      <link>https://cognaptus.com/blog/2025-07-30-the-user-is-present-why-smart-agents-still-dont-get-you/</link>
      <pubDate>Wed, 30 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-30-the-user-is-present-why-smart-agents-still-dont-get-you/</guid>
      <description>UserBench challenges LLM agents not with tasks, but with people. What happens when the real problem isn’t tool use, but the human on the other side?</description>
    </item>
    <item>
      <title>Mirage Agents: When LLMs Act on Illusions</title>
      <link>https://cognaptus.com/blog/2025-07-29-mirage-agents-when-llms-act-on-illusions/</link>
      <pubDate>Tue, 29 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-29-mirage-agents-when-llms-act-on-illusions/</guid>
      <description>MIRAGE-Bench reveals that even state-of-the-art LLM agents frequently hallucinate actions under real-world pressure. Here&#39;s how the benchmark works and why it matters.</description>
    </item>
    <item>
      <title>Tools of Thought: Why Reasoning Isn’t an Illusion After All</title>
      <link>https://cognaptus.com/blog/2025-07-24-tools-of-thought-why-reasoning-isnt-an-illusion-after-all/</link>
      <pubDate>Thu, 24 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-24-tools-of-thought-why-reasoning-isnt-an-illusion-after-all/</guid>
      <description>Tool-augmented LLMs reverse the narrative that reasoning models are overhyped. A new study shows that Python interpreters and scratchpads make LRMs outperform standard LLMs across problem complexity.</description>
    </item>
    <item>
      <title>The Watchdog at the Gates: How HalMit Hunts Hallucinations in LLM Agents</title>
      <link>https://cognaptus.com/blog/2025-07-23-the-watchdog-at-the-gates-how-halmit-hunts-hallucinations-in-llm-agents/</link>
      <pubDate>Wed, 23 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-23-the-watchdog-at-the-gates-how-halmit-hunts-hallucinations-in-llm-agents/</guid>
      <description>A deep dive into HalMit, a black-box framework that tames hallucinations in LLM-empowered agents by modeling per-domain generalization bounds.</description>
    </item>
    <item>
      <title>The Butterfly Defect: Diagnosing LLM Failures in Tool-Agent Chains</title>
      <link>https://cognaptus.com/blog/2025-07-22-the-butterfly-defect-diagnosing-llm-failures-in-toolagent-chains/</link>
      <pubDate>Tue, 22 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-22-the-butterfly-defect-diagnosing-llm-failures-in-toolagent-chains/</guid>
      <description>Tool-augmented LLMs often stumble not at planning but at parsing, where subtle parameter issues ripple into major breakdowns. This article dives into a new taxonomy of these failures, their causes, and what builders can do about them.</description>
    </item>
    <item>
      <title>Agents of Disruption: How LLMs Became Adversarial Testers for Autonomous Driving</title>
      <link>https://cognaptus.com/blog/2025-07-21-agents-of-disruption-how-llms-became-adversarial-testers-for-autonomous-driving/</link>
      <pubDate>Mon, 21 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-21-agents-of-disruption-how-llms-became-adversarial-testers-for-autonomous-driving/</guid>
      <description>AGENTS-LLM proposes a powerful agentic framework where LLMs act not as scene generators, but as safety-critical adversaries in the closed-loop evaluation of autonomous driving planners.</description>
    </item>
    <item>
      <title>Personas with Purpose: How TinyTroupe Reimagines Multiagent Simulation</title>
      <link>https://cognaptus.com/blog/2025-07-15-personas-with-purpose-how-tinytroupe-reimagines-multiagent-simulation/</link>
      <pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-15-personas-with-purpose-how-tinytroupe-reimagines-multiagent-simulation/</guid>
      <description>TinyTroupe transforms LLM-powered agents from task solvers into behavioral simulators, enabling richer, more realistic personas for experimentation, UX prototyping, and synthetic data generation.</description>
    </item>
    <item>
      <title>The First Hurdle: Why Coding Agents Struggle with Setup</title>
      <link>https://cognaptus.com/blog/2025-07-15-the-first-hurdle-why-coding-agents-struggle-with-setup/</link>
      <pubDate>Tue, 15 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-15-the-first-hurdle-why-coding-agents-struggle-with-setup/</guid>
      <description>SetupBench reveals a blind spot in coding agents: real-world environment bootstrapping. This overlooked challenge undermines LLM agents&#39; promise of end-to-end software automation.</description>
    </item>
    <item>
      <title>Threading the Needle: How GRAFT Reinvents Document Translation with DAGs and LLM Agents</title>
      <link>https://cognaptus.com/blog/2025-07-12-threading-the-needle-how-graft-reinvents-document-translation-with-dags-and-llm-agents/</link>
      <pubDate>Sat, 12 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-12-threading-the-needle-how-graft-reinvents-document-translation-with-dags-and-llm-agents/</guid>
      <description>GRAFT introduces a graph-based multi-agent framework that significantly improves document-level machine translation by addressing discourse-level phenomena through LLMs.</description>
    </item>
    <item>
      <title>Secret Handshakes at Scale: How LLM Agents Learn to Collude</title>
      <link>https://cognaptus.com/blog/2025-07-07-secret-handshakes-at-scale-how-llm-agents-learn-to-collude/</link>
      <pubDate>Mon, 07 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-07-secret-handshakes-at-scale-how-llm-agents-learn-to-collude/</guid>
      <description>New research finds that large language model agents, when given the chance to communicate, can spontaneously collude in auction settings—even under regulatory pressure.</description>
    </item>
    <item>
      <title>From ETL to Orchestral Intelligence: The Rise of the Data Agent</title>
      <link>https://cognaptus.com/blog/2025-07-03-from-etl-to-orchestral-intelligence-the-rise-of-the-data-agent/</link>
      <pubDate>Thu, 03 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-03-from-etl-to-orchestral-intelligence-the-rise-of-the-data-agent/</guid>
      <description>Data Agents promise to unify LLMs, data tools, and reasoning into cohesive AI&#43;Data ecosystems. This piece explores their architecture and implications for enterprise automation.</description>
    </item>
    <item>
      <title>Chains of Causality, Not Just Thought</title>
      <link>https://cognaptus.com/blog/2025-07-02-chains-of-causality-not-just-thought/</link>
      <pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-02-chains-of-causality-not-just-thought/</guid>
      <description>How Causal Influence Prompting (CIP) reframes LLM safety by formalizing decision-making in agentic tasks.</description>
    </item>
    <item>
      <title>Chatbot at the Table: Rethinking Group Recommendations with GenAI</title>
      <link>https://cognaptus.com/blog/2025-07-02-chatbot-at-the-table-rethinking-group-recommendations-with-genai/</link>
      <pubDate>Wed, 02 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-02-chatbot-at-the-table-rethinking-group-recommendations-with-genai/</guid>
      <description>Why group recommender systems have failed to thrive — and how generative AI might finally make them useful by turning algorithms into mediators.</description>
    </item>
    <item>
      <title>Agents Under Siege: How LLM Workflows Invite a New Breed of Cyber Threats</title>
      <link>https://cognaptus.com/blog/2025-07-01-agents-under-siege-how-llm-workflows-invite-a-new-breed-of-cyber-threats/</link>
      <pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-07-01-agents-under-siege-how-llm-workflows-invite-a-new-breed-of-cyber-threats/</guid>
      <description>LLM-powered agents are revolutionizing AI automation—but their reliance on complex toolchains and agent protocols creates cascading security risks. This article explores the emerging threat model and its implications for enterprise AI.</description>
    </item>
    <item>
      <title>Catalysts of Thought: How LLM Agents are Reinventing Chemical Process Optimization</title>
      <link>https://cognaptus.com/blog/2025-06-27-catalysts-of-thought-how-llm-agents-are-reinventing-chemical-process-optimization/</link>
      <pubDate>Fri, 27 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-06-27-catalysts-of-thought-how-llm-agents-are-reinventing-chemical-process-optimization/</guid>
      <description>By autonomously inferring operating constraints and collaborating across specialized roles, LLM agents outperform traditional optimizers in chemical process design.</description>
    </item>
    <item>
      <title>Playing with Strangers: A New Benchmark for Ad-Hoc Human-AI Teamwork</title>
      <link>https://cognaptus.com/blog/2025-06-27-playing-with-strangers-a-new-benchmark-for-adhoc-humanai-teamwork/</link>
      <pubDate>Fri, 27 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-06-27-playing-with-strangers-a-new-benchmark-for-adhoc-humanai-teamwork/</guid>
      <description>A new challenge using the game Hanabi brings us closer to human-compatible AI agents by enabling reproducible, low-cost evaluation of ad-hoc coordination.</description>
    </item>
    <item>
      <title>The Joy of Many Minds: How JoyAgents-R1 Unleashes the Power of Multi-LLM Reinforcement Learning</title>
      <link>https://cognaptus.com/blog/2025-06-25-the-joy-of-many-minds-how-joyagentsr1-unleashes-the-power-of-multillm-reinforcement-learning/</link>
      <pubDate>Wed, 25 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-06-25-the-joy-of-many-minds-how-joyagentsr1-unleashes-the-power-of-multillm-reinforcement-learning/</guid>
      <description>JoyAgents-R1 introduces a groundbreaking framework that enables multiple heterogeneous language model agents to evolve together using Group Relative Policy Optimization (GRPO), improving coordination, reasoning, and memory with minimal resources.</description>
    </item>
    <item>
      <title>Innovation, Agentified: How TRIZ Got Its AI Makeover</title>
      <link>https://cognaptus.com/blog/2025-06-24-innovation-agentified-how-triz-got-its-ai-makeover/</link>
      <pubDate>Tue, 24 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-06-24-innovation-agentified-how-triz-got-its-ai-makeover/</guid>
      <description>Exploring how multi-agent LLM systems can simulate human innovation teams using the structured TRIZ methodology, achieving creative problem-solving with autonomy and orchestration.</description>
    </item>
    <item>
      <title>The Memory Advantage: When AI Agents Learn from the Past</title>
      <link>https://cognaptus.com/blog/2025-06-03-the-memory-advantage-when-ai-agents-learn-from-the-past/</link>
      <pubDate>Tue, 03 Jun 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-06-03-the-memory-advantage-when-ai-agents-learn-from-the-past/</guid>
      <description>Cognaptus explores how Agentic Episodic Control enables language agents to plan better, fail less, and evolve over time—by remembering what worked before. This cognitive leap could reshape how businesses deploy intelligent agents.</description>
    </item>
    <item>
      <title>Mind the Context: How ContextAgent Listens, Sees, and Acts Before You Ask</title>
      <link>https://cognaptus.com/blog/2025-05-21-mind-the-context-how-contextagent-listens-sees-and-acts-before-you-ask/</link>
      <pubDate>Wed, 21 May 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-05-21-mind-the-context-how-contextagent-listens-sees-and-acts-before-you-ask/</guid>
      <description>This article explores ContextAgent, a proactive AI assistant that uses sensory data from wearables to anticipate user needs without explicit instructions, setting a new benchmark for LLM agents.</description>
    </item>
    <item>
      <title>When Smart AI Gets It Wrong: Diagnosing the Knowing-Doing Gap in Language Model Agents</title>
      <link>https://cognaptus.com/blog/2025-04-23-when-smart-ai-gets-it-wrong-diagnosing-the-knowingdoing-gap-in-language-model-agents/</link>
      <pubDate>Wed, 23 Apr 2025 00:00:00 +0000</pubDate>
      <guid>https://cognaptus.com/blog/2025-04-23-when-smart-ai-gets-it-wrong-diagnosing-the-knowingdoing-gap-in-language-model-agents/</guid>
      <description>A deep dive into why powerful language models still make simple mistakes—and how businesses can build agents that not only know, but act.</description>
    </item>
  </channel>
</rss>
