Opening — Why this matters now

AI drug discovery has quietly crossed a threshold. The conversation is no longer about whether models can generate molecules—it’s about whether agents can consistently deliver usable results under constraints.

And that’s where things begin to break.

Most agentic systems in drug discovery look impressive in demos: they generate candidates, optimize structures, and even run docking simulations. But when evaluated properly—at the set level, under real medicinal chemistry constraints—the success rate collapses.

The paper fileciteturn0file0 introduces a blunt but necessary correction: success is not about a good molecule. It’s about returning a set of molecules that jointly satisfy a protocol.

That distinction sounds subtle. It isn’t.


Background — From Molecules to Systems

Historically, drug discovery AI evolved along two parallel tracks:

  1. Better generators (diffusion models, graph models, pocket-conditioned design)
  2. Better planners (LLM agents coordinating tools and workflows)

Both improved local performance. Neither solved the global problem.

The Hidden Failure Mode

A typical agent pipeline looks like this:

  • Generate molecules
  • Optimize promising candidates
  • Filter via screening
  • Repeat

At each step, progress seems measurable. But the final evaluation is brutal:

| Requirement | Example Constraint |
| --- | --- |
| Size | ≥ 5 molecules |
| Diversity | ≥ 0.8 |
| Binding | Docking score threshold |
| Drug-likeness | QED, Lipinski rules |
| Developability | SAS score |

A system can produce excellent molecules and still fail entirely.

This is the set-level control problem.

And most current agents are, frankly, guessing their way through it.
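To make the set-level objective concrete, here is a minimal Python sketch of a protocol gate. The thresholds, field names, and bit-set fingerprints are illustrative assumptions, not the paper’s implementation; diversity is computed as 1 minus mean pairwise Jaccard similarity. Note how five individually excellent molecules still fail jointly because they are near-duplicates:

```python
from itertools import combinations

def diversity(fps: list[frozenset[int]]) -> float:
    """1 - mean pairwise Jaccard similarity over fingerprint bit sets."""
    if len(fps) < 2:
        return 0.0
    sims = [len(a & b) / len(a | b) for a, b in combinations(fps, 2)]
    return 1.0 - sum(sims) / len(sims)

def passes_protocol(pool: list[dict], min_size=5, min_div=0.8,
                    max_dock=-8.0, min_qed=0.5, max_sas=4.0) -> bool:
    """Set-level gate: all constraints must hold jointly, not per molecule."""
    if len(pool) < min_size:
        return False
    if diversity([m["fp"] for m in pool]) < min_div:
        return False
    return all(m["dock"] <= max_dock and m["qed"] >= min_qed
               and m["sas"] <= max_sas for m in pool)

# Five copies of one excellent molecule: every per-molecule check passes,
# but the set fails because diversity is 0.0.
clones = [{"fp": frozenset({1, 2, 3}), "dock": -9.1, "qed": 0.8, "sas": 2.5}
          for _ in range(5)]
print(passes_protocol(clones))  # False
```

The same pool with five structurally distinct candidates would pass, which is exactly the gap between molecule-level and set-level success.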


Analysis — What CACM Actually Changes

The core contribution of the paper is not another model. It is a control layer.

CACM (Constraint-Aware Corrective Memory) reframes the agent loop around two principles:

  1. Precision — explicitly diagnose why the current result fails
  2. Parsimony — only retain information that helps the next decision

1. Protocol Audit: No More Vibes-Based Evaluation

Instead of letting the LLM “feel” whether results are good, CACM enforces a deterministic gate:

  • Every constraint is checked mathematically
  • Violations are quantified as residuals

This converts failure into something actionable:

| Signal Type | Meaning |
| --- | --- |
| Negative residual | Constraint violated |
| Magnitude | Distance from success |

This alone eliminates a surprising amount of ambiguity.
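As a sketch of what such a deterministic audit might look like, the gate below computes a signed residual per constraint (negative means violated, magnitude is the distance from the threshold). The constraint names and numbers are illustrative, not taken from the paper:

```python
def residuals(stats: dict[str, float],
              protocol: dict[str, tuple[str, float]]) -> dict[str, float]:
    """Signed residual per constraint: negative = violated,
    magnitude = distance from the threshold."""
    out = {}
    for name, (direction, threshold) in protocol.items():
        value = stats[name]
        # ">=" constraints want value above threshold; "<=" want it below
        out[name] = value - threshold if direction == ">=" else threshold - value
    return out

# Illustrative protocol and pool statistics (not from the paper)
protocol = {"size": (">=", 5.0), "diversity": (">=", 0.8), "docking": ("<=", -8.0)}
stats = {"size": 6.0, "diversity": 0.85, "docking": -7.2}

res = residuals(stats, protocol)
violated = {name: r for name, r in res.items() if r < 0}
print(violated)  # only "docking" is negative: binding misses by 0.8
```

Because the check is pure arithmetic, the same pool always produces the same verdict, which is the whole point of replacing vibes with a gate.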

2. Grounded Diagnosis: Turning Failure into Strategy

Rather than generic reflection (“try improving binding”), CACM produces:

  • Dominant failure type
  • Specific violated constraints
  • Repair hint
  • Next-action bias (Generate / Optimize / Code)

In one example (KIT target), the system correctly identifies:

Diversity is already sufficient. Binding is the bottleneck.

That sounds obvious. Most agents miss it.
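A toy version of that routing step, operating on the residual map the audit produces: the Generate/Optimize/Code action names mirror the bias described above, but the playbook rules here are illustrative assumptions, not the paper’s:

```python
def diagnose(residuals: dict[str, float]) -> dict[str, str]:
    """Pick the most-violated constraint and map it to a repair strategy."""
    violated = {k: v for k, v in residuals.items() if v < 0}
    if not violated:
        return {"failure": "none", "hint": "protocol satisfied", "action": "Stop"}
    worst = min(violated, key=violated.get)  # most negative residual
    playbook = {  # hypothetical repair rules for illustration
        "size": ("pool too small", "add candidates", "Generate"),
        "diversity": ("pool too homogeneous", "sample new scaffolds", "Generate"),
        "docking": ("binding too weak", "optimize top binders", "Optimize"),
    }
    failure, hint, action = playbook.get(worst, ("unknown", "inspect manually", "Code"))
    return {"failure": failure, "hint": hint, "action": action}

# KIT-style situation: size and diversity residuals are positive,
# docking is the only negative one, so the bias is "Optimize".
print(diagnose({"size": 1.0, "diversity": 0.05, "docking": -0.8}))
```

The useful property is that the agent is steered by the arithmetic of the worst residual, not by whichever failure the LLM happens to mention first.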

3. Structured Memory: The Real Innovation

This is where CACM becomes interesting.

Instead of dumping history into the prompt, memory is split into three channels:

| Memory Type | Role |
| --- | --- |
| Static | Task + pocket context |
| Dynamic | Current pool + recent actions |
| Corrective | Diagnosed failures + fixes |

Then comes the key move: compression before write-back.

  • Only top-k signals are retained
  • Each channel has strict budget limits
  • Formatting is deterministic (not LLM summarization)

Result: the agent sees a decision-ready state, not a messy log.
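A minimal sketch of such a three-channel memory with budgeted, deterministic write-back. The field names, salience scores, and rendering format are my own assumptions; the point is the mechanism, not the exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Three channels with strict budgets; write-back keeps only top-k signals."""
    static: str = ""                                     # task + pocket context
    dynamic: list[str] = field(default_factory=list)     # pool + recent actions
    corrective: list[str] = field(default_factory=list)  # diagnosed failures + fixes

    def write_back(self, entries: list[tuple[float, str]],
                   channel: str, k: int = 3) -> None:
        """Deterministic compression: rank by salience score, keep top-k, overwrite."""
        top = [text for _, text in sorted(entries, reverse=True)[:k]]
        setattr(self, channel, top)

    def render(self) -> str:
        """Deterministic, decision-ready prompt state (no LLM summarization)."""
        return "\n".join([f"[STATIC] {self.static}",
                          *(f"[DYNAMIC] {e}" for e in self.dynamic),
                          *(f"[CORRECTIVE] {e}" for e in self.corrective)])

mem = Memory(static="target: KIT kinase pocket")
mem.write_back([(0.9, "docking residual -0.8"), (0.2, "qed ok"),
                (0.6, "diversity 0.85"), (0.1, "sas ok")], "corrective", k=2)
print(mem.render())  # low-salience entries never reach the prompt
```

Because ranking and formatting are plain code, the prompt state is reproducible across runs, unlike an LLM-written summary of the same history.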


Findings — Performance Is a Control Problem

The results are… inconvenient for anyone betting purely on better models.

Target-Level Success Rate (TSR)

| Method | TSR |
| --- | --- |
| Baseline (LIDDiA) | 73.3% |
| CACM | 100.0% |

That’s a 36.4% relative improvement (26.7 percentage points).

More Interesting: Smaller, Better Outputs

| Metric | Baseline | CACM |
| --- | --- | --- |
| Avg. pool size | 21.0 | 5.0 |
| Iterations to success | 4.40 | 3.07 |

CACM doesn’t search more. It stops earlier with valid results.

Why This Matters

Most systems optimize for:

  • Better molecules
  • Larger candidate pools

CACM optimizes for:

  • Valid final sets

Those are not the same objective.

Ablation Insight: What Actually Drives Gains

| Ablation | TSR Impact |
| --- | --- |
| Repair signal only | +13% improvement |
| No corrective selection | Performance drops |
| No dynamic compression | Larger pools, slower convergence |

Interpretation:

  • Diagnosis makes failure visible
  • Selection prevents noise
  • Compression enables decision quality

In other words: memory design is the system.


Implications — This Is Bigger Than Drug Discovery

CACM is not just about chemistry. It exposes a broader pattern in agent systems.

1. Most Agent Failures Are Control Failures

Not model limitations.

Agents often fail because they:

  • Misinterpret progress
  • Lose track of constraints
  • Overweight irrelevant history

Sound familiar? That’s not a molecule problem. That’s a state representation problem.

2. “More Context” Is a Trap

Raw history grows. Signal quality does not.

CACM shows that:

  • Structured memory > longer prompts
  • Selection > accumulation

This has direct implications for enterprise AI systems where logs explode over time.

3. Deterministic Layers Are Underrated

The audit layer is fully deterministic.

This is not a limitation—it’s an advantage:

  • Reproducibility
  • Interpretability
  • Alignment with domain constraints

Expect more hybrid systems where:

  • LLM = planner
  • Code = verifier

4. ROI Perspective: Why Businesses Should Care

From a business lens, CACM delivers three things:

| Dimension | Impact |
| --- | --- |
| Reliability | 100% task completion vs. partial success |
| Efficiency | Fewer iterations, lower total cost |
| Interpretability | Explicit failure diagnostics |

This is the difference between:

  • A demo system
  • A deployable system

Conclusion — Intelligence Needs Structure

The industry keeps asking how to make agents “smarter.”

This paper suggests a different question:

What if they just need to remember better?

CACM demonstrates that reliability in complex tasks doesn’t come from bigger models or more tools.

It comes from:

  • Clear success definitions
  • Explicit failure signals
  • Structured, minimal memory

In short: control beats cleverness.

And that’s a lesson most AI systems are still learning the hard way.


Cognaptus: Automate the Present, Incubate the Future.