Opening — Why this matters now

The industry has spent the last decade trying to make hardware design feel more like software. High-Level Synthesis (HLS) promised exactly that: write C/C++, press a button, get efficient hardware.

Reality, predictably, had other plans.

Even today, HLS remains a craft. Engineers manually tune pragmas, restructure loops, and wrestle with latency–area trade-offs like it’s still 2008—just with better tooling. The abstraction improved, but the cognitive burden did not.

Now, something quietly unsettling is happening: general-purpose AI coding agents—without any hardware-specific training—are starting to outperform structured optimization pipelines.

Not by being smarter. By being more numerous.


Background — The limits of structured optimization

Traditional HLS optimization approaches fall into a familiar pattern: define a search space, then explore it efficiently.

| Approach Type | Strength | Limitation |
| --- | --- | --- |
| Bayesian / surrogate models | Efficient search | Restricted to predefined parameters |
| ILP / mathematical optimization | Globally consistent selection | Cannot rewrite code |
| LLM-based pragma generation | Flexible directive suggestions | Still mostly parameter-bound |

The core problem is subtle but fatal: they optimize configurations, not programs.

But hardware performance is often unlocked not by better parameters but by rewriting the computation itself. Loop fusion, memory layout changes, algebraic simplifications: these are not parameters. They are transformations.
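To make the parameter/transformation distinction concrete, here is a minimal sketch in Python (standing in for the C/C++ an HLS tool would actually consume). No pragma setting can turn the first function into the second; only a structural rewrite can:

```python
import math

def unfused(xs):
    # First pass materializes an intermediate buffer...
    squares = [x * x for x in xs]
    # ...which a second pass then re-reads.
    return [math.sqrt(s) + 1.0 for s in squares]

def fused(xs):
    # The fused loop does the same work in one pass with no
    # intermediate storage: a rewrite of the computation,
    # not a tunable parameter.
    return [math.sqrt(x * x) + 1.0 for x in xs]
```

In hardware terms, the fused version eliminates an entire intermediate memory, which is exactly the kind of win a parameter-only search can never find.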

This is where agentic systems enter the scene.


Analysis — What the paper actually does

The paper introduces what it calls an Agent Factory—a two-stage system that treats optimization as a coordinated, multi-agent exploration process rather than a single search problem.

Stage 1 — Decompose, optimize, recombine

First, the system breaks a program into sub-functions (kernels), then assigns each to an independent optimization agent.

Each agent generates multiple variants using strategies like:

| Variant Type | Strategy |
| --- | --- |
| Baseline | No changes |
| Conservative | Minimal pragmas |
| Pipeline | Varying pipeline depths |
| Aggressive | Pipeline + loop unrolling |
| Structural | Memory partitioning, code rewrites |

Each variant is:

  1. Verified for correctness
  2. Synthesized
  3. Measured (latency + area)
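The per-variant loop above can be sketched as follows. This is a Python sketch, not the paper's implementation: `verify` and `synthesize` are hypothetical stand-ins for the correctness check and the HLS tool invocation.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

@dataclass
class Variant:
    kernel: str           # sub-function this variant belongs to
    strategy: str         # "baseline", "pipeline", "aggressive", ...
    latency: float = 0.0  # filled in by synthesis
    area: float = 0.0

def evaluate(v: Variant,
             verify: Callable[[Variant], bool],
             synthesize: Callable[[Variant], Tuple[float, float]]) -> Optional[Variant]:
    # 1. Verify: reject incorrect rewrites before paying for synthesis.
    if not verify(v):
        return None
    # 2.-3. Synthesize and record latency + area.
    v.latency, v.area = synthesize(v)
    return v
```

The ordering matters: verification is cheap, synthesis is not, so broken variants are discarded before they cost anything.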

Then comes a critical step: Integer Linear Programming (ILP) selects the best combination under a global area constraint.

But—and this is where it gets interesting—the ILP does not produce a single answer.

It produces multiple good answers.
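The selection step can be illustrated with a deliberately naive stand-in for the ILP solver: pick one variant per kernel, enforce a global area budget, and keep the top few combinations by total latency rather than only the optimum. (The paper uses a real ILP formulation; this exhaustive version is just a sketch of the same idea.)

```python
from itertools import product

def select_candidates(variants_per_kernel, area_budget, top_k=3):
    """Pick one variant per kernel under a global area budget.

    Returns the top_k feasible combinations by total latency,
    mirroring how the ILP stage yields multiple good answers."""
    feasible = []
    for combo in product(*variants_per_kernel.values()):
        area = sum(v["area"] for v in combo)
        if area <= area_budget:
            latency = sum(v["latency"] for v in combo)
            feasible.append((latency, combo))
    feasible.sort(key=lambda pair: pair[0])
    return [combo for _, combo in feasible[:top_k]]
```

Returning a ranked list instead of a single winner is the hinge of the whole design: it is what gives Stage 2 its diverse starting points.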

Stage 2 — Parallel exploration (a.k.a. brute-force intelligence)

Instead of trusting the “best” ILP solution, the system launches N independent agents, each starting from a different candidate.

Each agent explores:

  • Cross-function optimizations
  • Loop restructuring
  • Memory reorganization
  • Pragma recombination

Then they compete.

The best design wins.

If this sounds less like optimization and more like evolutionary search with language models, that’s because it is.
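The evolutionary framing can be made explicit with a toy sketch. Here a random local-search `mutate` step stands in for an LLM agent proposing a rewrite, and `cost` stands in for a synthesis report; everything named here is an illustrative assumption, not the paper's code.

```python
import random

def explore(seed_design, steps, mutate, cost, rng):
    """One 'agent': greedy local search from a seed design.

    mutate() stands in for an LLM proposing a code rewrite;
    cost() stands in for synthesis latency."""
    best = seed_design
    for _ in range(steps):
        candidate = mutate(best, rng)
        if cost(candidate) < cost(best):  # keep only improvements
            best = candidate
    return best

def agent_factory(seeds, steps, mutate, cost, rng_seed=0):
    # One agent per ILP candidate (run sequentially here,
    # in parallel in the paper); the best final design wins.
    rng = random.Random(rng_seed)
    finals = [explore(s, steps, mutate, cost, rng) for s in seeds]
    return min(finals, key=cost)
```

Because each agent only ever accepts improvements, the winning design is guaranteed to be at least as good as the best seed—diversity of starting points does the rest.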


Findings — Scaling agents scales performance

The results are not subtle.

1. More agents → better designs

| Agents (N) | Mean Speedup vs Baseline |
| --- | --- |
| 1 | ~5.26× |
| 2 | ~5.81× |
| 4 | ~7.66× |
| 10 | ~8.27× |

The gains are especially dramatic for complex workloads:

| Benchmark | Speedup |
| --- | --- |
| streamcluster | >20× |
| kmeans | ~10× |
| lavamd | ~8× |

From the Pareto plots (page 4), you can see a clear pattern: increasing agents expands the frontier toward better latency–area trade-offs.

In simpler terms: more parallel thinking leads to better hardware.

2. The best solution is often not the obvious one

A quietly important result: the final winning design frequently does not come from the top ILP candidate.

Translation: local optimization is misleading.

Only global, exploratory search—across multiple agents—reveals better configurations.

3. Agents rediscover human expertise (without being taught)

Across benchmarks, agents independently learned patterns that hardware engineers already know:

  • ARRAY_PARTITION resolves memory bottlenecks
  • PIPELINE is useless without fixing dependencies

No training. Just iteration.

Which is mildly concerning if your job involves writing those pragmas.

4. Diminishing returns still exist

Not everything scales forever.

  • Simple kernels plateau quickly
  • Tight area constraints reduce gains
  • Some workloads show non-monotonic improvements

In other words: throwing agents at the problem helps—until physics politely intervenes.


Implications — A new axis of optimization: inference-time scaling

The real contribution is not the pipeline itself.

It’s the idea that:

Optimization quality can be improved by increasing the number of agents at inference time.

This reframes optimization in a way business leaders should pay attention to.

1. Compute becomes strategy

Instead of better algorithms, you can:

  • Run more agents
  • Explore more possibilities
  • Accept higher token cost

The paper reports millions of tokens per run, with heavy variance.

This is not cheap.

But it is scalable.

2. Expertise is being commoditized

These agents:

  • Have no hardware-specific training
  • Yet recover domain-specific heuristics

Meaning: expertise is shifting from people to processes.

The competitive edge becomes:

  • Better orchestration
  • Better evaluation pipelines
  • Better integration with tooling

Not necessarily better engineers.

3. Multi-agent systems are becoming the default

Single-agent systems are tidy.

Multi-agent systems are messy, redundant, and inefficient.

They are also—apparently—more effective.

Expect this pattern to repeat across domains:

  • Finance (multi-strategy trading agents)
  • Marketing (multi-creative exploration)
  • Operations (multi-scenario planning)

The logic is identical: explore more, pick the best.


Conclusion — Intelligence is no longer singular

The quiet thesis of this paper is deceptively simple:

You don’t need a smarter agent. You need more agents.

Agent factories turn optimization into a probabilistic process—one where success is less about precision and more about coverage.

For businesses, the implication is uncomfortable but clear:

  • The bottleneck is no longer intelligence
  • It is orchestration and compute allocation

And for engineers?

Well.

Let’s just say the compiler is starting to think for itself.


Cognaptus: Automate the Present, Incubate the Future.