Opening — Why this matters now
There’s a quiet bottleneck in the AI-for-infrastructure story: not intelligence, but integration.
We have reinforcement learning models that can optimize building energy usage. We have power system simulators that can stress-test grid resilience. What we don’t have—at least not cleanly—is a way to connect them without turning every experiment into a bespoke engineering project.
The result? Most “smart energy” systems remain siloed. Buildings optimize themselves. Grids react. Nobody orchestrates.
The paper introduces AutoB2G, a framework that attempts to close this gap—not by adding another model, but by automating the entire simulation workflow using large language models (LLMs).
And yes, that’s where things get interesting.
Background — The simulation paradox
Simulation environments like CityLearn, GridLearn, and EnergyPlus have been the backbone of building energy research. They are powerful, flexible, and—predictably—painful to use.
The paradox is simple:
| Capability | Reality |
|---|---|
| High-fidelity modeling | Requires deep domain expertise |
| Flexible configuration | Requires extensive manual coding |
| RL integration | Limited to building-side metrics |
| Grid interaction | Often bolted on, not native |
Most existing systems optimize for building performance (cost, comfort, emissions), while grid-level effects—voltage stability, line loading, resilience—are treated as secondary or ignored entirely.
AutoB2G reframes the problem: instead of asking how to build better models, it asks how to make the entire modeling process programmable via language.
Analysis — What AutoB2G actually builds
AutoB2G is not just another simulation environment. It’s a layered system combining three key ideas:
- A co-simulation environment (buildings + grid)
- A DAG-structured codebase for reasoning
- A multi-agent LLM orchestration system (SOCIA)
Let’s unpack this without the academic fog.
1. Building–Grid Co-Simulation: finally, a shared reality
AutoB2G integrates:
- CityLearn V2 → building dynamics and RL control
- Pandapower → grid simulation and power flow analysis
- EnergyPlus → high-fidelity building data generation
The key shift is bidirectional interaction:
- Buildings → affect grid load
- Grid → feeds back constraints (e.g., voltage) into control decisions
The reward function itself becomes grid-aware:
$$ r_t = \frac{1}{|B|} \sum_{i \in B} \left( V_{ref} - \alpha_i (V_{i,t} - V_{ref})^2 \right) $$
Translation: buildings are no longer optimizing in isolation—they are penalized for destabilizing the grid.
This is subtle, but it changes everything.
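As a concrete illustration, the reward above can be sketched in a few lines of Python. In a real run the per-bus voltages would come from the power-flow solver (Pandapower exposes them as `net.res_bus.vm_pu` after `pp.runpp`); here they are passed in directly, and the shared penalty weight `alpha` is a hypothetical value, not one taken from the paper.

```python
def grid_aware_reward(voltages, v_ref=1.0, alpha=0.5):
    """Average grid-aware reward over the set of buildings B.

    Implements r_t = (1/|B|) * sum_i (V_ref - alpha_i * (V_i - V_ref)^2),
    with a single shared penalty weight alpha for simplicity.
    """
    penalties = [v_ref - alpha * (v - v_ref) ** 2 for v in voltages]
    return sum(penalties) / len(voltages)


# All buses exactly at the reference voltage: reward equals v_ref.
print(grid_aware_reward([1.0, 1.0, 1.0]))

# Any deviation from v_ref is penalized quadratically.
print(grid_aware_reward([1.05, 0.95, 1.0]))
```

The quadratic term is what couples the buildings: a controller that pushes its bus voltage away from the reference drags down the shared reward, even if its own cost and comfort metrics look fine.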
2. DAG-Based Retrieval: teaching LLMs structure
LLMs are good at generating code. They are notoriously bad at respecting dependencies.
AutoB2G solves this by representing the entire simulation codebase as a Directed Acyclic Graph (DAG):
- Nodes = functions/modules
- Edges = dependencies
- Constraints = execution order
Formally:
$$ G = (V, E), \quad V = \{f_1, f_2, \dots, f_n\} $$
Instead of asking the LLM to “write code,” the system asks it to:
- Select relevant modules
- Validate dependency completeness
- Repair missing links iteratively
This turns code generation into something closer to workflow assembly.
A rare moment of discipline in an otherwise chaotic space.
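A minimal sketch of the select–validate–repair idea in Python, using the standard-library `graphlib` for the dependency-respecting ordering. The module names and dependency map are invented for illustration; the paper does not specify AutoB2G's node granularity.

```python
from graphlib import TopologicalSorter

# Hypothetical codebase DAG: module -> set of modules it depends on.
DEPS = {
    "load_profiles": set(),
    "building_env":  {"load_profiles"},
    "grid_model":    set(),
    "cosim_coupler": {"building_env", "grid_model"},
    "rl_trainer":    {"cosim_coupler"},
}

def close_dependencies(selected, deps):
    """Repair step: iteratively pull in missing dependencies
    until the selected set is closed under `deps`."""
    selected = set(selected)
    while True:
        missing = set().union(*(deps[m] for m in selected)) - selected
        if not missing:
            return selected
        selected |= missing

def assembly_order(selected, deps):
    """Return an execution order that respects all dependency edges."""
    sub = {m: deps[m] & selected for m in selected}
    return list(TopologicalSorter(sub).static_order())

# Ask only for the trainer; the system repairs the missing links.
modules = close_dependencies({"rl_trainer"}, DEPS)
print(assembly_order(modules, DEPS))
```

The point of the exercise: the LLM only has to make local selection decisions, while completeness and ordering are enforced mechanically by the graph.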
3. SOCIA + TGD: optimizing code like a model
The real intellectual novelty sits here.
AutoB2G uses the SOCIA framework, where multiple agents collaborate to:
- Generate code
- Execute simulations
- Evaluate results
- Produce feedback
But the twist is Textual Gradient Descent (TGD).
Instead of numeric gradients, the system uses language as the optimization signal:
$$ L(x) = \sum_i \max(0, c_i(x)) $$
where each constraint $c_i(x)$ is positive when violated—syntax errors, missing modules, runtime failures—so the violations themselves define the loss.
The “gradient” becomes:
$$ g_t = \nabla_{\text{LLM}}\big(x_t, \{c_i(x_t)\}\big) $$
Which is… a structured explanation of what went wrong.
In other words:
The model doesn’t just fail—it critiques itself into improvement.
A slightly philosophical, slightly dangerous idea.
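A toy version of the loop makes the mechanism concrete. Here Python's built-in `compile()` stands in for the full constraint set, and `llm_revise` is a hypothetical callable standing in for the model; the real system's constraints also cover missing modules and runtime failures.

```python
def violations(code):
    """Evaluate constraints c_i on a candidate program.
    Only one constraint in this sketch: the code must parse."""
    try:
        compile(code, "<candidate>", "exec")
        return []
    except SyntaxError as e:
        return [f"SyntaxError at line {e.lineno}: {e.msg}"]

def textual_gradient_descent(code, llm_revise, max_steps=5):
    """Iterate: loss L(x) = number of violated constraints;
    the 'gradient' is the textual critique fed back to the reviser."""
    for _ in range(max_steps):
        critique = violations(code)
        if not critique:          # L(x) == 0: converged
            return code
        code = llm_revise(code, critique)
    return code

# Stub reviser: a real system would prompt an LLM with the critique.
def fix_colon(code, critique):
    return code.replace("def f()", "def f():")

broken = "def f()\n    return 1\n"
repaired = textual_gradient_descent(broken, fix_colon)
print(violations(repaired))   # []
```

The critique string is the whole trick: it is structured enough to act like a descent direction, even though nothing is differentiated.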
Findings — Does this actually work?
The paper evaluates four setups:
| Method | Simple | Medium | Complex |
|---|---|---|---|
| LLM | 0.90 | 0.77 | 0.53 |
| SOCIA | 0.93 | 0.83 | 0.73 |
| LLM + Retrieval | 0.97 | 0.80 | 0.67 |
| SOCIA + Retrieval | 1.00 | 0.93 | 0.83 |
Two observations worth noting:
- Complexity kills naive LLMs: success drops from 0.90 → 0.53
- Structure + iteration restores reliability: SOCIA + retrieval sustains 0.83 even for complex workflows
Now look at code quality:
| Method | Simple | Medium | Complex |
|---|---|---|---|
| LLM | 0.69 | 0.66 | 0.44 |
| SOCIA | 0.82 | 0.78 | 0.67 |
| LLM + Retrieval | 0.72 | 0.74 | 0.73 |
| SOCIA + Retrieval | 1.00 | 0.84 | 0.88 |
The gap between working code and correct code becomes very visible here.
Grid-level impact (where this actually matters)
Beyond code generation, the framework shows tangible system effects:
| Metric | Baseline | RL-Controlled |
|---|---|---|
| Voltage spread | Wide (±0.4 p.u.) | Narrow (near 1.0 p.u.) |
| Over-voltage frequency | High | Reduced |
| Load behavior | Reactive | Adaptive |
In plain terms:
- Buildings learn to consume more when voltage is high
- And consume less when voltage is low
That’s demand response behaving like an actual system component—not a passive participant.
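That behavior can be sketched as a simple proportional rule. The gain `k` and the clipping band are illustrative values, not from the paper, and a trained RL policy would of course be far less linear than this:

```python
def responsive_load(base_kw, v_pu, v_ref=1.0, k=2.0,
                    min_frac=0.5, max_frac=1.5):
    """Scale a building's flexible load with local voltage:
    absorb more power when voltage is high, shed when it is low."""
    frac = 1.0 + k * (v_pu - v_ref)
    return base_kw * min(max(frac, min_frac), max_frac)

print(responsive_load(10.0, 1.05))  # over-voltage: load rises
print(responsive_load(10.0, 0.95))  # under-voltage: load falls
```

Even this crude rule pushes voltages back toward the reference; the learned policies in the paper do the same thing while also trading off cost and comfort.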
Implications — Why this is bigger than energy systems
AutoB2G is nominally about buildings and grids. It’s actually about something else:
Turning natural language into executable infrastructure logic.
This has three immediate implications:
1. Simulation becomes a product, not a skill
Instead of hiring specialists to configure environments, you describe the experiment:
- “Train a SAC model”
- “Add N–1 contingency analysis”
- “Compare centralized vs decentralized control”
And the system builds it.
The bottleneck shifts from technical capability to problem framing.
2. Agentic systems outperform single-shot intelligence
The paper quietly confirms a trend:
| Approach | Limitation |
|---|---|
| Single LLM | brittle, inconsistent |
| RAG | context-aware but shallow |
| Multi-agent + feedback | iterative, robust |
The future is not a smarter model.
It’s a system that can argue with itself until it’s right.
3. DAGs may be the missing abstraction layer
Everyone talks about prompt engineering.
Almost no one talks about structural constraints.
AutoB2G suggests that:
- Knowledge → retrieved
- Reasoning → guided
- Execution → constrained
This is less “AI magic,” more software architecture with an LLM interface.
A healthier direction, frankly.
Conclusion — From automation to orchestration
AutoB2G doesn’t just automate simulation.
It redefines what simulation is: a composable, language-driven workflow that can be generated, validated, and refined autonomously.
The real takeaway isn’t that LLMs can write code.
It’s that with the right scaffolding—DAGs, agents, feedback loops—they can own entire execution pipelines.
Which raises the obvious question:
If simulation can be automated end-to-end… what else can?
Cognaptus: Automate the Present, Incubate the Future.