Prompt-to-Parts: When Language Learns to Build

Opening — Why this matters now

Text-to-image was a party trick. Text-to-3D became a demo. Text-to-something you can actually assemble is where the stakes quietly change.

As generative AI spills into engineering, manufacturing, and robotics, the uncomfortable truth is this: most AI-generated objects are visually plausible but physically useless. They look right, but they don’t fit, don’t connect, and certainly don’t come with instructions a human can follow.

The paper behind this article tackles that gap head-on—not by making models “smarter,” but by making the problem smaller, stricter, and more honest.

Background — The realizability problem no one likes to admit

Modern text-to-3D pipelines optimize for pixels, meshes, or parametric surfaces. Physical reality is, at best, an afterthought. Gravity, tolerances, assembly order, and part compatibility tend to enter the picture after generation, usually as a failure mode.

This has produced a familiar pattern:

Beautiful renders
Fragile structures
No build sequence
No bill of materials

LEGO-style construction systems expose the flaw immediately. You either respect discrete parts and connections—or nothing snaps together.

Analysis — What the paper actually does

The core idea is deceptively simple: treat physical assembly like a language compilation problem.

Instead of generating geometry directly, the system compiles natural language into LDraw, a text-based format that encodes:

A finite vocabulary of parts
Exact spatial coordinates and orientations
Explicit build order

This turns free-form intent (“build a detailed ISS model”) into something closer to source code than artwork.
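
To make "closer to source code" concrete, here is a minimal sketch of how a few LDraw lines could be emitted from Python. It relies on two features of the LDraw file format: a type-1 line places a part with a colour, a position, and a 3x3 rotation matrix, and the `0 STEP` meta-command closes a build step. The `Placement` class and `emit_ldr` function are hypothetical names invented for illustration and are not taken from the paper.

```python
from dataclasses import dataclass

IDENTITY = (1, 0, 0, 0, 1, 0, 0, 0, 1)  # 3x3 rotation matrix, row-major


@dataclass(frozen=True)
class Placement:
    """One part instance (hypothetical helper, not the paper's API)."""
    part: str            # LDraw part file, e.g. "3001.dat" (2x4 brick)
    colour: int          # LDraw colour code, e.g. 4 is red, 1 is blue
    x: int
    y: int
    z: int
    rotation: tuple = IDENTITY

    def to_ldraw(self) -> str:
        # Line type 1: "1 <colour> x y z a b c d e f g h i <file>"
        nums = (self.colour, self.x, self.y, self.z, *self.rotation)
        return "1 " + " ".join(str(n) for n in nums) + " " + self.part


def emit_ldr(title: str, steps: list[list[Placement]]) -> str:
    """Serialise build steps; each step is closed by a '0 STEP' meta-command."""
    lines = ["0 " + title]
    for step in steps:
        lines.extend(p.to_ldraw() for p in step)
        lines.append("0 STEP")
    return "\n".join(lines) + "\n"


# Two stacked 2x4 bricks. In LDraw, -Y is up and a brick is 24 LDU tall.
demo = emit_ldr("Two-brick tower", [
    [Placement("3001.dat", 4, 0, 0, 0)],
    [Placement("3001.dat", 1, 0, -24, 0)],
])
print(demo)
```

Every placement is one line of text, so a model's output can be diffed, linted, and reviewed like any other source file.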

The author calls this a bag-of-bricks approach—an intentional echo of bag-of-words methods in NLP. Meaning emerges not from continuous geometry, but from constrained composition.
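
One way to read the analogy, offered here as an interpretation rather than the paper's definition: discard position and order, and a model reduces to a multiset of (part, colour) pairs, much as bag-of-words reduces a document to token counts. The sketch below computes that multiset from LDraw text. The function name `bag_of_bricks` and the sample model are assumptions made for illustration.

```python
from collections import Counter


def bag_of_bricks(ldr_text: str) -> Counter:
    """Count (part, colour) pairs from type-1 lines, ignoring position and order."""
    bag = Counter()
    for line in ldr_text.splitlines():
        tokens = line.split()
        # A type-1 line has at least 15 fields:
        # "1", colour, x y z, nine matrix entries, then the part file name.
        if len(tokens) >= 15 and tokens[0] == "1":
            part = " ".join(tokens[14:])
            bag[(part, int(tokens[1]))] += 1
    return bag


# Illustrative three-brick model (made up for this example).
SAMPLE = """0 Tiny wall
1 4 0 0 0 1 0 0 0 1 0 0 0 1 3001.dat
1 4 80 0 0 1 0 0 0 1 0 0 0 1 3001.dat
0 STEP
1 1 40 -24 0 1 0 0 0 1 0 0 0 1 3001.dat
0 STEP
"""

bag = bag_of_bricks(SAMPLE)
for (part, colour), count in sorted(bag.items()):
    print(f"{count} x {part} in colour {colour}")
```

The same counter doubles as a bill of materials, which is one of the things the renders-only pipelines described earlier never produce.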

The pipeline

Prompt — Natural language (or image-derived description)
Tool-assisted translation — Python libraries enforce legal parts, connections, and coordinates
Output — A valid .ldr file with ordered assembly steps

The important detail: the language model is never trusted alone. Tools act as a compiler, not a stylist.
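
The "compiler, not a stylist" idea can be sketched as a validation pass that rejects a proposal outright instead of quietly repairing it. Everything below is an assumption made for illustration: the part whitelist, the dictionary shape of a proposal, and the simplified rule that coordinates must sit on a 20 LDU horizontal and 8 LDU vertical grid. The article does not describe the actual Python libraries in enough detail to reproduce them, and this sketch does not check connections at all.

```python
ALLOWED_PARTS = {"3001.dat", "3003.dat", "3020.dat"}  # hypothetical whitelist
STUD = 20   # LDU between neighbouring stud centres
PLATE = 8   # LDU height of one plate (a brick is three plates, 24 LDU)


class CompileError(ValueError):
    """Raised when a proposed model breaks a hard constraint."""


def check_placement(p: dict) -> list[str]:
    """Return human-readable problems for one proposed placement."""
    errors = []
    if p.get("part") not in ALLOWED_PARTS:
        errors.append(f"unknown part {p.get('part')!r}")
    for axis, pitch in (("x", STUD), ("y", PLATE), ("z", STUD)):
        value = p.get(axis)
        if not isinstance(value, int):
            errors.append(f"{axis} must be an integer, got {value!r}")
        elif value % pitch != 0:
            errors.append(f"{axis}={value} is off the {pitch} LDU grid")
    return errors


def compile_model(proposals: list[dict]) -> list[dict]:
    """Accept the whole model or reject it with every error listed."""
    report = []
    for i, p in enumerate(proposals):
        report.extend(f"placement {i}: {e}" for e in check_placement(p))
    if report:
        raise CompileError("; ".join(report))
    return proposals


accepted = compile_model([{"part": "3001.dat", "x": 0, "y": -24, "z": 40}])

try:
    compile_model([{"part": "9999.dat", "x": 5, "y": 0, "z": 0}])
except CompileError as exc:
    rejection = str(exc)
    print("rejected:", rejection)
```

Collecting every error before raising mirrors how a compiler reports diagnostics, and it gives the language model a complete list to correct on the next attempt.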

Findings — Scale, structure, and instruction fidelity

The results are not small demos. The scale is deliberately uncomfortable.

Example assemblies

Model	Parts	Build Steps	Instruction Pages
Medieval Castle	860	82	86
International Space Station	3,122	112	312
Modular Tool Kit	153	20	15
Helicopter (MH‑60)	746	95	75

Three evaluation axes are used:

D-score — Is the representation syntactically valid?
M-score — Is the model physically realizable?
I-score — Can a human follow the instructions end-to-end?
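
The article gives only the question each score answers, not its formula. As a rough stand-in for the first axis, the sketch below reports the fraction of non-blank lines that parse as well-formed LDraw. It is an assumption, not the paper's D-score. It recognises only type-0 lines (comments and meta-commands) and type-1 lines (part placements) with decimal colour codes, which is what a model assembled from library parts mostly consists of.

```python
def is_valid_line(line: str) -> bool:
    """Check one line against a simplified subset of the LDraw grammar."""
    tokens = line.split()
    if not tokens:
        return True                 # blank lines are harmless
    if tokens[0] == "0":
        return True                 # comment or meta-command such as "0 STEP"
    if tokens[0] == "1" and len(tokens) >= 15:
        try:
            int(tokens[1])                      # decimal colour code only
            [float(t) for t in tokens[2:14]]    # position plus 3x3 matrix
        except ValueError:
            return False
        return True
    return False                    # other line types are out of scope here


def syntactic_validity(ldr_text: str) -> float:
    """Fraction of non-blank lines that are well formed (0.0 for empty input)."""
    lines = [ln for ln in ldr_text.splitlines() if ln.strip()]
    if not lines:
        return 0.0
    return sum(is_valid_line(ln) for ln in lines) / len(lines)


good = "0 Demo\n1 4 0 0 0 1 0 0 0 1 0 0 0 1 3001.dat\n0 STEP\n"
bad = good + "1 4 0 0 zero 1 0 0 0 1 0 0 0 1 3001.dat\n"
print(syntactic_validity(good), syntactic_validity(bad))
```

A check like this says nothing about the second and third axes: a file can be perfectly well formed and still describe a model that cannot be built or followed.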

Crucially, the system generates assembly manuals, not just final states. That alone disqualifies most existing text-to-3D systems from comparison.

Implications — Why this matters beyond LEGO

This work is not really about bricks. It’s about interfaces.

By constraining the representation, the system gains:

Scalability (thousands of parts)
Modularity (subassemblies, replacements)
Auditability (every decision is inspectable)

The most interesting comparison is not to CAD, but to additive manufacturing.

Modular assembly vs 3D printing (field scenario)

Metric	3D Printing	Modular Assembly
Time to tool	Hours	Minutes
Material loss	Permanent	Zero
Reconfiguration	Reprint	Instant
Calibration	Required	Inherent

In constrained environments—space stations, disaster zones, field labs—the ability to reconfigure often beats geometric perfection.

The paper frames this as a physical API: a stable interface between intent and matter.

Limitations — Where the bricks still crack

The system is not physics-aware. A valid LDraw file guarantees that every line names a part, a position, and an orientation; it does not guarantee that the result holds together. Parts can float. Structures can intersect. Functional fidelity is approximated, not proven.
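
Intersections of this kind are cheap to detect after the fact. The sketch below is a post-hoc check that is not part of the system described in the article. It approximates each part by an axis-aligned box and flags pairs whose interiors overlap. The `Box` type and the `brick_2x4` helper are hypothetical, and the extents assume an unrotated 2x4 brick body of 80 x 24 x 40 LDU with its origin at the top centre and +Y pointing down; studs are ignored.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    name: str
    lo: tuple  # (x, y, z) minimum corner, in LDU
    hi: tuple  # (x, y, z) maximum corner, in LDU


def overlaps(a: Box, b: Box) -> bool:
    # Strict inequalities: boxes that merely share a face do not collide.
    return all(a.lo[i] < b.hi[i] and b.lo[i] < a.hi[i] for i in range(3))


def collisions(boxes: list[Box]) -> list[tuple[str, str]]:
    """Return the names of every pair of boxes whose interiors intersect."""
    return [(a.name, b.name) for a, b in combinations(boxes, 2) if overlaps(a, b)]


def brick_2x4(name: str, x: int, y: int, z: int) -> Box:
    # Assumed extents for an unrotated 2x4 brick body (see lead-in).
    return Box(name, (x - 40, y, z - 20), (x + 40, y + 24, z + 20))


bottom = brick_2x4("bottom", 0, 0, 0)
top = brick_2x4("top", 0, -24, 0)        # sits exactly on the bottom brick
rogue = brick_2x4("rogue", 20, -12, 0)   # cuts through both of them

print(collisions([bottom, top, rogue]))
```

Floating parts would need a separate support check, and neither test says anything about whether the structure carries load.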

Part libraries also limit expressiveness. If a hinge doesn’t exist, the model improvises—or fails quietly.

This is not a replacement for CAD, simulation, or manufacturing. It is a pre-manufacturing intelligence layer.

Conclusion — The compiler was the missing piece

The quiet insight of this work is that generative AI doesn’t fail at physical design because it lacks creativity—it fails because it lacks compilers.

Once language is forced through a constrained, inspectable intermediate representation, large language models stop hallucinating and start assembling.

The thousand-page manual, it turns out, was always a thousand-token problem—waiting for the right abstraction.

Cognaptus: Automate the Present, Incubate the Future.
