Opening — Why This Matters Now

Everyone wants AI in construction. Fewer ask whether the AI actually understands what it is looking at.

In the Architecture, Engineering, Construction, and Operation (AECO) industry, we feed our AI systems building information models (BIMs), point clouds, images, schedules, and text. We train graph neural networks. We compute F1-scores. We celebrate marginal gains.

Yet beneath this machinery sits a surprisingly primitive assumption: that semantic labels like “core wall” and “bathroom slab” are interchangeable tokens — as long as they are distinct.

The paper Enhancing Building Semantics Preservation in AI Model Training with Large Language Model Encodings challenges that assumption. Its thesis is subtle but consequential: if you change how classes are encoded, you change how meaning is learned.

And that is not just a modeling trick. It is a shift in how AI internalizes domain knowledge.


Background — The Blind Spot in Supervised Learning

Supervised learning in BIM-based tasks typically relies on one-hot encoding.

If there are 42 object subtypes, each subtype is assigned a 42-dimensional vector with a single “1” and 41 zeros. In geometric terms, every class is equidistant from every other class.

| Encoding Method | Semantic Awareness | Distance Between Classes |
|---|---|---|
| One-hot | None | Equal for all pairs |
| Label encoding | Ordinal illusion | Artificial numeric bias |
| LLM embedding | Contextual | Learned semantic distance |

From a machine’s perspective, “core wall” is as different from “perimeter wall” as it is from “haunch.”
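
A minimal sketch makes that blindness concrete (Python, with an illustrative subset of the 42 subtype labels):

```python
import numpy as np

# One-hot labels for 42 subtypes: a 42-dim vector with a single 1.
# The three names below are just an illustrative subset.
labels = ["core wall", "perimeter wall", "haunch"]
one_hot = {name: vec for name, vec in zip(labels, np.eye(42))}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Every pair of distinct one-hot vectors is equally (dis)similar:
print(cosine(one_hot["core wall"], one_hot["perimeter wall"]))  # 0.0
print(cosine(one_hot["core wall"], one_hot["haunch"]))          # 0.0
```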

To a construction professional, that is absurd.

The authors ground this in the classical semantic triangle (referent–reference–symbol). Prior research has improved how referents are represented (graphs, point clouds, images) but has rarely questioned the symbol — the encoding.

Large Language Models (LLMs), trained on massive corpora, already encode nuanced semantic proximity. Why not use that knowledge as the label space itself?

That is the provocation.


Method — Replacing Classification with Semantic Projection

Instead of predicting a one-hot vector and applying a sigmoid, the model predicts an embedding in the same space as a pre-trained LLM embedding.

The loss is computed via cosine similarity:

$$ L(e_p, e_t) = 1 - \frac{e_p \cdot e_t}{\lVert e_p \rVert \, \lVert e_t \rVert} $$

Where:

  • $e_p$ is the predicted embedding
  • $e_t$ is the target LLM embedding

This does two things:

  1. It preserves semantic geometry.
  2. It turns classification into proximity search in embedding space.
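
A minimal sketch of both pieces, assuming a PyTorch model whose output width matches the label-embedding width (the function names are illustrative, not the paper's):

```python
import torch
import torch.nn.functional as F

def semantic_projection_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Cosine-distance loss between predicted and target LLM embeddings.

    pred:   (batch, d) embeddings predicted by the GNN
    target: (batch, d) pre-computed LLM embeddings of the true class labels
    """
    return (1.0 - F.cosine_similarity(pred, target, dim=-1)).mean()

def predict_class(pred: torch.Tensor, label_embeddings: torch.Tensor) -> torch.Tensor:
    """Classification as proximity search: pick the label whose embedding
    is most similar to the predicted vector."""
    sims = F.cosine_similarity(pred.unsqueeze(1), label_embeddings.unsqueeze(0), dim=-1)
    return sims.argmax(dim=1)  # (batch,) index of the nearest label
```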

The experiment uses:

  • 5 high-rise residential BIM projects
  • 42 building object subtypes
  • GraphSAGE (3 layers, 1024-dim hidden states)
  • Cross-validation across projects
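
As a rough sketch of that setup, a 3-layer GraphSAGE can be pointed at the label-embedding space instead of a 42-way logit head. This assumes PyTorch Geometric; only the layer count and widths come from the paper, everything else is illustrative:

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SemanticGraphSAGE(torch.nn.Module):
    """3-layer GraphSAGE whose output lives in the 1024-dim label-embedding
    space rather than a 42-dim logit space."""
    def __init__(self, in_dim: int, hidden_dim: int = 1024, out_dim: int = 1024):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        self.conv3 = SAGEConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        return self.conv3(x, edge_index)  # predicted label embedding per node
```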

Embeddings tested:

| Model | Original Dim | Compacted Dim (Matryoshka) |
|---|---|---|
| text-embedding-3-small | 1536 | 1024 |
| text-embedding-3-large | 3072 | 1024 |
| llama-3 | 4096 | 1024 |

The Matryoshka representation model compresses embeddings while preserving semantic structure.

This is not merely dimensionality reduction. It is semantic distillation.
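
The paper does not reproduce the compaction code, but Matryoshka-style embeddings are typically shortened by keeping the leading dimensions and re-normalizing; a minimal sketch under that assumption:

```python
import numpy as np

def compact(embedding: np.ndarray, target_dim: int = 1024) -> np.ndarray:
    """Matryoshka-style compaction: keep the leading dimensions, then
    re-normalize so cosine similarities stay meaningful."""
    truncated = embedding[..., :target_dim]
    norm = np.linalg.norm(truncated, axis=-1, keepdims=True)
    return truncated / norm

# e.g. a 3072-dim text-embedding-3-large vector -> 1024-dim target space
full_embedding = np.random.randn(3072)  # stand-in for a real LLM embedding
target_embedding = compact(full_embedding, target_dim=1024)
```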


Findings — Small Encoding Change, Measurable Shift

The weighted average F1-scores tell a quiet story:

| Encoding | Dimensions | Weighted Avg F1 |
|---|---|---|
| One-hot | 42 | 0.8475 |
| text-embedding-3-small (orig) | 1536 | 0.8498 |
| text-embedding-3-large (orig) | 3072 | 0.8529 |
| llama-3 (orig) | 4096 | 0.8714 |
| text-embedding-3-small (compact) | 1024 | 0.8705 |
| text-embedding-3-large (compact) | 1024 | 0.8655 |
| llama-3 (compact) | 1024 | 0.8766 |

The best performer: llama-3 (compacted) at 0.8766.

That is a lift of roughly 2.9 percentage points over one-hot encoding.

Statistical testing revealed that improvements were not uniformly significant — except notably for compacted text-embedding-3-large.

An interesting structural observation emerges:

Compacted embeddings often outperform original high-dimensional ones.

Why?

Because the GraphSAGE architecture outputs 1024-dimensional vectors. High-dimensional label spaces (3072–4096) may contain semantic richness the model cannot fully align with. Compression reduces noise while preserving structure.

In other words: the geometry must match the learner.


Implications — Encoding as Infrastructure

This paper is not about marginal F1-score gains.

It reframes encoding as part of the model’s epistemology.

1. AI Systems Become Semantically Sensitive

Using LLM embeddings embeds external knowledge into the training target. Even a relatively small GNN inherits semantic structure from trillion-token corpora.

This is knowledge transfer without fine-tuning the LLM itself.

2. Model Size vs. Embedding Richness

The study suggests that to fully leverage high-dimensional embeddings, downstream models may need increased capacity. There is an architectural co-evolution at play.

3. Practical Feasibility

Practitioners can adopt this without training foundation models. They only need:

  • Access to pretrained embeddings
  • Modified loss functions
  • Appropriate dimensional alignment

Low overhead. Structural impact.
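
As a sketch of the first ingredient, the 42 subtype labels can be embedded once and cached. This assumes the openai Python SDK and uses the dimensions parameter of the text-embedding-3 family for the 1024-dim alignment; the label names are an illustrative subset:

```python
from openai import OpenAI  # assumes the openai Python SDK is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

subtype_labels = ["core wall", "perimeter wall", "bathroom slab"]  # illustrative subset

# text-embedding-3 models accept a `dimensions` argument, so the 1024-dim
# alignment with the downstream GNN output can be requested directly.
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=subtype_labels,
    dimensions=1024,
)
label_embeddings = [item.embedding for item in response.data]  # cache these
```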

4. Toward Multimodal Semantic Fusion

Future extensions could merge:

  • Text-based LLM embeddings
  • 3D geometry
  • Point clouds
  • Sensor data

Embedding space becomes the unifying semantic layer.

For AECO firms pursuing AI-driven decision support, this matters. Classification errors at the subtype level cascade into cost estimation, scheduling, safety compliance, and digital twin reliability.

Encoding quality becomes governance quality.


Conclusion — From Tokens to Meaning

One-hot encoding treats classes as administrative categories.

LLM encoding treats them as concepts.

This paper demonstrates that even modest graph neural networks benefit when their label space reflects semantic structure rather than arbitrary orthogonality.

The improvement is incremental.

The implication is architectural.

If AI is to operate reliably in domain-specific environments like construction, the representation of meaning cannot remain an afterthought.

Encoding is not preprocessing.

It is ontology engineering disguised as vector math.

And that is where the next quiet wave of applied AI performance gains may emerge.

Cognaptus: Automate the Present, Incubate the Future.