Graph Theory in Stereo: When Causality Meets Correlation in Categorical Space

Graphs look clean until they start carrying probability.

A Bayesian network says: these variables have directed relationships; each node comes with a conditional distribution. A Markov network says: these variables interact symmetrically; each clique carries a potential. Both are old tools. Both are useful. Both are also a little too easy to treat as pictures with numbers attached, which is how software systems eventually grow a nice coat of ambiguity.

The paper “Bayesian Networks, Markov Networks, Moralisation, Triangulation: a Categorical Perspective” by Antonio Lorenzin and Fabio Zanasi is not trying to make graphical models faster, larger, or more fashionable.1 It does something less noisy and more foundational: it asks which parts of probabilistic graphical model transformations are genuinely graph syntax, and which parts secretly depend on probabilistic semantics.

That distinction is the whole article. Moralisation, the standard move from Bayesian networks to Markov networks, turns out to be syntactic in a strong categorical sense. Triangulation, the move in the other direction, does not enjoy the same innocence. It can be split into a syntactic chordalisation step and a semantic variable-elimination step. Translation, apparently, is never neutral. Graphs are no exception.

The paper’s mechanism is a compiler pipeline, not a prettier diagram

The easiest way to misread this paper is to see “category theory” and assume the contribution is aesthetic abstraction. That would be comforting. Also wrong.

The paper’s argument is closer to a compiler design problem. A probabilistic graphical model has a surface language, the graph, and an interpretation, the probability structure. Existing graphical model practice often moves between these two levels informally. A directed graph becomes undirected. An undirected graph becomes chordal. Factors are rearranged. Conditional distributions are recovered. Everyone nods because the algorithm works.

The authors ask for stricter accounting. What is the syntax? What is the semantics? Which transformation only rewrites the syntax? Which transformation asks the semantics for something extra?

The resulting pipeline can be read as follows:

Stage Object being handled Categorical role What changes What does not magically appear
Bayesian network Directed acyclic graph with conditional distributions A CD-functor from directed syntax into stochastic semantics Directed syntax is interpreted as stochastic maps Extra independencies
Markov network Undirected graph with clique factors A hypergraph functor from undirected syntax into nonnegative matrix semantics Clique syntax is interpreted as potentials Normalised conditionals
Moralisation Bayesian network to Markov network Functor defined by precomposition on syntax Directed parent structure becomes undirected clique structure New probabilistic computation
Syntactic triangulation Markov network to chordal network Functor into chordal-network syntax Undirected graph becomes ordered chordal syntax Bayesian semantics
Variable elimination Chordal network to Bayesian network Semantic functor using conditionals and normalisation Potentials become conditional distributions A purely syntactic inverse

That table is the paper in business English. Not the full mathematics, obviously. Nobody should put “business English” near a proof and expect peace. But it captures the mechanism: the paper is not merely classifying two graph transformations. It is separating the transformations into layers that an implementation, proof system, or future probabilistic-programming compiler could audit.

Two kinds of graph need two kinds of syntax

Bayesian networks and Markov networks are not just two drawing styles. Their syntax has different operational behavior.

For a Bayesian network over a DAG $G$, the usual factorisation is:

$$ \omega(V_G)=\prod_{v\in V_G} f_v(v \mid Pa(v)). $$

Each node is generated from its parents. Direction matters. Copying and deleting variables matter because one variable can feed multiple downstream conditional distributions, or feed none. This is why the paper uses copy-delete categories, or CD-categories, for Bayesian-network syntax. In the semantic category FinStoch, morphisms are stochastic matrices. A Bayesian network becomes a structure-preserving functor from the freely generated CD-category CDSyn_G into FinStoch.

This part builds on prior categorical work, but it is essential for the paper’s pipeline. It lets a Bayesian network be treated as a model of a graph-generated syntax, not merely as a graph plus a list of conditional probability tables.

For a Markov network, the story changes. The graph is undirected, and the distribution is assembled from clique factors. In the paper’s setting, those factors live naturally in nonnegative matrices:

$$ Q(V_H)=\prod_{C\in Cl(H)} \phi_C(C), $$

with normalisation used to obtain a probability distribution when needed. This is not the same semantic posture as a Bayesian network. Clique potentials are not automatically conditional distributions. They are weights. They can be multiplied, compared, and normalised, but they do not arrive as stochastic maps with columns summing to one.

So the paper models Markov-network syntax using hypergraph categories, whose Frobenius structure supports the “wire-bending” behavior of undirected models. Markov networks over an ordered undirected graph $H$ correspond to hypergraph functors:

$$ HSyn_H \to Mat(\mathbb{R}_{\ge 0}). $$

This is one of the paper’s central contributions: a parallel functorial semantics for Markov networks. It puts Bayesian and Markov networks into comparable categorical form without pretending they have the same syntax.

The technical point is not “category theory can describe everything,” the slogan that has launched a thousand unread PDFs. The point is sharper: if the syntax is chosen correctly, transformation rules can be expressed as functorial precomposition rather than as prose instructions wrapped around graph surgery.

Moralisation is the clean direction because it only forgets direction

Moralisation converts a Bayesian network into a Markov network. In ordinary graphical-model language, it connects parents of the same child and drops edge directions. If $B$ and $E$ are both parents of $A$, moralisation adds an undirected edge between $B$ and $E$. The family has been made “moral,” in the old terminology. Mathematicians do have jokes; they simply hide them inside definitions.

The paper’s categorical treatment makes this operation more precise. Given a Bayesian network object $(\omega, G, \tau)$ in the category BN, moralisation maps it to:

$$ (\omega, Mor(G), \tau) $$

in the category MN.

The distribution $\omega$ and variable assignment $\tau$ remain the same. What changes is the graph syntax through which the same probabilistic content is seen. The crucial construction is a hypergraph functor:

$$ m: HSyn_{Mor(G)} \to HSyn_G. $$

Informally, the moralised clique corresponding to a node and its parents is mapped back to the original directed generator. This is why moralisation can be expressed by precomposition. The semantics does not need to be recalculated; the syntax is re-routed.

This matters because it corrects a common misconception. Moralisation is often taught as a graph-editing step used on the way to exact inference. That is true, but shallow. In this paper, moralisation is a disciplined syntactic translation: it changes how the graph exposes dependency structure, but it does not require new semantic assumptions.

The distinction is small until you build systems. Then it is everything.

If a model compiler transforms directed uncertainty models into undirected ones, it matters whether the transformation is a verified structural rewrite or a probability-dependent inference step. Structural rewrites can be cached, audited, versioned, and tested without numerical data. Semantic steps need assumptions, normalisation, and failure handling. Mixing the two is how elegant research code becomes operational folklore.

Triangulation is not moralisation backwards

The opposite direction is where the paper becomes more interesting.

Triangulation is often described as a move from a Markov network toward a structure that supports Bayesian-style or exact-inference operations. But the paper refuses to treat this as simply “reverse moralisation.” Good. Reverse buttons are usually where assumptions go to hide.

The authors split triangulation into two pieces:

  1. Tr_C(-): MN → CN, a syntactic functor from Markov networks to chordal networks.
  2. VE(-): CN → BN, a semantic functor from chordal networks to Bayesian networks.

The first step is graph-level. It takes an ordered undirected graph and produces an ordered chordal directed structure. The second step is probabilistic. It needs a semantic environment with total conditionals and a normalisation cospan. In the paper’s main setting, this is handled using the relationship between nonnegative matrices, projective stochastic structure, and stochastic matrices.

Put less politely: you can triangulate the graph syntax without asking probability questions, but you cannot recover Bayesian conditional structure for free.

This is where variable elimination enters. The paper proves a Functorial Variable Elimination result: under the right semantic assumptions, a chordal-network factorisation can be turned into a Bayesian-network factorisation. The proof mirrors variable elimination over ordered chordal graphs. It eliminates variables in reverse order, absorbs factors, and produces conditional structure.

The authors are careful about scope. They provide a counterexample showing why the chordal restriction is meaningful. Without the ordered chordal structure, a hypergraph functor can express dependencies that cannot be captured by the desired Bayesian factorisation. So the issue is not cosmetic. The graph shape controls whether the semantic conversion is legitimate.

For readers who work with AI systems rather than category theory, this is the first major operational lesson:

A graph transformation may be syntactically valid while still being semantically underlicensed.

That sentence should be printed on more system architecture diagrams.

The paper’s evidence is mathematical construction, not benchmark performance

There are no accuracy tables here. No latency charts. No “our method beats 17 baselines.” The evidence is mathematical: bijections, functorial constructions, factorisations, commutative diagrams, and counterexamples.

That makes the paper easier to mis-sell and harder to use responsibly. So the evidence should be read by purpose:

Paper component Likely purpose What it supports What it does not prove
Bayesian networks as CD-functors Foundation / prior framework integration Directed PGMs can be treated as functorial semantics from graph syntax to stochastic semantics That category theory improves runtime inference
Markov networks as hypergraph functors Main conceptual contribution Undirected PGMs can be represented in a parallel categorical style using clique syntax and nonnegative matrix semantics That all PGM variants are already covered
Irredundant network construction Technical normalisation of the setting The categories BN and MN focus on relevant factorisations and conditional-independence structure That redundancy is solved for every practical modelling pipeline
Moralisation as Mor(-): BN → MN Main transformation result Moralisation can be defined functorially and syntactically by precomposition That information is preserved without loss of conditional independencies
Syntactic triangulation Tr_C(-): MN → CN Decomposition result The graph-level part of triangulation can be separated from semantic conversion That Markov-to-Bayesian conversion is purely graph-theoretic
Variable elimination VE(-): CN → BN Semantic conversion result Under chordality and semantic assumptions, a Bayesian factorisation can be recovered That every undirected model admits a simple Bayesian translation
No adjunction between Mor(-) and Tr(-) Boundary result Moralisation and triangulation are not clean categorical inverses That the transformations are useless; only that they are asymmetric

This evidence pattern is worth respecting. The paper is not offering a production algorithm. It is offering a typed map of where production algorithms place their assumptions.

That is still useful. In fact, it may be more useful than another benchmark in domains where “why did this inference step do that?” is not a philosophical question but a compliance meeting with bad coffee.

The no-adjunction result is the quiet warning label

A tempting hope would be that moralisation and triangulation form a neat pair. Move from Bayesian to Markov. Move back from Markov to Bayesian. Perhaps the two transformations are adjoint. Perhaps the universe is kind.

The paper shows it is not.

The reason is already visible at the graph level. Moralisation and triangulation generally add edges. Added edges mean fewer encoded conditional independencies. Once that information has been erased, the reverse transformation cannot naturally reconstruct it without adding information not present in the source.

The paper establishes natural transformations:

$$ Tr(Mor(-)) \to id_{BN} $$

and

$$ Mor(Tr(-)) \to id_{MN}, $$

and shows the associated endofunctors are idempotent. But it also proves that Mor(-) and Tr(-) do not form an adjunction. The obstruction is not an exotic categorical inconvenience. It is the ordinary fact that graph transformations can destroy independence information.

For business readers, this is the important part: representation changes are not reversible merely because the diagrams look symmetric.

That matters in model governance. A system may convert a causal model into an undirected representation for inference. Later, someone may ask for a causal explanation. The converted model may still produce probabilities, but it may no longer carry the original independence or causal structure. Calling the result “the same model in another format” is too casual. Sometimes the translation keeps the distribution while flattening the explanation.

Where the business relevance actually lives

This paper will not reduce inference cost next quarter. It will not make a customer-support chatbot more charming. It will not rescue a broken analytics dashboard, although at this point nothing should be expected to.

Its relevance is more infrastructural.

Modern AI systems increasingly combine learned models, probabilistic reasoning, retrieval, business rules, monitoring layers, and human feedback. The models are often not a single neural network but a graph of decisions, signals, dependencies, and transformations. When those systems represent uncertainty, they need ways to move between structures: directed causal assumptions, undirected correlations, factor graphs, hidden-state models, and inference-ready forms.

The paper’s syntax/semantics split suggests a useful discipline for such systems:

Operational question Paper-informed interpretation Practical value
Is this transformation only changing structure? Treat it as syntax-level precomposition if no probabilistic assumptions are invoked Easier audit, caching, testing, and documentation
Does this step require normalisation or conditioning? Treat it as semantic, not merely graphical Explicit assumption tracking and failure handling
Can we recover the original independencies after conversion? Not generally; added edges represent lost independence information Prevents false equivalence between model formats
Can we swap semantic backends? Potentially, if syntax and semantics are cleanly separated Better design for probabilistic programming and model compilation
Can categorical diagrams improve governance? Only if they clarify transformation boundaries, not if used as ornamental formalism Better assurance language for complex uncertainty systems

The safest business inference is this: the paper gives a vocabulary for auditable uncertainty compilation.

That is not a small thing. In many organisations, uncertainty models still live as a mixture of diagrams, Python objects, spreadsheet-era assumptions, and one person named David who remembers why edge $X \to Y$ was added in 2019. A formal syntax/semantics split does not replace David. But it can reduce the number of Davids required to keep the system alive.

What Cognaptus infers, and what the paper directly shows

It is worth drawing the line cleanly.

The paper directly shows that Bayesian and Markov networks can be represented as functors from graph-generated syntax into appropriate semantic categories; that moralisation can be treated as a functor from BN to MN; that triangulation can be split through chordal networks as a syntactic step plus semantic variable elimination; and that moralisation and triangulation do not form an adjunction.

Cognaptus infers that this is relevant to future systems for probabilistic programming, model compilation, AI assurance, and uncertainty-aware workflow automation. The inference is reasonable because those systems need structured transformations between representations. But the paper does not implement a commercial compiler, test enterprise workloads, or demonstrate measurable ROI.

The uncertainty boundary is therefore clear:

  • Strong claim: the syntax/semantics separation is mathematically explicit and useful for reasoning about transformations.
  • Moderate claim: this separation can inform better software architecture for probabilistic and causal systems.
  • Weak claim: this will directly improve near-term enterprise AI products. That requires tooling, integration, and user-facing validation not present in the paper.

This boundary should not reduce the paper’s value. It should protect it from being inflated into nonsense. Foundational work is allowed to be foundational. Not every paper needs to arrive dressed as a SaaS feature.

The junction-tree aside explains what the paper is not doing

The paper briefly discusses the junction tree algorithm, and this is a useful boundary marker. Chordal graphs connect naturally to cluster graphs and junction trees. The authors note that junction-tree message passing would require additional machinery, including arithmetic division, which their current categorical framework does not provide.

That remark matters because it prevents overreading the result. The paper clarifies the categorical structure around moralisation, triangulation, chordalisation, and variable elimination. It does not give a full categorical account of all message-passing algorithms.

This is a disciplined limitation. The authors are not saying junction trees are irrelevant. They are saying that a proper treatment of message passing would need semantic operations beyond the present framework. In business language: the architecture diagram is valuable, but it is not yet the runtime.

The article-level takeaway: transformations need receipts

The paper’s most useful idea is not “Bayesian networks are functors” or “Markov networks are hypergraph functors,” although both are important. The most useful idea is that transformations between uncertainty representations should come with receipts.

A receipt says:

  • this part is syntax;
  • this part is semantics;
  • this part preserves the distribution;
  • this part may erase conditional independencies;
  • this part needs normalisation;
  • this part works only because the graph is chordal;
  • this part is not reversible.

That is the right standard for serious AI systems. Not because every business team should become fluent in CD-categories before lunch, but because the systems themselves increasingly behave like chains of representational transformations. When those chains affect risk scoring, medical triage, finance workflows, supply-chain decisions, or autonomous operations, “the graph was converted” is not enough of an explanation.

The paper gives a categorical account of that conversion machinery. It tells us why moralisation is the clean direction, why triangulation needs semantic help, why variable elimination belongs in the story, and why the two transformations should not be mistaken for inverse translations.

Graph theory in stereo, then, is not just a clever title. Directed causality and undirected correlation are two channels. The paper’s contribution is the mixing board: it shows which knobs change the wiring and which knobs change the sound.

Cognaptus: Automate the Present, Incubate the Future.


  1. Antonio Lorenzin and Fabio Zanasi, “Bayesian Networks, Markov Networks, Moralisation, Triangulation: a Categorical Perspective,” arXiv:2512.09908, 2025. https://arxiv.org/pdf/2512.09908 ↩︎