Graphing the Invisible: How Community Detection Makes AI Explanations Human-Scale
Auditors like lists. Models, inconveniently, do not behave like lists.
A credit model may tell you that income mattered, education mattered, job type mattered, age mattered, and postcode-adjacent variables mattered. A fraud model may produce the same kind of feature ranking, only with device fingerprints and transaction timings instead of employment history. The dashboard looks satisfyingly crisp: bars, scores, explanations, probably a tasteful shade of corporate blue. Then the real question arrives: which of these variables are acting together?
That is where ordinary feature attribution begins to look thin. A SHAP or LIME explanation can identify influential features for an individual prediction, but it often leaves governance teams with a flat inventory rather than a map. The paper “Community Detection on Model Explanation Graphs for Explainable AI” proposes a framework called Modules of Influence (MoI), which tries to build that map.[1] Its central move is simple but powerful: convert many per-instance feature attributions into a graph of feature co-influence, then use community detection to identify groups of features that repeatedly act together.
The paper’s useful contribution is not that “graphs are cool”. Graphs have been cool since well before every dashboard became a nervous system diagram. The contribution is a workflow: local explanations become feature-feature affinities; affinities become sparse explanation graphs; graphs become modules; modules become units for auditing stability, redundancy, synergy, and bias exposure.
That workflow matters because businesses do not govern features one at a time. They govern product risk, compliance exposure, model drift, operational cost, and reputational fragility. Those things usually live in clusters.
The mechanism: from explanation fragments to a feature graph
MoI begins with a trained model and an attribution matrix. Each row corresponds to an instance; each column corresponds to a feature; each cell records how much that feature contributed to that instance’s prediction according to a chosen explainer.
The paper denotes this attribution matrix as $\Phi \in \mathbb{R}^{n \times d}$, where $n$ is the number of instances and $d$ is the number of features. For each instance $s$, the attribution vector tells us how the model’s prediction was locally explained. That part is familiar. The less familiar step is asking whether two columns of this attribution matrix tend to move together.
If feature $i$ and feature $j$ are often important in the same cases, or if their signed contributions tend to rise and fall together, MoI treats them as connected in an explanation graph. The nodes are features. The edges are co-influence weights.
This is the paper’s main reframing. It does not cluster raw variables directly. It clusters model-explanation behaviour.
That distinction is not academic decoration. Two features may be correlated in the dataset but not jointly used by the model. Conversely, two features may not look like obvious raw-data twins but may become linked inside a model’s decision logic. MoI therefore sits between ordinary feature engineering and ordinary explainability. It asks not merely “which variables exist together?” but “which variables tend to matter together for this trained model?”
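To make the co-influence step concrete, here is a minimal sketch of a magnitude-cosine edge rule in plain Python. The function name `magnitude_cosine`, the toy matrix, and the zeroed diagonal are illustrative assumptions, not the paper's reference implementation; a real pipeline would use an array library and apply robust column scaling first.

```python
import math

def magnitude_cosine(phi):
    """Cosine similarity between |attribution| columns of an n x d matrix."""
    n, d = len(phi), len(phi[0])
    cols = [[abs(phi[s][i]) for s in range(n)] for i in range(d)]
    norms = [math.sqrt(sum(v * v for v in col)) for col in cols]
    w = [[0.0] * d for _ in range(d)]  # diagonal stays 0: no self-loops
    for i in range(d):
        for j in range(i + 1, d):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # a feature that never matters gets no edges
            dot = sum(a * b for a, b in zip(cols[i], cols[j]))
            w[i][j] = w[j][i] = dot / (norms[i] * norms[j])
    return w

# Toy attribution matrix: 4 instances x 3 features (made-up numbers).
phi = [
    [0.9, -0.8, 0.0],
    [0.0, 0.1, 0.7],
    [-0.6, 0.5, 0.0],
    [0.1, 0.0, -0.9],
]
W = magnitude_cosine(phi)
```

In this toy matrix, features 0 and 1 matter in the same instances with opposite signs, so their magnitude weight is close to 1, while feature 2 is active elsewhere and stays nearly disconnected.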
The pipeline can be summarised as:
| Stage | Technical action | Operational meaning |
|---|---|---|
| Local attribution | Compute per-instance feature attributions using SHAP, LIME, Integrated Gradients, or another explainer | Convert predictions into feature-level contribution records |
| Co-influence graph | Build feature-feature edge weights from attribution columns | Identify features that repeatedly co-activate, compensate, or interact |
| Sparsification | Keep the strongest or most reliable edges, often with mutual-$k$ nearest neighbours | Avoid turning the graph into interpretability spaghetti, a dish nobody ordered |
| Community detection | Run Leiden, Louvain, Infomap, or stochastic block models | Find modules of features that behave as groups |
| Module auditing | Compute module-level metrics such as stability, redundancy, synergy, and bias exposure | Turn explanation structure into governance actions |
The mechanism-first reading matters because the paper is not best understood as one benchmark result. Its real object is an auditing architecture.
Edge construction is where the judgement enters
The paper is careful about a point that many applied explainability workflows quietly bury: the graph depends on how “co-influence” is defined.
MoI supports several edge rules. Magnitude-cosine similarity asks whether two features tend to be important at the same time, ignoring sign. Signed correlation asks whether their attribution directions move together or against each other. Mutual information and HSIC-style dependence measures can capture nonlinear relationships but are noisier and more computationally demanding. Co-exceedance and Jaccard-style measures can focus on cases where features cross a high-attribution threshold.
These choices are not cosmetic. They encode different auditing questions.
| Edge rule | What it tends to reveal | Business use | Boundary |
|---|---|---|---|
| Magnitude co-activation | Features that often matter together, regardless of positive or negative sign | Grouping operational drivers of decisions | Can hide whether features push in opposite directions |
| Signed correlation | Features whose contributions align or compensate | Detecting substitution, antagonism, or compensatory model logic | Sensitive to sign conventions and attribution noise |
| MI / HSIC | Nonlinear dependence between attribution patterns | Finding less obvious module structure | Requires shrinkage or pre-screening to avoid fragmented noise |
| Co-exceedance / Jaccard | Features jointly prominent in high-attribution cases | Auditing high-impact prediction regimes | Threshold choice can dominate the result |
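
The difference between these rules is easiest to see on one pair of attribution columns. The sketch below is an assumption-laden illustration: `pearson` and `co_exceedance_jaccard` are hypothetical helper names, and the paper's exact estimators may differ. It computes a signed correlation and a threshold-based Jaccard overlap for two features that are always important together but push in opposite directions.

```python
import math

def pearson(x, y):
    """Signed Pearson correlation between two attribution columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def co_exceedance_jaccard(x, y, threshold):
    """Jaccard overlap of the instances where each |attribution| >= threshold."""
    hi_x = {s for s, v in enumerate(x) if abs(v) >= threshold}
    hi_y = {s for s, v in enumerate(y) if abs(v) >= threshold}
    union = hi_x | hi_y
    return len(hi_x & hi_y) / len(union) if union else 0.0

# Two made-up attribution columns over five instances.
a = [0.9, 0.0, -0.6, 0.1, 0.7]
b = [-0.8, 0.1, 0.5, 0.0, -0.6]

signed = pearson(a, b)                          # strongly negative
jac_loose = co_exceedance_jaccard(a, b, 0.5)    # same high-impact instances
jac_strict = co_exceedance_jaccard(a, b, 0.65)  # overlap halves
```

The same pair looks antagonistic under signed correlation, perfectly co-active at a threshold of 0.5, and only half-overlapping at 0.65, which is the threshold sensitivity the table warns about.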
The paper’s default practical starting point is magnitude-cosine on absolute attributions, robust column scaling, mutual-$k$ sparsification, degree normalisation, and stability-based selection of hyperparameters. This is sensible because magnitude co-activation answers the first business question most teams actually have: “Which feature groups does the model rely on together?”
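Mutual-$k$ sparsification is simple enough to sketch directly. The version below is a plain-Python illustration under stated assumptions: `mutual_knn` is a hypothetical name, the weight matrix is a made-up example, and ties are broken by index order rather than by any rule from the paper.

```python
def mutual_knn(w, k):
    """Keep edge (i, j) only if each node is among the other's k strongest neighbours."""
    d = len(w)
    top = []
    for i in range(d):
        ranked = sorted(
            (j for j in range(d) if j != i),
            key=lambda j: w[i][j],
            reverse=True,
        )
        top.append(set(ranked[:k]))
    sparse = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in top[i]:
            if i in top[j]:  # the "mutual" condition
                sparse[i][j] = w[i][j]
    return sparse

# Made-up symmetric co-influence weights for four features.
W = [
    [0.0, 0.9, 0.2, 0.1],
    [0.9, 0.0, 0.3, 0.2],
    [0.2, 0.3, 0.0, 0.8],
    [0.1, 0.2, 0.8, 0.0],
]
S1 = mutual_knn(W, 1)  # keeps only 0-1 and 2-3
S2 = mutual_knn(W, 2)  # also keeps the weaker 1-2 bridge
```

Raising `k` trades resolution for connectivity: at `k=1` the graph splits into two clean pairs, while `k=2` readmits the bridge between them.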
Signed analysis still matters, but it is a second pass. A module with internal positive and negative attributions can cancel itself in aggregate. If a governance team only looks at summed module attribution, it may miss the fact that one part of the module is pushing a decision upward while another is dragging it down. The paper explicitly recommends reporting both signed and magnitude views. This is less glamorous than a single clean score, but clean scores are where nuance goes to die.
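A two-line example shows why both views are needed. This is a toy sketch with invented numbers; `module_views` is a hypothetical helper, not a function from the paper.

```python
def module_views(phi_row, module):
    """Return (signed sum, magnitude sum) of one instance's attributions in a module."""
    signed = sum(phi_row[i] for i in module)
    magnitude = sum(abs(phi_row[i]) for i in module)
    return signed, magnitude

# One instance: features 0 and 1 belong to the same module and nearly cancel.
row = [0.40, -0.35, 0.05, 0.02]
signed, magnitude = module_views(row, [0, 1])
```

The signed view reports roughly 0.05, which looks negligible; the magnitude view reports 0.75, which says the module is doing a great deal of work in both directions.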
Community detection gives explanations a middle scale
Most explainability workflows oscillate between two bad scales.
At the small scale, they show individual features. This is precise but cognitively brittle. Humans can inspect ten features, perhaps twenty if they have coffee and low expectations. At the large scale, they talk about the model as a whole. This is manageable but too blunt for intervention. “The model is biased” is not an operational diagnosis; it is a meeting invitation.
MoI targets the middle scale: modules of features.
The paper calls these Modules of Influence because they sit above isolated features and below the full model. A module might contain income-related variables, employment variables, educational variables, clinical markers, device-risk indicators, or proxy-rich socio-economic fields. The point is not that the module names are magically discovered as human concepts. The point is that the graph can surface coherent feature groups that analysts can inspect, label, compare, ablate, and monitor.
This is also why community detection is a better fit than a simple top-feature ranking. Rankings are linear. Model behaviour is often modular. If a top-ten list contains three income proxies, two employment proxies, two education variables, and three unrelated noise magnets, the list does not tell you that the first seven may form a governance object. A module graph can.
The paper compares MoI against baselines such as clustering raw feature correlations, clustering attribution columns, SHAP interaction graphs, PCA/ICA groupings, and optional graphical-model approaches. The important comparison is not merely “which method scores better?” It is “which method groups features according to the model’s explanatory behaviour rather than only the dataset’s surface geometry?”
On the reported synthetic recovery task, MoI performs best among the listed methods:
| Method | Modularity $Q$ | Conductance ↓ | ARI on synthetic modules | NMI on synthetic modules |
|---|---|---|---|---|
| MoI, cosine + Leiden | 0.46 ± 0.03 | 0.22 ± 0.02 | 0.78 ± 0.06 | 0.71 ± 0.05 |
| SHAP interaction graph | 0.41 ± 0.04 | 0.25 ± 0.03 | 0.69 ± 0.08 | 0.63 ± 0.05 |
| Raw-feature correlation | 0.36 ± 0.05 | 0.28 ± 0.03 | 0.52 ± 0.10 | 0.49 ± 0.07 |
| PCA/ICA groupings | 0.31 ± 0.06 | 0.32 ± 0.03 | 0.44 ± 0.09 | 0.42 ± 0.07 |
The margin is meaningful, but not miraculous. MoI improves recovery of planted modules relative to raw correlation and PCA/ICA grouping, with an ARI of 0.78 against 0.52 and 0.44 respectively. SHAP interaction graphs are closer at 0.69, which makes sense: they also come from model explanations rather than raw covariates. The paper’s evidence therefore supports a moderate claim: explanation-derived graph structure can recover model-relevant modules better than generic feature grouping. It does not support the cartoon claim that community detection suddenly makes black boxes transparent. Please do not put that on a slide.
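For readers wondering what the modularity column measures, here is a sketch of the textbook Newman modularity for a weighted undirected graph, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j). Whether the paper uses exactly this variant (for example, with a resolution parameter) is not stated in this article, so treat the code as the standard form rather than the paper's scoring script.

```python
def modularity(w, labels):
    """Newman modularity of a partition on a symmetric weighted adjacency matrix."""
    d = len(w)
    degree = [sum(row) for row in w]
    two_m = sum(degree)  # total weight, counting each edge twice
    if two_m == 0:
        return 0.0
    q = 0.0
    for i in range(d):
        for j in range(d):
            if labels[i] == labels[j]:
                q += w[i][j] - degree[i] * degree[j] / two_m
    return q / two_m

# Two disconnected feature pairs: the ideal case for a two-module partition.
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
q_split = modularity(W, [0, 0, 1, 1])   # 0.5
q_merged = modularity(W, [0, 0, 0, 0])  # 0.0
```

Putting everything in one module always scores zero, so a positive Q means the partition captures more within-module weight than a degree-matched random graph would.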
The evidence is strongest when treated as an audit workflow
The paper’s results are organised around four themes: structure, bias localisation, compression, and stability. These are not four separate theses. They are four tests of whether the module abstraction is useful.
| Result area | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Synthetic module recovery | Main evidence for structural validity | MoI can recover planted feature groups better than several baselines | Real-world modules are not guaranteed to be ground-truth mechanisms |
| Bias exposure analysis | Main evidence for governance usefulness | Disparities can concentrate in high-BEI modules that become intervention targets | Module-level disparity is not automatically causal discrimination |
| Compression using module features | Evidence for parsimony | Module aggregation can preserve most predictive signal while reducing dimensionality | Compression will not work for every entangled or low-signal model |
| Stability tests | Robustness and sensitivity evidence | Edge choices, sparsification, and perturbations affect reliability | A stable module is not necessarily a correct causal module |
That distinction is important. The paper is at its best when read as a disciplined proposal for where to look. It narrows the audit space. It gives model-risk teams a way to move from “feature importance soup” to “these three modules deserve attention”.
In the fairness setting, the paper introduces a Bias Exposure Index, or BEI, to rank modules by group-conditioned differences in module influence. It reports that disparities often concentrate in a small number of high-BEI modules, frequently containing known proxies or socio-economic attributes. The proposed interventions include attenuating high-BEI modules, regularising their attributions, reweighting training, or targeting data augmentation.
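The article does not reproduce the paper's BEI formula, so the following is a loudly hypothetical stand-in: it scores each module by the gap between groups in mean absolute module attribution, then ranks modules by that gap. The function name, the module names, and the numbers are all invented for illustration.

```python
def bias_exposure(phi, groups, modules):
    """Hypothetical BEI stand-in: gap in mean module magnitude between groups."""
    scores = {}
    for name, feats in modules.items():
        totals, counts = {}, {}
        for row, g in zip(phi, groups):
            totals[g] = totals.get(g, 0.0) + sum(abs(row[i]) for i in feats)
            counts[g] = counts.get(g, 0) + 1
        means = [totals[g] / counts[g] for g in totals]
        scores[name] = max(means) - min(means)
    # Highest-exposure modules first.
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

phi = [
    [0.8, 0.6, 0.1],
    [0.7, 0.5, 0.2],
    [0.1, 0.2, 0.1],
    [0.2, 0.1, 0.2],
]
groups = ["a", "a", "b", "b"]
modules = {"device": [2], "socio_economic": [0, 1]}
ranked = bias_exposure(phi, groups, modules)
```

Here the made-up `socio_economic` module carries the whole between-group gap while `device` carries none, which is the kind of concentration the paper reports. A production version would add confidence intervals before anyone acts on the ranking.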
This is practically interesting because it reflects how bias usually survives compliance theatre. Removing an explicit sensitive variable does not remove the surrounding proxy ecology. A postcode-like variable, education field, employment category, and transaction pattern may collectively encode a protected attribute without any one field looking guilty enough to be fired. A module-level view can expose the cluster. The model, alas, has no respect for your column deletion ceremony.
The paper’s fairness dashboard is therefore more valuable as a diagnostic interface than as a mitigation recipe. It can rank modules, show confidence intervals, compare pre/post-intervention disparity, and annotate accuracy changes. But any real deployment would still require domain review, legal interpretation, and interventional validation. MoI can tell you which part of the model’s explanation graph looks suspicious. It cannot decide whether that suspicion is legally or ethically dispositive.
The compression result is quietly important
One of the more business-relevant findings is not the flashiest. The paper reports that module-aggregated features can preserve predictive performance while reducing dimensionality. In one reported parsimony table, replacing raw attribution features with module features reduces dimensionality from $d=128$ to $K=18$, cuts parameters from 45,200 to 9,030 (about 80% fewer), and lowers inference time from 2.8 ms to 1.3 ms, while AUROC slips only from 0.912 to 0.909.
| Representation | Dimension | AUROC | Parameters | Inference time |
|---|---|---|---|---|
| Raw attributions $\Phi$ | $d=128$ | 0.912 | 45,200 | 2.8 ms |
| Module features $\Psi$ | $K=18$ | 0.909 | 9,030 | 1.3 ms |
The point worth dwelling on is not the millisecond saving. In most enterprise systems, a single inference call is rarely the dominant cost. The bigger value is auditability.
A model explained through 18 module features is easier to inspect than one explained through 128 individual attribution channels. It is easier to compare across cohorts. It is easier to monitor for drift. It is easier to assign to domain owners. The compression is not just computational; it is organisational.
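The projection from $\Phi$ to $\Psi$ can be sketched in a few lines. The aggregation rule here (a signed sum within each module) is an assumption; the paper may weight or normalise differently. The labels are assumed to be contiguous integers from 0 to K-1, as a community detection step would typically produce after relabelling.

```python
def module_features(phi, labels):
    """Sum signed attributions within each module: an n x d matrix becomes n x K."""
    k = max(labels) + 1
    psi = [[0.0] * k for _ in phi]
    for s, row in enumerate(phi):
        for i, value in enumerate(row):
            psi[s][labels[i]] += value
    return psi

# Two instances, five features, two modules (made-up numbers).
phi = [
    [0.30, 0.20, -0.10, 0.05, 0.00],
    [0.10, 0.00, 0.40, 0.30, -0.20],
]
labels = [0, 0, 1, 1, 1]  # feature index -> module index
psi = module_features(phi, labels)
```

Each row shrinks from five attribution channels to two module scores, which is the toy-scale version of going from 128 columns to 18.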
That said, compression is not universally benign. The paper notes that high-redundancy modules are more compressible, while low-redundancy modules may be interaction-heavy and less suitable for pruning or aggregation. This is exactly the kind of distinction governance teams need. A redundant module may indicate interchangeable proxies; an interaction-heavy module may indicate brittle behaviour that should be studied rather than compressed away.
Stability is the audit gate, not an appendix chore
The most mature part of the paper is its insistence on stability.
Community detection can be temperamental. Change the attribution background set, the random seed, the sparsification threshold, the edge definition, or the sample slice, and the modules may shift. MoI addresses this with a Module Stability Index, based on bootstrap perturbations and matching modules across runs using overlap measures such as Jaccard or IoU. It also recommends consensus matrices, stability curves, and hyperparameter sweeps.
This matters because module-level interventions are only credible if the modules themselves are not hallucinating structure from noise. If a high-BEI module disappears when the SHAP background changes, it is not ready to become a compliance control. It is a lead, not a finding.
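One plausible reading of a bootstrap-and-match stability score is sketched below. It assumes the partitions from perturbed runs have already been computed, and it scores each reference module by its mean best-match Jaccard overlap across runs. The exact definition of the paper's Module Stability Index may differ, and `module_stability` is a hypothetical name.

```python
def jaccard(a, b):
    """Jaccard overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def module_stability(reference, runs):
    """Mean best-match Jaccard of each reference module across perturbed runs."""
    return [
        sum(max(jaccard(ref, m) for m in run) for run in runs) / len(runs)
        for ref in reference
    ]

# Reference partition and three perturbed reruns (made-up feature indices).
reference = [{0, 1, 2}, {3, 4}]
runs = [
    [{0, 1, 2}, {3, 4}],  # identical partition
    [{0, 1}, {2, 3, 4}],  # feature 2 migrates
    [{0, 1, 2, 3}, {4}],  # feature 3 migrates
]
msi = module_stability(reference, runs)
```

The first module scores about 0.81 and the second about 0.72. A team would set its own acceptance threshold; the point is that the number exists and is reported next to the module.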
The paper reports that magnitude-cosine edges generally produce higher stability than raw-correlation edges, while MI/HSIC can detect nonlinear ties but require stronger shrinkage to avoid fragmentation. Mutual-$k$ sparsification with $k$ in the range of 10 to 30 is presented as a practical balance between connectivity and resolution. Degree normalisation helps reduce hub dominance.
That is implementation detail with strategic consequences. A governance team should not treat MoI as a push-button explainer. The graph-building configuration is part of the result. In a model audit, the edge rule, sparsification method, community algorithm, seed strategy, and stability diagnostics should be documented alongside the module findings.
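In practice, documenting the configuration can be as plain as a frozen record serialised next to the findings. The field names and values below are illustrative assumptions, not a schema from the paper; the only figure drawn from the text is a `k` inside the reported 10 to 30 range.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class MoIAuditConfig:
    """Hypothetical record of the graph-building settings behind a module finding."""
    explainer: str
    background_set: str
    edge_rule: str
    sparsification: str
    k: int
    community_algorithm: str
    seeds: tuple = (0, 1, 2)
    stability_diagnostics: tuple = ("bootstrap_stability", "consensus_matrix")

config = MoIAuditConfig(
    explainer="shap",
    background_set="train_sample_v1",  # hypothetical dataset identifier
    edge_rule="magnitude_cosine",
    sparsification="mutual_knn",
    k=20,
    community_algorithm="leiden",
)

# Stable key order makes the record easy to diff between audit runs.
record = json.dumps(asdict(config), sort_keys=True)
```

Freezing the dataclass prevents silent edits after the audit has run, and the JSON string can be hashed or stored alongside the module tables.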
A stable but modest module analysis is more useful than an elegant but unstable graph. The latter belongs in a demo video, where all fragile things go to be loved briefly.
The causal misconception needs to be killed early
The most likely misreading of this paper is also the most dangerous one: assuming that discovered modules are causal mechanisms.
They are not.
MoI builds graphs from attributions. Attributions are model explanations, not interventions in the world. A module may represent a group of features that jointly influence the model’s prediction. That does not mean the module causes the real-world outcome. It may reflect confounding, measurement bias, historical selection effects, or merely the model’s learned shortcut.
The paper is explicit about this boundary. It frames MoI as hypothesis-generating and discusses causal follow-up through counterfactual reasoning, structural causal models, module-level causal estimands, path-specific effects, and invariance checks. In practical terms, MoI can suggest that an income-employment-education module may mediate a disparity. To claim that it causally mediates the disparity, an organisation would need assumptions, interventions, identification strategy, or controlled validation.
A useful replacement belief is this:
MoI does not discover causal mechanisms. It discovers explanation-derived modules that make causal questions more targeted.
That is still valuable. Causal analysis often fails because the search space is too large. MoI narrows the search space by identifying candidate modules for ablation, counterfactual generation, conditional baselines, or environment-based invariance testing. It turns “we should investigate bias” into “we should test whether this high-BEI module remains influential after realistic conditional intervention and whether its effect survives across environments.”
That is a much better sentence. Longer, yes. But governance is not paid by the syllable.
How a business team would actually use MoI
The practical pathway is clearest for organisations already using tabular models with accessible attributions: credit scoring, insurance pricing, fraud detection, customer risk, clinical risk, churn prediction, and operational triage. These are domains where features are numerous, proxy structures matter, and audit teams need explanations that survive contact with policy.
A realistic MoI workflow would look like this:
- Run standard attribution analysis. Generate SHAP, LIME, or Integrated Gradients explanations across a representative evaluation set, with background/reference settings documented.
- Construct multiple explanation graphs. Build magnitude and signed co-influence graphs, using a default such as magnitude-cosine plus mutual-$k$ sparsification, then compare alternatives.
- Detect and label modules. Apply Leiden, Louvain, Infomap, or another community method. Analysts then inspect top features in each module and assign domain labels cautiously.
- Score modules. Compute stability, redundancy, synergy, ablation drop, and bias exposure. Rank modules by risk-relevant criteria rather than by visual appeal.
- Stress-test the partition. Vary seeds, attribution backgrounds, edge rules, perturbation strength, and data slices. Reject modules that are not stable enough for governance use.
- Intervene only after realism checks. Use conditional baselines or soft attenuation rather than naive feature masking. Hard ablation can create out-of-distribution records, and out-of-distribution records are where models go to improvise jazz.
- Report modules as audit objects. Include module composition, edge settings, community algorithm, confidence intervals, pre/post metrics, and residual proxy risk.
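The intervention step is the one most often done crudely, so a small sketch of soft attenuation may help. It assumes attenuation means interpolating a module's input features toward a baseline vector; the paper's exact mechanism is not given here, and `attenuate_module` is a hypothetical helper. Ideally the baseline would be a conditional expectation given the remaining features rather than a global mean.

```python
def attenuate_module(x, module, baseline, alpha):
    """Shrink module features toward a baseline; alpha=0 keeps x, alpha=1 replaces it."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    out = list(x)
    for i in module:
        out[i] = (1.0 - alpha) * x[i] + alpha * baseline[i]
    return out

x = [10.0, 4.0, 7.0]        # one input record (made-up values)
module = [0, 1]             # feature indices in the module under review
baseline = [6.0, 2.0, 0.0]  # assumed conditional baseline for this record

softened = attenuate_module(x, module, baseline, 0.5)
untouched = attenuate_module(x, module, baseline, 0.0)
```

Sweeping `alpha` from 0 to 1 and re-scoring the model gives a dose-response curve for the module, which is more informative, and less likely to leave the data distribution, than a single hard mask.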
The ROI case is therefore not “we get prettier explanations”. It is cheaper diagnosis, more targeted mitigation, and more repeatable audit evidence. A module-level audit can help identify where to focus data collection, which proxy clusters need review, which feature groups can be compressed, and which model behaviours are too unstable to trust.
Where the method is strongest, and where it is brittle
MoI is strongest under four conditions.
First, the model operates on structured tabular data. The paper explicitly focuses on tabular settings with accessible per-instance attributions and binary or real-valued predictions. This does not mean the idea cannot extend elsewhere, but the evidence and defaults belong to tabular modelling.
Second, the attribution method is reliable enough for the task. If the underlying explanations are unstable or poorly calibrated, the graph inherits that instability. Community detection cannot launder bad attributions into good governance. It can only organise them more dramatically.
Third, the feature space has meaningful meso-scale structure. If features are weakly informative, highly entangled, or dominated by diffuse interactions, the graph may over-partition or merge modules in ways that are hard to interpret. The paper notes resolution limits and recommends stability criteria to reject fragile partitions.
Fourth, the organisation is willing to treat module discovery as an audit workflow, not a one-click truth machine. MoI requires configuration discipline: explainer settings, background choices, edge definitions, sparsification, community algorithm, seeds, and sensitivity analysis all matter.
The brittleness appears when these conditions fail. Signed graphs with strong antagonism can create ambiguous modules unless negative edges are treated explicitly. Hard ablations can produce unrealistic inputs unless conditional baselines are used. Module aggregation can hide within-module cancellation. Group-specific graphs can reflect sampling bias. Public release of fine-grained attribution artefacts can create privacy leakage risks.
None of these invalidate the method. They define the operating envelope.
The reporting layer is more than cosmetics
The paper devotes considerable attention to visualisation and reporting: module graphs, reordered heatmaps, Sankey diagrams, fairness dashboards, stability curves, consensus matrices, and module summary tables. This might look like presentation polish. It is actually part of the method.
A module graph helps analysts see the topology of feature co-influence. A reordered heatmap reveals block structure. A Sankey diagram can show feature-to-module-to-output contribution flows across cohorts. A fairness dashboard ranks modules by bias exposure and shows disparity before and after interventions. Stability curves show whether a partition survives perturbation.
For enterprise use, this reporting layer should be treated as audit infrastructure. A model-risk committee does not need a 400-node hairball. It needs stable module labels, comparable scales, uncertainty intervals, documented settings, and a trail from diagnosis to action.
The paper’s proposed reporting template includes module size, average degree, redundancy, bias exposure, mean attribution, top features, and ablation drop. That is close to what a serious governance artefact should contain. I would add owner, policy relevance, data provenance notes, mitigation status, and residual risk. The model may be statistical, but the accountability system is still painfully human.
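A row of such a summary table can be assembled from artefacts the pipeline already has. The sketch below covers only the fields that follow directly from the graph and the attribution matrix (size, average degree, mean attribution, top features); the function name, field names, and data are illustrative assumptions, and redundancy, bias exposure, and ablation drop would be filled in by their own routines.

```python
def module_summary(name, feats, w, phi, feature_names, top_n=2):
    """Build one summary row for a module from the graph and attribution matrix."""
    n = len(phi)
    mean_abs = {i: sum(abs(row[i]) for row in phi) / n for i in feats}
    # Unweighted degree: number of retained edges touching each module feature.
    degree = [sum(1 for j in range(len(w)) if w[i][j] > 0) for i in feats]
    top = sorted(feats, key=lambda i: mean_abs[i], reverse=True)[:top_n]
    return {
        "module": name,
        "size": len(feats),
        "avg_degree": sum(degree) / len(feats),
        "mean_abs_attribution": sum(mean_abs.values()) / len(feats),
        "top_features": [feature_names[i] for i in top],
    }

# Made-up sparsified graph, attributions, and feature names.
W = [
    [0.0, 0.9, 0.0],
    [0.9, 0.0, 0.4],
    [0.0, 0.4, 0.0],
]
phi = [
    [0.8, 0.2, 0.1],
    [0.6, 0.4, 0.3],
]
names = ["income", "employment", "device_age"]
summary = module_summary("income_employment", [0, 1], W, phi, names)
```

Fields such as owner, policy relevance, and mitigation status would be added by people rather than computed, which is rather the point.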
What Cognaptus should infer, and what it should not
The paper directly shows a framework for converting per-instance attributions into co-influence graphs, extracting modules, and auditing those modules with stability, redundancy, synergy, and bias exposure metrics. It reports improved synthetic module recovery against several baselines, module-level bias localisation, compression with small reported AUROC change in an example table, and stability sensitivity across graph-construction choices.
The business inference is that module-level explanations can make model governance more operational. Instead of chasing isolated top features, teams can inspect clusters that behave together. This supports targeted mitigation, proxy analysis, model simplification, drift monitoring, and clearer audit reporting.
The uncertainty boundary is equally clear. MoI depends on the attribution method, the trained model, the data slice, the graph configuration, and the community algorithm. Its modules are explanation-derived associations, not causal mechanisms. Its fairness findings require legal and domain interpretation. Its interventions need realistic conditional baselines. Its visualisations can clarify, but also seduce. Graphs are very good at looking authoritative. So are marble lobbies.
The deeper shift: explanations become objects of governance
The most interesting idea in the paper is not the specific use of Leiden, mutual-$k$, or bias exposure scoring. It is the shift from explaining predictions to governing explanation structure.
A feature attribution list answers: “What mattered here?”
A module graph asks: “What tends to matter together?”
A stability analysis asks: “Does that grouping persist when conditions change?”
A bias exposure score asks: “Which group of features carries disparity?”
An ablation test asks: “What happens if we reduce reliance on this part of the explanation structure?”
That progression is the real business value. It turns XAI from a customer-service script into a diagnostic system. Not a perfect one, not a causal oracle, and definitely not a replacement for human judgement. But a better unit of analysis than the lonely feature bar.
MoI’s core insight is that model explanations need a middle scale. Individual features are too granular. Whole-model summaries are too blunt. Modules are where many practical governance questions actually live.
And once you can graph that middle scale, you can finally ask the model a more grown-up question: not “which feature mattered?”, but “which part of your decision machinery keeps showing up?”
That is a question worth automating carefully.
Cognaptus: Automate the Present, Incubate the Future.
[1] Ehsan Moradi, “Community Detection on Model Explanation Graphs for Explainable AI,” arXiv:2510.27655, 2025, https://arxiv.org/abs/2510.27655.