Audit is a boring word until the model says something expensive.
A credit model rejects an applicant. A visual inspection model flags a component. A traffic-sign classifier keeps its prediction under small pixel changes. The business question is not merely, “What did the model look at?” That is the demo-room version. The operational question is harder: which input features must remain fixed so that the model’s decision is guaranteed not to change under allowed perturbations?
That question belongs to formal explainability. It is less photogenic than a heatmap and far less forgiving than a dashboard tooltip. But in safety, compliance, and model assurance, “the red pixels look important” is not an explanation. It is a mood board with gradients.
The paper “FAME: Formal Abstract Minimal Explanation for Neural Networks” proposes a framework for computing formal abductive explanations for neural networks using abstract interpretation and LiRPA-based verification.1 Its central move is not to make exact verification magically cheap. It does something more useful: it separates fast, sound pruning from slower exact refinement. In plain business language, FAME tries to make formal explanations usable by first proving what can safely be ignored.
That distinction matters. The paper is not saying every modern neural network can now receive perfect minimal explanations at production speed. It is saying that the expensive part of formal explanation can be narrowed, structured, and accelerated enough to reach models where exact-only methods time out or run out of memory. Not exactly fairy dust. More like accounting discipline for proof search.
Formal explanations ask what must stay fixed
Most explainability methods estimate importance. They may rank pixels, words, variables, or regions. That is useful for inspection, but it is not the same as a guarantee.
FAME works in the tradition of abductive explanations, or AXps. For a model input $x$, a perturbation domain $\Omega(x)$, and a property $P$, an abductive explanation is a subset of features that is sufficient to guarantee the property. In classification, the property is usually that the predicted class remains on top under all allowed perturbations.
For example, suppose a classifier predicts class $c$. The robustness property can be written as:
$$ P(x’) \equiv \min_{i \ne c} { f_c(x’) - f_i(x’) } > 0 $$
The explanation question becomes:
Which features must be fixed to their original values so that every allowed perturbed input still satisfies $P$?
This is very different from asking which features look influential. A feature can appear visually salient and still be unnecessary for the formal guarantee. Another feature can look boring but be required to prevent a counterexample. Formal explanation is rude that way. It does not care whether the heatmap is pretty.
The smallest useful explanation is not simply any sufficient subset. The paper distinguishes between weak abductive explanations, minimal explanations, and minimum explanations. A minimal explanation means no single feature can be removed without breaking the guarantee. A minimum explanation is the smallest among all such explanations. Minimum is stronger, usually harder, and the sort of thing that makes solvers quietly leave the room.
FAME’s important editorial lesson is that formal does not have to mean exact minimum at every stage. The paper introduces abstract formal explanations: explanations certified under a sound over-approximation of the network. Because the abstraction is sound, the guarantee transfers to the concrete model. Because the abstraction is conservative, it may fail to certify some features that are actually irrelevant. That is the trade.
The old bottleneck is not just solver cost; it is traversal order
Existing formal explanation methods face two linked problems.
First, exact solvers are expensive. Neural-network verification can be encoded through SAT, SMT, MILP, or specialized tools such as Marabou. These methods are powerful, but they do not scale gracefully as networks and feature spaces grow.
Second, many explanation algorithms depend on a traversal order. They test features sequentially, or in structured batches based on an ordering. That creates a circular dependency: to choose a good order, one needs some prior sense of feature importance, but the whole purpose of explanation is to discover which features matter.
This is the paper’s real mechanism-level target. FAME is not merely replacing one solver with another. It tries to break the sequential bottleneck by finding groups of irrelevant features that can be freed together without relying on a preselected traversal order.
The word “freed” is doing important work here. In the paper’s terminology:
| Action | Meaning | Why it matters |
|---|---|---|
| Add a feature to the explanation | Treat it as necessary; keep it fixed | Helps preserve the decision guarantee |
| Free a feature | Treat it as irrelevant; allow it to vary | Shrinks the explanation |
| Free many features together | Certify a whole group as jointly irrelevant | Creates scalability, but is dangerous without a joint certificate |
Adding features can be parallelized more easily. If attacks or verification show that several features are individually necessary, they can be added together. Freeing is asymmetrical. Two features may each look irrelevant when freed alone, but jointly they may allow a counterexample. This is the quiet trap in formal XAI: individual safety does not imply group safety.
FAME’s response is the abstract batch certificate.
The abstract batch certificate is the core trick
A normal verifier often behaves like a binary oracle: the property holds, or it does not. That is not enough to safely free many features at once, because it does not expose the combined worst-case contribution of a candidate group.
FAME uses LiRPA, specifically linear relaxation-based bounds, to obtain conservative estimates of how feature changes can affect the classifier’s output margins. These bounds allow the method to compute a certificate for a candidate set of features $A$. If the certificate succeeds, the whole set can be freed simultaneously.
This matters because the method is not saying:
“Each feature in this group passed a single-feature test, so let us remove all of them.”
That would be unsound.
It is saying:
“The joint worst-case contribution of this group remains within the certified margin, so freeing the group preserves the property.”
That difference is the paper’s hinge. It turns feature elimination from a sequential guessing game into a batch certification problem.
The batch selection problem then resembles a multidimensional knapsack. The algorithm wants to free as many features as possible while staying within certificate constraints across the non-ground-truth classes. The exact knapsack formulation can be solved with MILP for moderate cases, but that would undermine the scalability goal. So FAME uses a greedy heuristic: select features whose normalized worst-case contribution is least likely to violate any constraint.
This is not glamorous. It is also exactly the kind of engineering compromise that makes research useful. The greedy method is not always optimal, but the experiments show it is close enough to the MILP solution while being much faster.
Recursive domain tightening is what makes the abstraction useful
A single abstract pass can be too conservative. Abstract interpretation over-approximates possible model outputs. The larger or looser the perturbation domain, the looser the bounds. Loose bounds make the verifier cautious, and cautious verifiers refuse to certify features that may in reality be irrelevant. A very formal way of saying: the method becomes scared of its own approximation.
FAME addresses this by refining the perturbation domain recursively.
After one batch of features is certified as free, the method restricts the domain using cardinality constraints. Instead of allowing arbitrary variation across all remaining dimensions, it considers domains where at most $m$ features vary in addition to the already freed set. This tighter domain gives LiRPA sharper bounds. Sharper bounds allow more features to be certified as irrelevant. Then the process repeats.
The mechanism looks like this:
Start with many candidate features
|
v
Use LiRPA bounds to certify a batch of free features
|
v
Shrink the perturbation domain with a cardinality constraint
|
v
Recompute tighter bounds
|
v
Certify more free features
|
v
Run singleton refinement and optional exact refinement
This is why the article should not be read as “FAME uses approximation instead of exact verification.” That would miss the point.
The better reading is:
FAME uses sound abstraction to make exact explanation search smaller, and it uses recursive domain tightening to make the abstraction less wasteful.
In formal methods, this is a very practical kind of intelligence. The system does not try to know everything at once. It narrows what needs to be known.
The experiments test three different claims, not one big applause line
The paper’s empirical section compares FAME with VERIX+, the prior state-of-the-art baseline for verified explanations of neural networks. The experiments use MNIST and GTSRB image classifiers, including fully connected and convolutional models. The reported metrics are explanation size and runtime, averaged over non-robust samples from 100 samples per dataset.
The evidence should be read in layers:
| Test or result | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Greedy vs. MILP in single-round abstract freeing | Ablation / implementation choice | The greedy heuristic is much faster while close to optimal MILP in explanation size | Greedy is not guaranteed globally optimal in every setting |
| Iterative refinement vs. single-round FAME | Mechanism validation | Recursive domain tightening can produce substantially smaller explanations | It does not remove all abstraction looseness |
| FAME-accelerated VERIX+ vs. VERIX+ alone | Main comparison with prior work | FAME can improve runtime and often explanation compactness before exact refinement | It does not mean exact refinement is unnecessary |
| ResNet-2B on CIFAR-10 appendix experiment | Scalability extension | FAME can produce sound abstract explanations where exact methods time out or exhaust memory | It is not evidence of fast real-time explanation for large production models |
This distinction matters because the paper’s strongest contribution is not a single benchmark number. It is the layered pipeline: batch certificate, greedy selection, recursive refinement, and optional exact refinement.
The numbers show a practical shift in the cost-quality trade-off
Table 1 in the paper is dense, but the story is clear enough.
For the four MNIST/GTSRB models, VERIX+ alone is compared with FAME variants. The full FAME-accelerated VERIX+ pipeline uses FAME to reduce the candidate set and then applies final refinement to achieve minimality.
A simplified view:
| Model | VERIX+ explanation size / time | FAME iterative greedy abstract explanation size / time | FAME-accelerated VERIX+ final explanation size / time |
|---|---|---|---|
| MNIST-FC | 280.16 / 13.87s | 225.14 / 8.78s | 224.41 / 13.72s |
| MNIST-CNN | 159.78 / 56.72s | 122.09 / 5.60s | 113.53 / 33.75s |
| GTSRB-FC | 313.42 / 56.18s | 332.74 / 5.26s | 332.66 / 9.26s |
| GTSRB-CNN | 338.28 / 185.03s | 322.42 / 7.42s | 322.42 / 138.12s |
The results are not uniformly “smaller and faster” in every intermediate column. That would be too convenient, and suspiciously well-behaved. The more precise interpretation is better:
FAME’s abstract phase often produces much faster explanations, sometimes smaller than VERIX+’s minimal explanations, but abstract explanations are not always true minimal explanations. When exact refinement is added, the final result can still be faster than VERIX+ alone and often smaller, especially for MNIST-CNN and GTSRB-CNN. For GTSRB-FC, the final explanation is larger than VERIX+’s, but the runtime is far lower.
The GTSRB-CNN case is especially telling. VERIX+ reports 338.28 features in 185.03 seconds. FAME’s iterative greedy abstract phase reports 322.42 features in 7.42 seconds. The paper describes this as up to a 25× speedup for producing an abstract explanation close to minimality. After FAME-accelerated VERIX+ refinement, the final time is 138.12 seconds, still below the VERIX+ baseline.
That is not a miracle. It is a trade-off map.
Greedy selection is boring in the right way
The greedy heuristic deserves attention because it is exactly where many elegant methods quietly become unusable.
In the single-round batch freeing experiment, the paper reports that greedy selection gives a 9× to 12× speedup over the exact MILP solver while achieving abstract explanation sizes very close to the MILP solution, with fewer than nine features difference on average. That result supports the design choice: use MILP to define the optimization problem, but use greedy selection to make the pipeline operational.
With iterative refinement, greedy remains close to MILP while preserving runtime advantage. The appendix reports that the greedy method often reaches zero optimality gap in supplementary comparisons, and that the gap is typically small when it does not.
For business use, this is the right kind of approximation. The guarantee comes from the certificate, not from pretending the greedy search is globally optimal. The greedy step chooses candidate features to test; the certificate decides whether freeing them is sound. That separation is important. It means the heuristic may affect compactness and speed, but it does not casually throw away the formal guarantee. A small mercy from mathematics.
The ResNet appendix is the scalability signal
The appendix experiment on ResNet-2B trained on CIFAR-10 is not just decorative. It addresses the question that inevitably follows any formal-methods paper:
Fine, but does it survive outside small networks?
The authors report that exact formal explanation methods such as the full VERIX+ pipeline with Marabou consistently timed out or exhausted memory on this architecture. FAME, by contrast, terminated for all processed samples and produced sound abstract explanations.
The detailed appendix numbers are useful because they isolate the mechanism:
| Configuration | Reported result | Interpretation |
|---|---|---|
| Single-round abstract freeing | 5.53 pixels freed on average | Initial bounds are too loose on deeper networks |
| Recursive abstract refinement | 476.38 pixels freed on average, about 46% of the image | Domain tightening is essential, not optional |
| Full pipeline with singleton refinement | 240.84 pixels freed on average | Final safety checks reduce the free set but retain a sound abstract explanation |
| Recursive search runtime | About 1934.94 seconds, roughly 32 minutes | Feasible for offline audit, not a live dashboard click |
This is the paper’s most business-relevant boundary. FAME can reach architectures where exact-only methods fail, but the deeper-network case still costs around half an hour per explanation in the reported setup. That is meaningful for certification, audit sampling, incident analysis, and model assurance reviews. It is not yet a casual feature for every prediction request.
The real gain is not “instant explainability.” The gain is that formal explanation moves from impossible to schedulable.
The misconception: formal does not mean exact-minimal from the first step
A common misunderstanding is that a formal explanation must be exact, minimal, and final immediately. Under that view, abstract explanations look like a compromise.
The paper’s better framing is that soundness and minimality are different properties.
FAME’s abstract explanation is sound: if the LiRPA-based certificate succeeds, the relevant property holds for the concrete model under the specified domain. But abstract minimality is relative to the verifier and its relaxation. Because LiRPA over-approximates, some irrelevant features may remain in the explanation simply because the abstraction cannot prove they are irrelevant.
The paper therefore introduces a distance-to-minimality procedure. It measures the worst-case gap between the abstract minimal explanation and a true minimal explanation, using adversarial attacks and optional VERIX+ refinement. This is a useful discipline: rather than pretending the abstraction is perfect, FAME quantifies and refines the gap.
In operational terms:
| Concept | What it gives | What it may miss |
|---|---|---|
| Abstract explanation | Sound guarantee with faster computation | May be larger than necessary |
| Abstract minimal explanation | No more features can be removed under the abstract verifier | May still contain features removable by exact verification |
| Exact refinement | True minimality under stronger verification | Higher computational cost |
| Distance to minimality | A way to measure remaining abstraction gap | Depends on the refinement procedure and solver feasibility |
That is a more mature view of explainability. The pipeline admits that “provably sufficient” and “smallest possible” are not the same invoice.
What this means for business AI assurance
FAME is most relevant where explanations are part of a broader assurance workflow. Think model validation, safety cases, compliance audits, failure analysis, robustness certification, and high-risk AI governance.
The direct paper result is technical: FAME computes sound abstract abductive explanations for neural-network classifiers using LiRPA-based bounds, recursive perturbation-domain refinement, and optional exact refinement.
The business inference is more practical: this kind of method could become a middle layer between informal XAI and full exact verification.
| Business need | How FAME helps | Boundary |
|---|---|---|
| Audit a model decision | Identifies a subset of features sufficient to preserve the decision under perturbation | Demonstrated mainly on image classifiers |
| Reduce exact verification workload | Prunes irrelevant features before expensive solver calls | Still depends on abstraction tightness |
| Support safety cases | Provides proof-backed explanations rather than heuristic saliency | Property and perturbation domain must be well-defined |
| Investigate robustness | Connects explanation to certified local robustness | Does not explain broad semantic reasoning |
| Scale formal XAI experiments | Uses GPU-friendly LiRPA-style bounds | Deep networks remain costly |
This is where firms should avoid overbuying the conclusion. FAME is not an LLM interpretability method. It does not explain a chatbot’s legal reasoning, financial advice, or multi-step planning behavior. The paper’s evidence is centered on neural-network classifiers and robustness-style properties under defined perturbation domains.
That does not reduce its importance. It clarifies its market.
Formal XAI will first matter where the system boundary is crisp enough for verification: perception modules, structured classifiers, safety-critical components, and decision systems with well-defined perturbation rules. For broad generative models, the same philosophy may inspire tools, but the paper does not prove that transfer.
The boundary conditions are part of the method
FAME’s practical value depends on several conditions.
First, the perturbation domain must be meaningful. If the domain is too broad, LiRPA bounds become loose and the abstraction may certify little. If the domain is too narrow, the explanation may be formally correct but operationally uninteresting. The paper notes this relation directly: larger perturbation radii loosen bounds and affect explanation size.
Second, the method depends on the verification property. The paper focuses on classification robustness, where the model’s predicted class remains unchanged under perturbations. Other properties may be possible, but they must be specified formally.
Third, the abstraction matters. CROWN-based LiRPA is sound but conservative. Tighter abstract interpretation methods may improve results, especially for deeper networks. The appendix explicitly points to abstraction looseness as a driver of the gap between abstract and true minimal explanations.
Fourth, compute still matters. The ResNet-2B case uses an NVIDIA A100 80GB GPU and still reports around 32 minutes for the recursive search. That is impressive compared with exact timeout or memory exhaustion, but no one should confuse it with a cheap spreadsheet macro.
A blunt deployment reading:
| Use case | Fit today |
|---|---|
| Offline model audit | Strong fit |
| Certification support | Promising, if the property/domain match the safety case |
| Batch explanation for selected high-risk decisions | Plausible |
| Real-time explanation for every request | Not supported by the reported evidence |
| General LLM reasoning explanation | Outside the paper’s demonstrated scope |
This is not a flaw. It is what an honest method looks like before the sales deck discovers adjectives.
The larger pattern: explainability is becoming verification engineering
The most interesting part of FAME is not that it produces smaller explanations on some benchmarks. It is that the architecture resembles serious engineering: approximate safely, prune aggressively, refine selectively, and measure the remaining gap.
That pattern is likely to matter beyond this specific paper. Many AI assurance problems are too large for exact verification everywhere and too important for heuristic inspection alone. The practical answer will often be layered:
- Use sound approximations to reduce the problem.
- Use attacks or search to identify hard cases.
- Use exact methods only where they are worth the cost.
- Keep track of what is certified, what is approximate, and what remains unknown.
FAME applies that pattern to formal abductive explanations. The result is a method that is neither a saliency toy nor an exact solver fantasy. It is a workflow for making proof-backed explanations less computationally hopeless.
That is why the paper deserves attention from AI governance and assurance teams. Not because it solves explainability in general. It does not. But because it shows how a formal explanation pipeline can be designed around the actual economics of verification.
The future of trustworthy AI will not be built from prettier heatmaps alone. It will need systems that can say, with mathematical discipline, which parts of an input mattered for a decision and under what perturbation assumptions that statement remains true.
FAME is one step toward that. Not fortune, exactly. But for formal XAI, a little fame is already progress.
Cognaptus: Automate the Present, Incubate the Future.
-
Ryma Boumazouza, Raya Elsaleh, Melanie Ducoffe, Shahaf Bassan, and Guy Katz, “FAME: Formal Abstract Minimal Explanation for Neural Networks,” arXiv:2603.10661, 2026, https://arxiv.org/abs/2603.10661. ↩︎