The spreadsheet rule that never quite reaches the model
Rules are everywhere in business software.
An invoice total must match its line items. A loan file must contain the right documents before underwriting. A production schedule cannot assign the same machine to two jobs at the same time. A compliance workflow may tolerate uncertainty in OCR, but not uncertainty about whether a prohibited combination of fields has appeared.
Modern neural systems are good at the first half of this problem: reading messy input, classifying ambiguous signals, and producing probability distributions. Symbolic systems are good at the second half: enforcing rules. The awkward part has always been the joint between them. Usually the neural model predicts, then a solver checks or repairs. The solver may be brilliant. It is also, from the model’s perspective, a locked door. Constraint violations do not flow back as ordinary training signal.
The paper behind AS2, short for Attention-Based Soft Answer Sets, takes aim at that door.[1] Its claim is not simply that a neural network can solve Sudoku. That would be a poor use of everyone’s remaining patience. The more interesting claim is architectural: a system can keep perception, reasoning, and constraint feedback inside one differentiable path, while still letting declarative logic define the shape of the reasoning problem.
That distinction matters. AS2 is not “rules sprinkled on top of a neural net.” It is closer to a neural system trained inside a soft version of the rule structure.
AS2 replaces the solver call with a fixed-point target
The normal neuro-symbolic pipeline has a familiar rhythm:
| Stage | What happens | Where the problem appears |
|---|---|---|
| Perception | A neural model turns images or inputs into symbol probabilities | It may be confidently wrong |
| Symbolic reasoning | A solver enforces constraints over discrete symbols | It sits behind a non-differentiable boundary |
| Correction or failure | The solver repairs, rejects, or completes the assignment | The perception model does not directly learn from the logical failure |
AS2 changes the rhythm. Instead of committing early to discrete symbols and then calling a solver, it keeps a probability distribution over possible symbols at each position. Constraint satisfaction is expressed as a differentiable fixed-point residual.
The classical logic-programming idea in the background is the immediate-consequence operator, usually written as $T_P$. In ordinary symbolic reasoning, this operator maps an interpretation to the consequences implied by a program. AS2 uses a probabilistic lift of that operator. The model is trained so that its current soft assignment is close to what the lifted operator says it should be.
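In simplified form, the important training pressure is a fixed-point residual: the gap between the model’s current soft assignment and the output of the lifted operator, measured within each constraint group.
Here, $p_{i,s}$ is the model’s probability that position $i$ takes symbol $s$, and $G$ is a constraint group such as a Sudoku row, column, or box. The model is penalized when its distribution is not a fixed point of the constraint operator.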
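As a hedged sketch only, and not the paper’s exact objective: assuming a squared-error residual and writing the lifted operator restricted to group $G$ as $\hat{T}_P^{G}$ (both the notation and the choice of norm are assumptions made here for illustration), a fixed-point loss of this kind could be written as:

```latex
\mathcal{L}_{\mathrm{fp}}(p)
  = \sum_{G} \sum_{i \in G} \sum_{s}
    \Bigl( p_{i,s} - \bigl[\hat{T}_P^{G}(p)\bigr]_{i,s} \Bigr)^{2}
```

Under that assumed form, the loss is zero exactly when $p_{i,s} = [\hat{T}_P^{G}(p)]_{i,s}$ for every group, position, and symbol, which is the fixed-point condition described in the text.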
This is the key move. The paper is not merely adding a hand-written penalty that says “please avoid duplicates.” Naive penalties can have useless soft minima, including uniform distributions that do not correspond to any valid assignment. The paper argues that the fixed-point residual avoids that pathology: valid one-hot assignments satisfying the constraints have zero residual, while the uniform distribution does not become a fake solution.
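The contrast between a naive penalty and a fixed-point residual can be made concrete on a toy group of three cells that must take three different symbols. The sketch below is illustrative only: `naive_mass_penalty` and `lifted_operator` are hypothetical stand-ins chosen for this example, not the paper’s operator or loss.

```python
# Toy all-different group: n cells, n symbols, p[i][s] = P(cell i takes symbol s).
# Both the penalty and the operator are assumptions made for illustration.

def naive_mass_penalty(p):
    """Naive soft rule: each symbol's total probability mass in the group should be 1."""
    n = len(p)
    return sum((sum(p[i][s] for i in range(n)) - 1.0) ** 2 for s in range(n))


def lifted_operator(p):
    """Soft consequence: cell i may take symbol s to the degree no other cell takes s."""
    n = len(p)
    out = []
    for i in range(n):
        row = []
        for s in range(n):
            available = 1.0
            for j in range(n):
                if j != i:
                    available *= 1.0 - p[j][s]
            row.append(available)
        out.append(row)
    return out


def fixed_point_residual(p):
    """Squared distance between p and its image under the operator."""
    t = lifted_operator(p)
    return sum((p[i][s] - t[i][s]) ** 2
               for i in range(len(p)) for s in range(len(p[i])))


valid = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
uniform = [[1.0 / 3.0] * 3 for _ in range(3)]
duplicate = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

for name, p in [("valid", valid), ("uniform", uniform), ("duplicate", duplicate)]:
    print(name, round(naive_mass_penalty(p), 4), round(fixed_point_residual(p), 4))
```

In this toy setting the uniform distribution scores zero under the naive mass penalty, so it looks like a solution, while the fixed-point residual is zero only for the valid one-hot assignment. Whether the paper’s operator behaves the same way on full Sudoku is the paper’s claim, not something this sketch establishes.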
That is the first real contribution. The solver call is replaced by a differentiable target whose topology is derived from the ASP-style constraint program.
The symbolic part survives because the topology is not learned freely
A likely misunderstanding is that once the solver disappears, the symbolic part disappears with it.
That is not quite right. In AS2, the reasoning computation is still governed by the declared constraints. What changes is the computational register: discrete solver inference becomes continuous arithmetic over probability distributions.
This is a useful distinction:
| Design | What is learned from data? | What is fixed by logic? |
|---|---|---|
| Generic neural reasoning | The transformation itself is mostly learned | Very little, unless added indirectly |
| Pipeline neuro-symbolic system | Perception is learned; solving is external | The solver’s constraint model |
| AS2-style soft symbolic reasoning | Perception and soft representations are learned | The constraint topology used by the fixed-point residual |
So AS2 is symbolic, but not because it uses a solver; it does not use one. It is symbolic because the reasoning pathway is structurally constrained by the declarative program.
This is a better criterion than the usual “does it contain gradients?” debate, which has always been a little theatrical. Gradients are not the enemy of symbolic structure. Unconstrained learned transformations are. In AS2, the model learns under a fixed logical scaffold rather than learning the scaffold from scratch.
That is also why the paper’s mechanism-first interpretation is more important than its benchmark table. The performance numbers are impressive, but the mechanism explains what kind of system is being proposed.
Constraint-group embeddings replace arbitrary position IDs
The second contribution is less flashy but practically important: AS2 removes conventional positional embeddings.
A standard Transformer needs some way to distinguish tokens. For a sequence, this usually means positional encodings: token 1, token 2, token 3, and so on. But Sudoku cells are not naturally a sentence. Their important identity is not “the 17th token.” Their important identity is “this cell belongs to row 2, column 8, and box 3.”
AS2 therefore uses constraint-group membership embeddings. For Visual Sudoku, each cell receives embeddings for its row, column, and box. The token representation is enriched by the constraint groups in which the cell participates.
This does two things at once.
First, it gives the Transformer a structurally meaningful coordinate system. Cells that share a row, column, or box are related because the rules say they are related, not because some arbitrary index happens to sit nearby.
Second, it aligns the neural representation with the declared logic. The same structure that defines constraint violations also shapes attention.
That is a neat design choice. It does not prove that positional embeddings are useless in every reasoning problem. It does suggest that when the domain already contains a clear constraint graph, forcing the model to rediscover that graph from arbitrary token positions is a strange tax to pay.
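A minimal sketch of the membership idea, assuming row-major cell indices and assuming the three group embeddings are simply summed (the paper may combine them differently). The names `group_ids`, `make_table`, and `membership_embedding` are hypothetical, and the random vectors stand in for parameters a real model would learn.

```python
import random


def group_ids(cell):
    """Map a cell index 0..80 (row-major) to zero-based (row, column, box) ids."""
    row, col = divmod(cell, 9)
    box = (row // 3) * 3 + col // 3
    return row, col, box


def make_table(num_groups, dim, rng):
    """One vector per group; random here, learned in a real model."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_groups)]


def membership_embedding(cell, tables):
    """Sum the embeddings of the row, column, and box the cell belongs to."""
    row, col, box = group_ids(cell)
    return [r + c + b for r, c, b in
            zip(tables["row"][row], tables["col"][col], tables["box"][box])]


rng = random.Random(0)
tables = {name: make_table(9, 4, rng) for name in ("row", "col", "box")}

print(group_ids(16))  # (1, 7, 2): row 2, column 8, box 3 in one-based terms
print(membership_embedding(16, tables))
```

Two cells in the same row share the same row vector in this scheme, so their relatedness is present in the token representation before attention has learned anything.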
The architecture is familiar only at the surface
At a shallow diagram level, AS2 does not look alien. It uses a perception module, then a reasoning module, then output logits.
For Visual Sudoku, the perception module is a shared CNN over cell images. A concept bottleneck head maps each cell into pre-reasoning digit logits. Evidence cells are clamped: during training, known clues are anchored with ground-truth labels; during inference, clue cells use the perception module’s prediction. Then the soft distributions are projected into the reasoning module, enriched by constraint-group membership embeddings, and processed by a Transformer encoder.
The important point is not “CNN plus Transformer.” That is a recipe, not a contribution. The contribution is the coupling:
| Component | Ordinary reading | AS2-specific role |
|---|---|---|
| CNN perception module | Recognizes digit images | Produces soft symbol distributions that can receive constraint feedback |
| Concept bottleneck | Makes intermediate logits interpretable | Separates perceptual evidence from reasoning output |
| Constraint-group embeddings | Replace position IDs | Inject row, column, and box structure into token representations |
| Transformer reasoning module | Propagates information globally | Lets each position update in light of constraint-linked peers |
| Fixed-point loss | Extra training objective | Turns rule satisfaction into differentiable pressure |
| Greedy constrained decoding | Final decoding heuristic | Converts near-valid distributions into valid assignments without an external solver |
This is why AS2 should not be summarized as “a Transformer that solves Sudoku.” That description is technically true and editorially lazy, a dangerous combination.
The Visual Sudoku result is the main evidence
The paper evaluates AS2 on MNIST Addition and Visual Sudoku. The authors are clear that Visual Sudoku is the primary benchmark because it requires reasoning across 27 interacting constraint groups: 9 rows, 9 columns, and 9 boxes.
The reported Visual Sudoku numbers are strong:
| Method | Cell accuracy | Board accuracy | Constraint satisfaction | External solver at inference? |
|---|---|---|---|---|
| Perception-only | 60.51% | 0.00% | 100.00% with solver completion | Yes, for completion |
| Pipeline | 60.51% | 0.00% | 100.00% with solver completion | Yes, for completion |
| PBCS | Not reported | 99.4% | Not reported | Yes, CP-SAT |
| AS2 | 99.89% | 100.0% | 100.0%, Clingo-verified | No |
The nuance is important. AS2’s raw argmax board accuracy, before post-processing, is reported as 95.60%. Soft iterative refinement raises the constraint satisfaction rate to 98.7%. Greedy constrained decoding then closes the remaining gap, yielding 100% board accuracy and 100% constraint satisfaction across 1,000 test boards, verified independently with Clingo.
That sequence matters more than the final “100%” headline.
If AS2 needed a full external solver to reach 100%, it would be another pipeline system with better branding. But the paper’s final decoding procedure is not a CP-SAT or ASP solver call. It greedily commits the most confident assignments, masks impossible digits from related cells, and renormalizes. In other words, the model’s soft distributions already contain most of the reasoning work; the decoder performs a lightweight consistency cleanup.
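The commit, mask, and renormalize loop described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper’s decoder: `greedy_constrained_decode` is a hypothetical name, `peers[i]` is assumed to list the cells that share a constraint group with cell `i`, and the example uses a three-cell group rather than a full board.

```python
def greedy_constrained_decode(probs, peers):
    """probs[i][s]: soft belief that cell i takes symbol s.
    peers[i]: cells that must differ from cell i.
    Returns a list of symbols, or None if a cell runs out of options.
    """
    probs = [row[:] for row in probs]  # work on a copy
    assignment = [None] * len(probs)
    for _ in range(len(probs)):
        # Find the most confident (cell, symbol) pair among undecided cells.
        best_p, best_cell, best_sym = -1.0, None, None
        for i, row in enumerate(probs):
            if assignment[i] is not None:
                continue
            for s, p in enumerate(row):
                if p > best_p:
                    best_p, best_cell, best_sym = p, i, s
        if best_p <= 0.0:
            return None  # greedy dead end: no backtracking in this sketch
        assignment[best_cell] = best_sym
        # Mask the committed symbol from undecided peers, then renormalize.
        for j in peers[best_cell]:
            if assignment[j] is None:
                probs[j][best_sym] = 0.0
                total = sum(probs[j])
                if total > 0.0:
                    probs[j] = [p / total for p in probs[j]]
    return assignment


# Raw argmax would give [0, 0, 2], which repeats symbol 0.
probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.1, 0.2, 0.7]]
peers = [[1, 2], [0, 2], [0, 1]]
print(greedy_constrained_decode(probs, peers))  # [0, 1, 2]
```

The `None` branch is deliberate: a greedy pass has no way to undo an early commitment, so the sketch reports the dead end instead of hiding it.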
A fair interpretation is therefore: AS2 learns a representation that is already close to a constraint-satisfying fixed point, and a simple constrained decoding step can make the final discrete output valid.
That is more precise than saying the neural network “learns logic.” It learns under a soft logical operator, then decodes under the same constraint structure. Less mystical. More useful.
MNIST Addition is a sanity check, not the headline
MNIST Addition is included for comparability with older neuro-symbolic systems. It asks the model to recognize handwritten digits and predict their sum. The paper evaluates 2, 4, and 8 addends.
| Addends | AS2 digit accuracy | AS2 sum accuracy | Interpretation |
|---|---|---|---|
| 2 | 99.73% | 99.45% | Competitive with prior systems |
| 4 | 99.87% | 99.44% | Still highly accurate |
| 8 | 99.95% | 99.01% | Digit accuracy remains high, sum accuracy trails simpler baselines slightly |
The paper itself notes that MNIST Addition is saturated. With more addends, perception-only and pipeline variants slightly outperform AS2 on sum accuracy. The authors attribute this to optimization pressure from the arithmetic constraint loss, which competes with direct digit and sum supervision.
That should be stated plainly, because doing so prevents overclaiming. AS2 is not “strictly better everywhere.” On a benchmark where direct perception plus supervision already performs extremely well, the extra constraint machinery may introduce friction. On a benchmark where global constraint structure is genuinely central, the machinery earns its keep.
This is the kind of distinction business readers should care about. If your task is already solved by a classifier and a small deterministic check, do not build a neuro-soft-symbolic cathedral. Architecture is not a loyalty program; you do not get points for complexity.
What each experiment actually supports
The paper’s evidence is easier to interpret if we separate main evidence from supporting checks.
| Evidence item | Likely purpose | What it supports | What it does not prove |
|---|---|---|---|
| Visual Sudoku results | Main evidence | AS2 can solve a perception-plus-constraint task with high accuracy and no external solver at inference | That AS2 scales to arbitrary enterprise constraints |
| Clingo verification of constraint satisfaction (VCSR) | Verification check | The reported valid boards satisfy formal Sudoku constraints | That AS2 itself is an ASP solver |
| Raw argmax vs iterative refinement vs greedy decoding | Mechanism analysis | The model learns near-valid distributions; post-training propagation and decoding close the gap | That every future task will need only greedy decoding |
| MNIST Addition results | Comparability and scaling check | AS2 handles arithmetic constraints and stays competitive on standard neuro-symbolic benchmarks | That constraint loss always improves over perception-only baselines |
| Perception-only and pipeline ablations | Architectural contrast | Perception alone cannot solve Visual Sudoku; solver-based completion can be valid but wrong | That all pipeline systems are obsolete |
This framing is less exciting than “new SOTA system crushes Sudoku.” It is also more useful.
The main evidence is the Visual Sudoku result. MNIST Addition supports breadth, but not decisively. The ablations clarify where reasoning matters. The Clingo verification confirms validity, but it is not used as the inference engine.
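To make the role of a validity check concrete, here is a plain-Python stand-in for that kind of independent verification. It is not Clingo and not the paper’s verification script; `is_valid_board` is a hypothetical helper that checks the 27 groups (9 rows, 9 columns, 9 boxes) of a completed board.

```python
def is_valid_board(board):
    """True if a 9x9 board has digits 1..9 exactly once in every row, column, and box."""
    if len(board) != 9 or any(len(row) != 9 for row in board):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [set(board[r][c] for r in range(9)) for c in range(9)]
    boxes = [set(board[r][c]
                 for r in range(br, br + 3)
                 for c in range(bc, bc + 3))
             for br in (0, 3, 6) for bc in (0, 3, 6)]
    groups = rows + cols + boxes  # 27 constraint groups in total
    return all(group == digits for group in groups)


# A known-valid board built from a shifting pattern, used only as test data.
valid_board = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)]
               for r in range(9)]

# Swapping two cells in one row keeps that row valid but breaks two columns.
broken_board = [row[:] for row in valid_board]
broken_board[0][0], broken_board[0][1] = broken_board[0][1], broken_board[0][0]

print(is_valid_board(valid_board), is_valid_board(broken_board))
```

A check like this only confirms that an output satisfies the rules. It does no search, which is the sense in which verification is separate from inference.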
The business implication is not solver elimination; it is constraint feedback during learning
The tempting business headline is: “No more solvers.”
That is too blunt. Solvers are not going away. For many enterprise optimization tasks, solvers remain the correct tool: transparent, mature, auditable, and painfully good at what they do.
The more relevant implication is narrower and more interesting: AS2 shows a path for making constraint violations part of the learning signal in perception-heavy workflows.
Consider four classes of business problems:
| Workflow type | Why AS2-like thinking is relevant | Practical boundary |
|---|---|---|
| Document automation | OCR or extraction errors often violate totals, identities, dates, or required-field rules | Real documents contain messy schemas, exceptions, and changing formats |
| Compliance screening | Some outputs are not merely low-confidence but invalid under policy constraints | Policies may be ambiguous, jurisdiction-specific, and not always cleanly expressible as ASP-style constraints |
| Scheduling and allocation | Feasible assignments matter more than locally plausible predictions | Large-scale optimization may still require classical solvers |
| Multi-step operational agents | Actions must satisfy state, permission, and dependency rules | Open-ended language actions are less constrained than Sudoku cells |
The ROI logic is not “replace every rule engine with a neural net.” Please do not do that. Somewhere, a compliance officer just felt a disturbance in the force.
The ROI logic is: when perception and rules interact tightly, the model should learn from rule violations rather than merely being corrected after the fact. That can reduce invalid outputs, improve sample efficiency, and simplify downstream repair in domains where the constraint structure is stable enough to encode.
The phrase “stable enough” is doing real work here. AS2 assumes a well-specified symbolic structure. Sudoku has perfect rules. Business processes often have rules, exceptions, legacy overrides, regional variants, and the occasional spreadsheet maintained by a person named Brian who left in 2019. Turning those into clean declarative constraints is not free.
The implementation lesson: encode structure before asking attention to discover it
For applied AI teams, the constraint-group embedding idea may be the most immediately reusable design pattern.
Many business datasets already contain relational structure: invoice-to-line-item, customer-to-account, shipment-to-route, claim-to-policy, employee-to-shift, product-to-bill-of-materials. Standard models often flatten these structures and then ask attention to infer relationships from token order, field names, or training examples.
AS2 suggests a different instinct: when the structure is known, inject it directly.
That does not require copying AS2 wholesale. A practical team might use:
- membership embeddings for document sections, approval stages, or policy groups;
- graph-aware attention masks for entities that share constraints;
- differentiable residual losses for arithmetic or consistency checks;
- constrained decoding for final outputs that must be valid under simple rules.
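Two of these patterns are small enough to sketch directly. The example below is hypothetical throughout: the group labels, `shared_group_mask`, and `total_residual` are illustrative names, and the invoice rule is the simple “total equals the sum of line items” check from the opening of this article.

```python
def shared_group_mask(memberships):
    """memberships[i] is the set of group labels field i belongs to.
    mask[i][j] is True when fields i and j share at least one group (or i == j).
    """
    n = len(memberships)
    return [[i == j or bool(memberships[i] & memberships[j]) for j in range(n)]
            for i in range(n)]


def total_residual(line_amounts, total):
    """Squared gap between the stated total and the sum of line items,
    with its gradient with respect to each line amount.
    """
    gap = sum(line_amounts) - total
    loss = gap ** 2
    grad = [2.0 * gap for _ in line_amounts]
    return loss, grad


# Hypothetical fields: an invoice total, two of its line items, and an unrelated field.
memberships = [
    {"invoice:A"},
    {"invoice:A", "line:1"},
    {"invoice:A", "line:2"},
    {"invoice:B"},
]
mask = shared_group_mask(memberships)
print(mask[0][1], mask[0][3])

print(total_residual([10.0, 5.0], 15.5))
```

The mask restricts which fields are allowed to attend to each other, and the residual gives a smooth signal that grows as extracted amounts drift away from the stated total. Neither is AS2 itself; both borrow the same instinct of handing known structure to the model.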
The deeper lesson is architectural humility. If the organization already knows which fields constrain which other fields, asking the model to rediscover that fact from examples is not “end-to-end learning.” It is just making the GPU do paperwork.
Where the result should not be overextended
AS2 is promising, but its evidence lives inside two controlled benchmarks.
Visual Sudoku is hard in the right way: it combines perception with global constraint satisfaction. It is also clean in a way enterprise problems rarely are. The symbol domain is finite and small. The constraints are stable. The board size is fixed. The evidence mask is clear. The output can be verified exactly.
MNIST Addition adds arithmetic constraints, but the benchmark is saturated and does not strongly separate methods. The paper’s own results show that constraint loss can introduce optimization tension when the task is already easy for supervised perception.
The strongest practical boundary is scale and expressiveness. The paper argues from ASP-style constraint specifications and differentiable lifted operators, but it does not demonstrate large ASP programs, dynamic business rules, recursive organizational policies, or open-ended LLM agent workflows. It also does not remove the need to design the constraint specification. The logic still has to come from somewhere.
There is also a decoding boundary. Greedy constrained decoding works beautifully in the reported Sudoku setup. Greedy methods can fail in problems where early local commitments block globally optimal or feasible assignments. For larger scheduling, routing, or resource allocation tasks, classical solvers may still be needed as inference engines, validators, or fallback repair mechanisms.
So the correct business translation is not:
“AS2 proves we can remove solvers from enterprise AI.”
It is:
“AS2 shows that some solver-shaped feedback can be moved inside differentiable training, especially when perception and symbolic constraints are tightly coupled.”
That sentence is less viral. It is also less likely to bankrupt a project.
The strategic value is a cleaner interface between learning and rules
AS2 belongs to a broader architectural shift: rules are moving from external guards to internal training structure.
Old pattern:
- Train a model.
- Run predictions through a rule engine.
- Repair, reject, or explain the failures.
AS2-style pattern:
- Encode the rule structure.
- Train the model under differentiable pressure from that structure.
- Decode into outputs that are already close to valid.
- Verify formally where needed.
This does not eliminate governance. It improves the interface between governance and learning.
For Cognaptus-style automation, that is the interesting part. Many business processes are not pure language generation problems. They are structured decision problems contaminated by perception. An AI system may need to read, classify, infer, and act, but its final output must still satisfy boring constraints. Boring constraints are underrated. They are often the difference between a demo and a deployable system.
AS2 offers one candidate mechanism for bringing those constraints closer to the model’s learning dynamics.
Conclusion: soft reasoning is useful when the rules are hard enough
The paper’s strongest contribution is not a benchmark trophy. It is the demonstration of a particular design principle: symbolic constraints can define the topology of a differentiable reasoning process.
That principle is easy to blur. AS2 does not make logic disappear into a neural network. It turns a declarative constraint structure into a soft fixed-point training signal, aligns attention with constraint-group membership, and uses lightweight constrained decoding to produce valid assignments without an external solver at inference.
The result is impressive on Visual Sudoku: 99.89% cell accuracy and 100% board-level constraint satisfaction across 1,000 test boards after greedy constrained decoding, verified with Clingo. The MNIST Addition results are more modest as evidence, but useful as a comparability check.
For business AI, the message is disciplined rather than magical. If the task has stable, explicit constraints and perception errors that interact with those constraints, AS2 points toward a better architecture than “predict first, repair later.” If the task is open-ended, unstable, or dominated by large optimization search, the solver is probably not packing its desk yet.
Soft logic is not a replacement for hard rules. It is a way to let hard rules teach the soft model before the final answer is already wrong.
Cognaptus: Automate the Present, Incubate the Future.
[1] Wael AbdAlmageed, “AS2 - Attention-Based Soft Answer Sets: An End-to-End Differentiable Neuro-Soft-Symbolic Reasoning Architecture,” arXiv:2603.18436v1, 19 March 2026, https://arxiv.org/abs/2603.18436.